net: phy: add support for disabling PHY-autonomous EEE
Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement
autonomous EEE where the PHY manages LPI signaling without forwarding
it to the MAC. This conflicts with MAC drivers that implement their own
LPI control.
Add a .disable_autonomous_eee callback to struct phy_driver and call it
from phy_support_eee(). When a MAC driver indicates it supports EEE via
phy_support_eee(), the PHY's autonomous EEE is automatically disabled so
the MAC can manage LPI entry/exit.
net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()
When the descriptor index written in REG_RX_CPU_IDX() is equal to the one
stored in REG_RX_DMA_IDX(), the hw will stop since the QDMA RX ring is
empty.
Add missing REG_RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue
routine during QDMA RX ring cleanup.
====================
ynl/ethtool/netlink: fix nla_len overflow for large string sets
This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).
The root cause is that struct nlattr.nla_len is a __u16 (max 65535
bytes). When a NIC exports enough statistics strings, the
ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
this limit. nla_nest_end() silently truncates the length on assignment,
producing a corrupted netlink message.
Patch 1 moves ethtool.py to selftest.
Patch 2 improves the ethtool tool: rename the doit/dumpit helpers
to do_set/do_get and convert do_get to use ynl.do() with an
explicit device header instead of a full dump with client-side filtering.
Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the same option already present in cli.py, to help debug netlink
message size issues
Patch 4 adds a new helper nla_nest_end_safe() to check whether the nla_len
is overflow and return -EMSGSIZE early if so.
Patch 5 uses the new helper in ethtool to make sure the ethtool doesn't
reply a corrupted netlink message.
====================
Hangbin Liu [Wed, 8 Apr 2026 07:08:53 +0000 (15:08 +0800)]
ethtool: strset: check nla_len overflow
The netlink attribute length field nla_len is a __u16, which can only
represent values up to 65535 bytes. NICs with a large number of
statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
this limit.
When nla_nest_end() writes the actual nest size back to nla_len, the
value is silently truncated. This results in a corrupted netlink message
being sent to userspace: the parser reads a wrong (truncated) attribute
length and misaligns all subsequent attribute boundaries, causing decode
errors.
Fix this by using the new helper nla_nest_end_safe and error out if
the size exceeds U16_MAX.
Hangbin Liu [Wed, 8 Apr 2026 07:08:52 +0000 (15:08 +0800)]
netlink: add a nla_nest_end_safe() helper
The nla_len field in struct nlattr is a __u16, which can only hold
values up to 65535. If a nested attribute grows beyond this limit,
nla_nest_end() silently truncates the length, producing a corrupted
netlink message with no indication of the problem.
Since nla_nest_end() is used everywhere and this issue rarely happens,
let's add a new helper to check the length.
Hangbin Liu [Wed, 8 Apr 2026 07:08:51 +0000 (15:08 +0800)]
tools: ynl: ethtool: add --dbg-small-recv option
Add a --dbg-small-recv debug option to control the recv() buffer size
used by YNL, matching the same option already present in cli.py. This
is useful if user need to get large netlink message.
Hangbin Liu [Wed, 8 Apr 2026 07:08:50 +0000 (15:08 +0800)]
tools: ynl: ethtool: use doit instead of dumpit for per-device GET
Rename the local helper doit() to do_set() and dumpit() to do_get() to
better reflect their purpose.
Convert do_get() to use ynl.do() with an explicit device header instead
of ynl.dump() followed by client-side filtering. This is more efficient
as the kernel only processes and returns data for the requested device,
rather than dumping all devices across the netns.
====================
net: mana: Fix debugfs directory naming and file lifecycle
This series fixes two pre-existing debugfs issues in the MANA driver.
Patch 1 fixes the per-device debugfs directory naming to use the unique
PCI BDF address via pci_name(), avoiding a potential NULL pointer
dereference when pdev->slot is NULL (e.g. VFIO passthrough, nested KVM)
and preventing name collisions across multiple PFs or VFs.
Patch 2 moves the current_speed debugfs file creation from
mana_probe_port() to mana_init_port() so it survives detach/attach
cycles triggered by MTU changes or XDP program changes.
====================
net: mana: Move current_speed debugfs file to mana_init_port()
Move the current_speed debugfs file creation from mana_probe_port() to
mana_init_port(). The file was previously created only during initial
probe, but mana_cleanup_port_context() removes the entire vPort debugfs
directory during detach/attach cycles. Since mana_init_port() recreates
the directory on re-attach, moving current_speed here ensures it survives
these cycles.
Fixes: 75cabb46935b ("net: mana: Add support for net_shaper_ops") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260408081224.302308-3-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: mana: Use pci_name() for debugfs directory naming
Use pci_name(pdev) for the per-device debugfs directory instead of
hardcoded "0" for PFs and pci_slot_name(pdev->slot) for VFs. The
previous approach had two issues:
1. pci_slot_name() dereferences pdev->slot, which can be NULL for VFs
in environments like generic VFIO passthrough or nested KVM,
causing a NULL pointer dereference.
2. Multiple PFs would all use "0", and VFs across different PCI
domains or buses could share the same slot name, leading to
-EEXIST errors from debugfs_create_dir().
pci_name(pdev) returns the unique BDF address, is always valid, and is
unique across the system.
Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device") Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260408081224.302308-2-ernis@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nfc: llcp: add missing return after LLCP_CLOSED checks
In nfc_llcp_recv_hdlc() and nfc_llcp_recv_disc(), when the socket
state is LLCP_CLOSED, the code correctly calls release_sock() and
nfc_llcp_sock_put() but fails to return. Execution falls through to
the remainder of the function, which calls release_sock() and
nfc_llcp_sock_put() again. This results in a double release_sock()
and a refcount underflow via double nfc_llcp_sock_put(), leading to
a use-after-free.
Add the missing return statements after the LLCP_CLOSED branches
in both functions to prevent the fall-through.
====================
bng_en: add link management and statistics support
This series enhances the bng_en driver by adding:
1. Link/PHY support
a. Link query
b. Async Link events
c. Ethtool link set/get functionality
2. Hardware statistics reporting via ethtool -S
This version incorporates feedback received prior to splitting the
original series into two parts.
====================
Implement the legacy ethtool statistics interface (get_sset_count,
get_strings, get_ethtool_stats) to expose hardware counters not
available through standard kernel stats APIs.
Implement netdev_stat_ops to provide standardized per-queue
statistics via the Netlink API.
Below is the description of the hardware drop counters:
rx-hw-drop-overruns: Packets dropped by HW due to resource limitations
(e.g., no BDs available in the host ring).
rx-hw-drops: Total packets dropped by HW (sum of overruns and error
drops).
tx-hw-drop-errors: Packets dropped by HW because they were invalid or
malformed.
tx-hw-drops: Total packets dropped by HW (sum of resource limitations
and error drops).
The implementation was verified using the ynl tool:
Implement the ndo_get_stats64 callback to report aggregate network
statistics. The driver gathers these by accumulating the per-ring
counters into the provided rtnl_link_stats64 structure.
bng_en: periodically fetch and accumulate hardware statistics
Use the timer to schedule periodic stats collection via
the workqueue when the link is up. Fetch fresh counters from
hardware via DMA and accumulate them into 64-bit software
shadows, handling wrap-around for counters narrower than
64 bits.
bng_en: add HW stats infra and structured ethtool ops
Implement the hardware-level statistics foundation and modern structured
ethtool operations.
1. Infrastructure: Add HWRM firmware wrappers (FUNC_QSTATS_EXT,
PORT_QSTATS_EXT, and PORT_QSTATS) to query ring and port counters.
2. Structured ops: Implement .get_eth_phy_stats, .get_eth_mac_stats,
.get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats.
Stats are initially reported as 0; accumulation logic is added
in a subsequent patch.
Register for firmware asynchronous events, including link-status,
link-speed, and PHY configuration changes. Upon event reception,
re-query the PHY and update ethtool settings accordingly.
Implement .get_pauseparam and .set_pauseparam to support flow control
configuration. This allows reporting and setting of autoneg, RX pause,
and TX pause states.
bng_en: add ethtool link settings, get_link, and nway_reset
Add get/set_link_ksettings, get_link, and nway_reset support.
Report supported, advertised, and link-partner speeds across NRZ,
PAM4, and PAM4-112 signaling modes. Enable lane count reporting.
bng_en: query PHY capabilities and report link status
Query PHY capabilities and supported speeds from firmware,
retrieve current link state (speed, duplex, pause, FEC),
and log the information. Seed initial link state during probe.
bng_en: add per-PF workqueue, timer, and slow-path task
Add a dedicated single-thread workqueue and a timer for each PF
to drive deferred slow-path work such as link event handling and
stats collection. The timer is stopped via timer_delete_sync()
when interrupts are disabled and restarted on open.
While the close path stops the timer to prevent new tasks from
being scheduled, the sp_task and workqueue are preserved to
maintain state continuity. Final draining and destruction of
the workqueue are handled during PCI remove.
Igor Pylypiv [Sun, 12 Apr 2026 15:36:37 +0000 (08:36 -0700)]
ata: libata-scsi: fix requeue of deferred ATA PASS-THROUGH commands
Commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
introduced ata_scsi_requeue_deferred_qc() to handle commands deferred
during resets or NCQ failures. This deferral logic completed commands
with DID_SOFT_ERROR to trigger a retry in the SCSI mid-layer.
However, DID_SOFT_ERROR is subject to scsi_cmd_retry_allowed() checks.
ATA PASS-THROUGH commands sent via SG_IO ioctl have scmd->allowed set
to zero. This causes the mid-layer to fail the command immediately
instead of retrying, even though the command was never actually issued
to the hardware.
Switch to DID_REQUEUE to ensure these commands are inserted back into
the request queue regardless of retry limits.
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Niklas Cassel <cassel@kernel.org>
====================
Add TSO map-once DMA helpers and bnxt SW USO support
Greetings:
This series extends net/tso to add a data structure and some helpers allowing
drivers to DMA map headers and packet payloads a single time. The helpers can
then be used to reference slices of shared mapping for each segment. This
helps to avoid the cost of repeated DMA mappings, especially on systems which
use an IOMMU. N per-packet DMA maps are replaced with a single map for the
entire GSO skb. As of v3, the series uses the DMA IOVA API (as suggested by
Leon [1]) and provides a fallback path when an IOMMU is not in use. The DMA
IOVA API provides even better efficiency than the v2; see below.
The added helpers are then used in bnxt to add support for software UDP
Segmentation Offloading (SW USO) for older bnxt devices which do not have
support for USO in hardware. Since the helpers are generic, other drivers
can be extended similarly.
The v2 showed a ~4x reduction in DMA mapping calls at the same wire packet
rate on production traffic with a bnxt device. The v3, however, shows a larger
reduction of about ~6x at the same wire packet rate. This is thanks to Leon's
suggestion of using the DMA IOVA API [1].
Special care is taken to make bnxt ethtool operations work correctly: the ring
size cannot be reduced below a minimum threshold while USO is enabled and
growing the ring automatically re-enables USO if it was previously blocked.
This v10 contains some cosmetic changes (wrapping long lines), moves the test
to the correct directory, and attempts to fix the slot availability check
added in the v9.
I re-ran the python test and the test passed on my bnxt system. I also ran
this on a production system.
====================
Joe Damato [Wed, 8 Apr 2026 23:05:58 +0000 (16:05 -0700)]
net: bnxt: Dispatch to SW USO
Wire in the SW USO path added in preceding commits when hardware USO is
not possible.
When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-10-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:57 +0000 (16:05 -0700)]
net: bnxt: Add SW GSO completion and teardown support
Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:
- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
(the skb is shared across all segments and freed only once)
- LAST segments: call tso_dma_map_complete() to tear down the IOVA
mapping if one was used. On the fallback path, payload DMA unmapping
is handled by the existing per-BD dma_unmap_len walk.
Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.
is_sw_gso is initialized to zero, so the new code paths are not run.
Add logic for feature advertisement and guardrails for ring sizing.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-9-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:56 +0000 (16:05 -0700)]
net: bnxt: Implement software USO
Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.
The xmit path:
1. Calls tso_start() to initialize TSO state
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
DMA-map the linear payload and all frags upfront.
3. For each segment:
- Copies and patches headers via tso_build_hdr() into the
pre-allocated tx_inline_buf (DMA-synced per segment)
- Counts payload BDs via tso_dma_map_count()
- Emits long BD (header) + ext BD + payload BDs
- Payload BDs use tso_dma_map_next() which yields (dma_addr,
chunk_len, mapping_len) tuples.
Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.
Completion state is updated by calling tso_dma_map_completion_save() for
the last segment.
Joe Damato [Wed, 8 Apr 2026 23:05:55 +0000 (16:05 -0700)]
net: bnxt: Add boilerplate GSO code
Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.
The full SW USO implementation will be added in a future commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-7-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:54 +0000 (16:05 -0700)]
net: bnxt: Add TX inline buffer infrastructure
Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.
The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.
Allocation helper is marked __maybe_unused in this commit because it
will be wired in later.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-6-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:53 +0000 (16:05 -0700)]
net: bnxt: Use dma_unmap_len for TX completion unmapping
Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.
This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-5-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:50 +0000 (16:05 -0700)]
net: tso: Introduce tso_dma_map and helpers
Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped
GSO payload data and tso_dma_map_completion_state.
The tso_dma_map combines DMA mapping storage with iterator state, allowing
drivers to walk pre-mapped DMA regions linearly. Includes fields for
the DMA IOVA path (iova_state, iova_offset, total_len) and a fallback
per-region path (linear_dma, frags[], frag_idx, offset).
The tso_dma_map_completion_state makes the IOVA completion state opaque
for drivers. Drivers are expected to allocate this and use the added
helpers to update the completion state.
Adds skb_frag_phys() to skbuff.h, returning the physical address
of a paged fragment's data, which is used by the tso_dma_map helpers
introduced in this commit described below.
The added TSO DMA map helpers are:
tso_dma_map_init(): DMA-maps the linear payload region and all frags
upfront. Prefers the DMA IOVA API for a single contiguous mapping with
one IOTLB sync; falls back to per-region dma_map_phys() otherwise.
Returns 0 on success, cleans up partial mappings on failure.
tso_dma_map_cleanup(): Handles both IOVA and fallback teardown paths.
tso_dma_map_count(): counts how many descriptors the next N bytes of
payload will need. Returns 1 if IOVA is used since the mapping is
contiguous.
tso_dma_map_next(): yields the next (dma_addr, chunk_len) pair.
On the IOVA path, each segment is a single contiguous chunk. On the
fallback path, indicates when a chunk starts a new DMA mapping so the
driver can set dma_unmap_len on that descriptor for completion-time
unmapping.
tso_dma_map_completion_save(): updates the completion state. Drivers
will call this at xmit time.
tso_dma_map_complete(): tears down the mapping at completion time and
returns true if the IOVA path was used. If it was not used, this is a
no-op and returns false.
Merge tag 'wq-for-7.0-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"This is a fix for a stall which triggers on ordered workqueues when
there are multiple inactive work items during workqueue property
changes through sysfs, which doesn't happen that frequently.
While really late, the fix is very low risk as it just repeats an
operation which is already being performed:
- Fix incomplete activation of multiple inactive works when
unplugging a pool_workqueue, where the pending_pwqs list
wasn't being updated for subsequent works"
* tag 'wq-for-7.0-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works
Matt Turner [Fri, 3 Apr 2026 15:01:28 +0000 (11:01 -0400)]
alpha: Define pgprot_modify to silence tautological comparison warnings
Alpha's pgprot_noncached, pgprot_writecombine, and pgprot_device are
all identity macros, so the generic pgprot_modify() produces
tautological self-comparisons that GCC warns about:
include/linux/pgtable.h:1701:25: warning: self-comparison always
evaluates to true [-Wtautological-compare]
Since all caching attributes are no-ops on Alpha, define
pgprot_modify() to simply return newprot.
Magnus Lindholm [Thu, 9 Apr 2026 17:10:15 +0000 (19:10 +0200)]
alpha: add support for SECCOMP and SECCOMP_FILTER
Add SECCOMP and SECCOMP_FILTER support to the Alpha architecture and fix
syscall entry and ptrace issues uncovered by the seccomp-bpf selftests.
The syscall entry path is reworked to consistently track syscall state
using r0, r1 and r2:
- r1 holds the active syscall number
- r2 preserves the original syscall number for restart
- r0 carries the return value, with r19 (a3) indicating success/error
This allows syscall restarts to be permitted only for valid ERESTART*
return codes and prevents kernel-internal restart values from leaking to
userspace. The syscall tracing error marker is corrected to use the saved
syscall number slot, matching the Alpha ABI.
Additionally, implement minimal PTRACE_GETREGSET and PTRACE_SETREGSET
support for NT_PRSTATUS, exporting struct pt_regs directly. This fixes
ptrace-based seccomp tests that previously failed with -EIO.
With these changes, seccomp-bpf and ptrace syscall tests pass reliably on
Alpha.
Merge tag 'timers-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for the time/timers subsystem:
- Invert the inverted fastpath decision in check_tick_dependency(),
which prevents NOHZ full to stop the tick. That's a regression
introduced in the 7.0 merge window.
- Prevent a unpriviledged DoS in the clockevents code, where user
space can starve the timer interrupt by arming a timerfd or posix
interval timer in a tight loop with an absolute expiry time in the
past. The fix turned out to be incomplete and was was amended
yesterday to make it work on some 20 years old AMD machines as
well. All issues with it have been confirmed to be resolved by
various reporters"
* tag 'timers-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clockevents: Prevent timer interrupt starvation
tick/nohz: Fix inverted return value in check_tick_dependency() fast path
net: skb: clean up dead code after skb_kfree_head() simplification
Since commit 0f42e3f4fe2a ("net: skb: fix cross-cache free of
KFENCE-allocated skb head"), skb_kfree_head() always calls kfree()
and no longer uses end_offset to distinguish between skb_small_head_cache
and generic kmalloc caches.
Clean up the leftovers:
- Remove the unused end_offset parameter from skb_kfree_head() and
update all callers.
- Remove the SKB_SMALL_HEAD_HEADROOM guard in __skb_unclone_keeptruesize()
which was protecting the old skb_kfree_head() logic.
- Update the SKB_SMALL_HEAD_CACHE_SIZE comment to reflect that the
non-power-of-2 sizing is no longer used for free-path disambiguation.
Daniel Borkmann [Fri, 10 Apr 2026 07:23:34 +0000 (09:23 +0200)]
netkit: Don't emit scrub attribute for single device mode
When userspace reads a single mode netkit device via RTM_GETLINK,
it receives IFLA_NETKIT_SCRUB=NETKIT_SCRUB_DEFAULT attribute from
netkit_fill_info(). If that attribute is echoed back to recreate
the device, the seen_scrub presence check in netkit_new_link()
causes creation to fail with -EOPNOTSUPP. Since it has no meaning
for single devices at this point, just don't dump it.
Fixes: 481038960538 ("netkit: Add single device mode for netkit") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260410072334.548232-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable chip_rev stores the value read from the FPGA_REV
register and represents the FPGA revision. Rename it to fpga_rev
to better reflect its meaning.
Jakub Kicinski [Sun, 12 Apr 2026 16:39:20 +0000 (09:39 -0700)]
Merge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1-3) IPVS updates from Julian Anastasov to enhance visibility into
IPVS internal state by exposing hash size, load factor etc and
allows userspace to tune the load factor used for resizing hash
tables.
4) reject empty/not nul terminated device names from xt_physdev.
This isn't a bug fix; existing code doesn't require a c-string.
But clean this up anyway because conceptually the interface name
definitely should be a c-string.
5) Switch nfnetlink to skb_mac_header helpers that didn't exist back
when this code was written. This gives us additional debug checks
but is not intended to change functionality.
6) Let the xt ttl/hoplimit match reject unknown operator modes.
This is a cleanup, the evaluation function simply returns false when
the mode is out of range. From Marino Dzalto.
7) xt_socket match should enable defrag after all other checks. This
bug is harmless, historically defrag could not be disabled either
except by rmmod.
8) remove UDP-Lite conntrack support, from Fernando Fernandez Mancera.
9) Avoid a couple -Wflex-array-member-not-at-end warnings in the old
xtables 32bit compat code, from Gustavo A. R. Silva.
10) nftables fwd expression should drop packets when their ttl/hl has
expired. This is a bug fix deferred, its not deemed important
enough for -rc8.
11) Add additional checks before assuming the mac header is an ethernet
header, from Zhengchuan Liang.
* tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: require Ethernet MAC header before using eth_hdr()
netfilter: nft_fwd_netdev: check ttl/hl before forwarding
netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
netfilter: conntrack: remove UDP-Lite conntrack support
netfilter: xt_socket: enable defrag after all other checks
netfilter: xt_HL: add pr_fmt and checkentry validation
netfilter: nfnetlink: prefer skb_mac_header helpers
netfilter: x_physdev: reject empty or not-nul terminated device names
ipvs: add conn_lfactor and svc_lfactor sysctl vars
ipvs: add ip_vs_status info
ipvs: show the current conn_tab size to users
====================
net/sched: act_ct: Only release RCU read lock after ct_ft
When looking up a flow table in act_ct in tcf_ct_flow_table_get(),
rhashtable_lookup_fast() internally opens and closes an RCU read critical
section before returning ct_ft.
The tcf_ct_flow_table_cleanup_work() can complete before refcount_inc_not_zero()
is invoked on the returned ct_ft resulting in a UAF on the already freed ct_ft
object. This vulnerability can lead to privilege escalation.
Analysis from zdi-disclosures@trendmicro.com:
When initializing act_ct, tcf_ct_init() is called, which internally triggers
tcf_ct_flow_table_get().
static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params)
At [1], rhashtable_lookup_fast() looks up and returns the corresponding ct_ft
from zones_ht . The lookup is performed within an RCU read critical section
through rcu_read_lock() / rcu_read_unlock(), which prevents the object from
being freed. However, at the point of function return, rcu_read_unlock() has
already been called, and there is nothing preventing ct_ft from being freed
before reaching refcount_inc_not_zero(&ct_ft->ref) at [2]. This interval becomes
the race window, during which ct_ft can be freed.
Free Process:
tcf_ct_flow_table_put() is executed through the path tcf_ct_cleanup() call_rcu()
tcf_ct_params_free_rcu() tcf_ct_params_free() tcf_ct_flow_table_put().
tcf_ct_flow_table_cleanup_work() frees ct_ft at [4]. When this function executes
between [1] and [2], UAF occurs.
This race condition has a very short race window, making it generally
difficult to trigger. Therefore, to trigger the vulnerability an msleep(100) was
inserted after[1]
Fixes: 138470a9b2cc2 ("net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_get") Reported-by: zdi-disclosures@trendmicro.com Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260410111627.46611-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASoC: dt-bindings: rockchip: convert rk3399-gru-sound to DT Schema
Convert the rockchip,rk3399-gru-sound.txt DT binding to DT Schema
format.
Update rockchip,cpu from a single I2S controller phandle to a
phandle-array. Add an optional second entry for the SPDIF controller,
as seen in rk3399-gru.dtsi, required by boards with DisplayPort audio.
Jakub Kicinski [Sun, 12 Apr 2026 16:17:42 +0000 (09:17 -0700)]
Merge tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Final updates, notably:
- crypto: move Michael MIC code into wireless (only)
- mac80211:
- multi-link 4-addr support
- NAN data support (but no drivers yet)
- ath10k: DT quirk to make it work on some devices
- ath12k: IPQ5424 support
- rtw89: USB improvements for performance
* tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (124 commits)
wifi: cfg80211: Explicitly include <linux/export.h> in michael-mic.c
wifi: ath10k: Add device-tree quirk to skip host cap QMI requests
dt-bindings: wireless: ath10k: Add quirk to skip host cap QMI requests
crypto: Remove michael_mic from crypto_shash API
wifi: ipw2x00: Use michael_mic() from cfg80211
wifi: ath12k: Use michael_mic() from cfg80211
wifi: ath11k: Use michael_mic() from cfg80211
wifi: mac80211, cfg80211: Export michael_mic() and move it to cfg80211
wifi: ipw2x00: Rename michael_mic() to libipw_michael_mic()
wifi: libertas_tf: refactor endpoint lookup
wifi: libertas: refactor endpoint lookup
wifi: at76c50x: refactor endpoint lookup
wifi: ath12k: Enable IPQ5424 WiFi device support
wifi: ath12k: Add CE remap hardware parameters for IPQ5424
wifi: ath12k: add ath12k_hw_regs for IPQ5424
wifi: ath12k: add ath12k_hw_version_map entry for IPQ5424
wifi: ath12k: Add ath12k_hw_params for IPQ5424
dt-bindings: net: wireless: add ath12k wifi device IPQ5424
wifi: ath10k: fix station lookup failure during disconnect
wifi: ath12k: Create symlink for each radio in a wiphy
...
====================
Daniel Borkmann [Fri, 10 Apr 2026 13:06:02 +0000 (15:06 +0200)]
net: Rename ifq_idx to rxq_idx in netif_mp_* helpers
Rename the leftover ifq_idx parameter naming to rxq_idx to be
consistent with the rest of the file and the header declaration.
Back then this was taken out of the queue leasing series given
the cleanup is independent. No functional change.
Jakub Kicinski [Fri, 10 Apr 2026 01:39:21 +0000 (18:39 -0700)]
selftests: net: py: add test case filtering and listing
When developing new test cases and reproducing failures in
existing ones we currently have to run the entire test which
can take minutes to finish.
Add command line options for test selection, modeled after
kselftest_harness.h:
-l list tests (filtered, if filters were specified)
-t name include test
-T name exclude test
Since we don't have as clean separation into fixture / variant /
test as kselftest_harness this is not really a 1 to 1 match.
We have to lean on glob patterns instead.
Like in kselftest_harness filters are evaluated in order, first
match wins. If only exclusions are specified everything else is
included and vice versa.
Glob patterns (*, ?, [) are supported in addition to exact
matching.
Jakub Kicinski [Fri, 10 Apr 2026 15:36:00 +0000 (08:36 -0700)]
net: fix reference tracker mismanagement in netdev_put_lock()
dev_put() releases a reference which didn't have a tracker.
References without a tracker are accounted in the tracking
code as "no_tracker". We can't free the tracker and then
call dev_put(). The references themselves will be fine
but the tracking code will think it's a double-release:
refcount_t: decrement hit 0; leaking memory.
IOW commit under fixes confused dev_put() (release never tracked
reference) with __dev_put() (just release the reference, skipping
the reference tracking infra).
Since __netdev_put_lock() uses dev_put() we can't feed a previously
tracked netdev ref into it. Let's flip things around.
netdev_put(dev, NULL) is the same as dev_put(dev) so make
netdev_put_lock() the real function and have __netdev_put_lock()
feed it a NULL tracker for all the cases that were untracked.
Eric Dumazet [Thu, 9 Apr 2026 08:52:38 +0000 (08:52 +0000)]
ipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()
Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.
Add a shortcut to avoid these costs under pressure, as we did
in macvlan with commit 0d5dc1d7aad1 ("macvlan: avoid spinlock
contention in macvlan_broadcast_enqueue()")
Network devices can have the same name within different network namespaces.
To help distinguish these devices, add the net_cookie value which can be
used to identify the netns.
====================
net: dsa: tag_rtl8_4: fixes doc and set keep
This small series addresses two points in the rtl8_4 tagger used by the
realtel rtl8365mb driver.
The first patch updates the documentation of the tag format while the
second patch sets the KEEP flag bit, ensuring that the switch
respects the frame's VLAN format as provided by the kernel.
These patches were previously part of a larger series but are being
submitted independently as they are self-contained and already
received review.
KEEP=1 is needed because we should respect the format of the packet as
the kernel sends it to us. Unless tx forward offloading is used, the
kernel is giving us the packet exactly as it should leave the specified
port on the wire. Until now this was not needed because the ports were
always functioning in a standalone mode in a VLAN-unaware way, so the
switch would not tag or untag frames anyway. But arguably it should have
been KEEP=1 all along.
Co-developed-by: Alvin Å ipraga <alsi@bang-olufsen.dk> Signed-off-by: Alvin Å ipraga <alsi@bang-olufsen.dk> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260408-realtek_fixes-v1-2-915ff1404d56@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/sched: cls_fw: fix NULL dereference of "old" filters before change()
Like pointed out by Sashiko [1], since commit ed76f5edccc9 ("net: sched:
protect filter_chain list with filter_chain_lock mutex") TC filters are
added to a shared block and published to datapath before their ->change()
function is called. This is a problem for cls_fw: an invalid filter
created with the "old" method can still classify some packets before it
is destroyed by the validation logic added by Xiang.
Therefore, insisting with repeated runs of the following script:
# ip link add dev crash0 type dummy
# ip link set dev crash0 up
# mausezahn crash0 -c 100000 -P 10 \
> -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q &
# sleep 1
# tc qdisc add dev crash0 egress_block 1 clsact
# tc filter add block 1 protocol ip prio 1 matchall \
> action skbedit mark 65536 continue
# tc filter add block 1 protocol ip prio 2 fw
# ip link del dev crash0
can still make fw_classify() hit the WARN_ON() in [2]:
Skip "old-style" classification on shared blocks, so that the NULL
dereference is fixed and WARN_ON() is not hit anymore in the short
lifetime of invalid cls_fw "old-style" filters.
Save the current mode of flow control, and enhance the statistics of
pause frames.
The received pause frames are divided into XON and XOFF to be counted.
And due to the hardware defect of SP devices, XON packets cannot be
trasmitted correctly, so Tx XON pause is disabled by default for those
devices.
The WX_PX_MPRC registers are not clear-on-read hardware counters. The
previous implementation directly read and accumulated these 32-bit values
into a 64-bit software counter. Now implement a rd32_wrap() helper
function to calculate the delta counter to correct the statistic.
net: wangxun: schedule hardware stats update in watchdog
Hardware statistics should be updated periodically in the watchdog to
prevent 32-bit registers from overflowing. This is also required for the
upcoming pause frame accounting logic, which relies on regular statistics
sampling.
net: wangxun: reorder timer and work sync cancellations
When removing the device, timer_delete_sync(&wx->service_timer) is
called in .ndo_stop() after cancel_work_sync(&wx->service_task). This
may cause new work to be queued after device down.
Move unregister_netdev() before cancel_work_sync(), and use
timer_shutdown_sync() to prevent the timer from being re-armed.
====================
dpll: zl3073x: add ref-sync pair support
This series adds Reference-Sync pair support to the ZL3073x DPLL driver.
A Ref-Sync pair consists of a clock reference and a low-frequency sync
signal (e.g. 1 PPS) where the DPLL locks to the clock reference but
phase-aligns to the sync reference.
Patches 1-3 are preparatory cleanups and helper additions:
- Clean up esync get/set callbacks with early returns and use the
zl3073x_out_is_ndiv() helper
- Convert open-coded clear-and-set bitfield patterns to FIELD_MODIFY()
- Add ref sync control and output clock type accessor helpers
Patch 4 adds the 'ref-sync-sources' phandle-array property to the
dpll-pin device tree binding schema and updates the ZL3073x binding
examples.
Patch 5 implements the driver support:
- ref_sync_get/set callbacks with frequency validation
- Automatic sync source exclusion from reference selection
- Device tree based ref-sync pair registration
Tested and verified on Microchip EDS2 (pcb8385) development board.
====================
Ivan Vecera [Wed, 8 Apr 2026 10:27:16 +0000 (12:27 +0200)]
dpll: zl3073x: add ref-sync pair support
Add support for ref-sync pair registration using the 'ref-sync-sources'
phandle property from device tree. A ref-sync pair consists of a clock
reference and a low-frequency sync signal where the DPLL locks to the
clock reference but phase-aligns to the sync reference.
The implementation:
- Stores fwnode handle in zl3073x_dpll_pin during pin registration
- Adds ref_sync_get/set callbacks to read and write the sync control
mode and pair registers
- Validates ref-sync frequency constraints: sync signal must be 8 kHz
or less, clock reference must be 1 kHz or more and higher than sync
- Excludes sync source from automatic reference selection by setting
its priority to NONE on connect; on disconnect the priority is left
as NONE and the user must explicitly make the pin selectable again
- Iterates ref-sync-sources phandles to register declared pairings
via dpll_pin_ref_sync_pair_add()
Reviewed-by: Petr Oros <poros@redhat.com> Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260408102716.443099-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 8 Apr 2026 10:27:15 +0000 (12:27 +0200)]
dt-bindings: dpll: add ref-sync-sources property
Add ref-sync-sources phandle-array property to the dpll-pin schema
allowing board designers to declare which input pins can serve as
sync sources in a Reference-Sync pair. A Ref-Sync pair consists of
a clock reference and a low-frequency sync signal where the DPLL locks
to the clock but phase-aligns to the sync reference.
Update both examples in the Microchip ZL3073x binding to demonstrate
the new property with a 1 PPS sync source paired to a clock source.
Reviewed-by: Petr Oros <poros@redhat.com> Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260408102716.443099-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ivan Vecera [Wed, 8 Apr 2026 10:27:12 +0000 (12:27 +0200)]
dpll: zl3073x: clean up esync get/set and use zl3073x_out_is_ndiv()
Return -EOPNOTSUPP early in esync_get callbacks when esync is not
supported instead of conditionally populating the range at the end.
This simplifies the control flow by removing the finish label/goto
in the output variant and the conditional range assignment in both
input and output variants.
Replace open-coded N-div signal format switch statements with
zl3073x_out_is_ndiv() helper in esync_get, esync_set and
frequency_set callbacks.
Reviewed-by: Petr Oros <poros@redhat.com> Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260408102716.443099-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Four Intel uncore PMU driver fixes by Zide Chen"
* tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Remove extra double quote mark
perf/x86/intel/uncore: Fix die ID init and look up bugs
perf/x86/intel/uncore: Skip discovery table for offline dies
perf/x86/intel/uncore: Fix iounmap() leak on global_init failure
Jorge Marques [Mon, 23 Mar 2026 16:11:33 +0000 (17:11 +0100)]
i3c: master: adi: Fix error propagation for CCCs
adi_i3c_master_send_ccc_cmd() always returned 0, ignoring the transfer
result populated in the completion path. As a consequence, CCC command
errors were silently dropped, including the default -ETIMEDOUT and
later overwritten by adi_i3c_master_end_xfer_locked().
Fix this by returning xfer->ret so that callers correctly receive any
transfer error codes.
Fixes: a79ac2cdc91d ("i3c: master: Add driver for Analog Devices I3C Controller IP") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Jorge Marques <jorge.marques@analog.com> Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-5-30bdc68004be@analog.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Jorge Marques [Mon, 23 Mar 2026 16:11:32 +0000 (17:11 +0100)]
i3c: master: Fix error codes at send_ccc_cmd
i3c_master_send_ccc_cmd_locked() would propagate cmd->err (positive,
Mx codes) to the ret variable, cascading down multiple methods until
reaching methods that explicitly stated they would return 0 on success
or negative error code. For example, the call chain:
Fix this by returning the ret value, callers can still read the cmd->err
value if ret is negative.
All corner cases where the Mx codes do need to be handled individually,
are resolved in previous commits. Those corner cases are all scenarios
when I3C_ERROR_M2 is expected and acceptable.
The prerequisite patches for the fix are:
Jorge Marques [Mon, 23 Mar 2026 16:11:31 +0000 (17:11 +0100)]
i3c: master: Move bus_init error suppression
Prepare to fix improper Mx positive error propagation in later commits
by handling Mx error codes where the i3c_ccc_cmd command is allocated.
The CCC DISEC to broadcast address is invoked with
i3c_master_enec_disec_locked() and yields error I3C_ERROR_M2 if there
are no devices active on the bus. This is expected at the bus
initialization stage, where it is not known yet that there are no active
devices on the bus. Add bool suppress_m2 argument to
i3c_master_enec_disec_locked() and update the call site at
i3c_master_bus_init() with the exact corner case to not require
propagating positive Mx error codes. Other call site should not suppress
the error code, for example, if a driver requests to peripheral to
disable events and the transfer is not acknowledged, this is an error
and should not proceed.
Jorge Marques [Mon, 23 Mar 2026 16:11:30 +0000 (17:11 +0100)]
i3c: master: Move entdaa error suppression
Prepare to fix improper Mx positive error propagation in later commits
by handling Mx error codes where the i3c_ccc_cmd command is allocated.
The CCC ENTDAA is invoked with i3c_master_entdaa_locked() and yields
error I3C_ERROR_M2 if there are no devices active on the bus. Some
controllers may also yield if there are no more devices need an dynamic
address, since the sequence do always end in a NACK. Handle inside
i3c_master_entdaa_locked(), checking cmd->err directly. Both call sites
are updated, adi_i3c_master_do_daa() and cdns_i3c_master_do_daa().
Jorge Marques [Mon, 23 Mar 2026 16:11:29 +0000 (17:11 +0100)]
i3c: master: Move rstdaa error suppression
Prepare to fix improper Mx positive error propagation in later
commits by handling Mx error codes where the i3c_ccc_cmd command
is allocated. Two of the four i3c_master_rstdaa_locked() are error
paths that already suppressed the return value, the remaining two
are changed to handle the I3C_ERROR_M2 Mx error code inside
i3c_master_rstdaa_locked(), checking cmd->err directly.
Felix Gu [Sat, 4 Apr 2026 10:32:31 +0000 (18:32 +0800)]
i3c: dw: Simplify xfer cleanup with __free(kfree)
Convert dw-i3c-master to use __free(kfree) guards for struct dw_i3c_xfer
allocations. This frees xfer objects automatically on scope exit, and
removes the now-unused dw_i3c_master_free_xfer() helper.
Felix Gu [Sat, 4 Apr 2026 10:32:30 +0000 (18:32 +0800)]
i3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers()
The dw_i3c_master_i3c_xfers() function allocates memory for the xfer
structure using dw_i3c_master_alloc_xfer(). If pm_runtime_resume_and_get()
fails, the function returns without freeing the allocated xfer, resulting
in a memory leak.
Since dw_i3c_master_free_xfer() is a thin wrapper around kfree(), use
the __free(kfree) cleanup attribute to handle the free automatically on
all exit paths.
Fixes: 62fe9d06f570 ("i3c: dw: Add power management support") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260404-dw-i3c-2-v3-1-8f7d146549c1@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Felix Gu [Mon, 6 Apr 2026 12:43:16 +0000 (20:43 +0800)]
i3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers()
The xfer structure allocated by renesas_i3c_alloc_xfer() was never freed
in the renesas_i3c_i3c_xfers() function. Use the __free(kfree) cleanup
attribute to automatically free the memory when the variable goes out of
scope.
Fixes: d028219a9f14 ("i3c: master: Add basic driver for the Renesas I3C controller") Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Felix Gu <ustc.gu@gmail.com> Link: https://patch.msgid.link/20260406-renesas-v3-1-4b724d7708f4@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
When DW_I3C_DISABLE_RUNTIME_PM_QUIRK is set, the probe function calls
pm_runtime_get_noresume() to prevent runtime suspend. However, if
i3c_master_register() fails, the error path does not balance this
call, leaving the usage count incremented.
Add pm_runtime_put_noidle() in the error cleanup path to properly
balance the usage count.
Fixes: fba0e56ee752 ("i3c: dw: Disable runtime PM on Agilex5 to avoid bus hang on IBI") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260321-dw-i3c-1-v1-1-821623aac7bb@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Felix Gu [Fri, 20 Mar 2026 14:18:02 +0000 (22:18 +0800)]
i3c: master: dw-i3c: Fix missing reset assertion in remove() callback
The reset line acquired during probe is currently left deasserted when
the driver is unbound.
Switch to devm_reset_control_get_optional_exclusive_deasserted() to
ensure the reset is automatically re-asserted by the devres core when
the driver is removed.
Fixes: 62fe9d06f570 ("i3c: dw: Add power management support") Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260320-dw-i3c-v3-1-477040c2e3f5@gmail.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Adrian Hunter [Fri, 6 Mar 2026 08:53:38 +0000 (10:53 +0200)]
i3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers
Intel LPSS I3C controllers can wake from runtime suspend to receive
in-band interrupts (IBIs), and they also implement the MIPI I3C HCI
Multi-Bus Instance capability. When multiple I3C bus instances share the
same PCI wakeup, the PCI parent must coordinate runtime PM so that all
instances suspend together and their mipi-i3c-hci runtime suspend
callbacks are invoked in a consistent manner.
Enable IBI-based wakeup by setting HCI_QUIRK_RPM_IBI_ALLOWED for the
intel-lpss-i3c platform device. Also set HCI_QUIRK_RPM_PARENT_MANAGED so
that the mipi-i3c-hci core driver expects runtime PM to be controlled by
the PCI parent rather than by individual instances. For all Intel HCI PCI
configurations, enable the corresponding control_instance_pm flag in the
PCI driver.
Adrian Hunter [Fri, 6 Mar 2026 08:53:37 +0000 (10:53 +0200)]
i3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM
Some platforms implement the MIPI I3C HCI Multi-Bus Instance capability,
where a single parent device hosts multiple I3C controller instances. In
such designs, the parent - not the individual child instances - may need to
coordinate runtime PM so that all controllers runtime PM callbacks are
invoked in a controlled and synchronized manner.
For example, if the parent enables IBI-wakeup when transitioning into a
low-power state, every bus instance must remain able to receive IBIs up
until that point. This requires deferring the individual controllers'
runtime suspend callbacks (which disable bus activity) until the parent
decides it is safe for all instances to suspend together.
To support this usage model:
* Add runtime PM and system PM callbacks in the PCI driver to invoke
the mipi-i3c-hci driver's runtime PM callbacks for each instance.
* Introduce a driver-data flag, control_instance_pm, which opts into
the new parent-managed PM behaviour.
* Ensure the callbacks are only used when the corresponding instance is
operational at suspend time. This is reliable because the operational
state cannot change while the parent device is undergoing a PM
transition, and PCI always performs a runtime resume before system
suspend on current configurations, so that suspend and resume alternate
irrespective of whether it is runtime or system PM.
By that means, parent-managed runtime PM coordination for multi-instance
MIPI I3C HCI PCI devices is provided without altering existing behaviour on
platforms that do not require it.
Adrian Hunter [Fri, 6 Mar 2026 08:53:36 +0000 (10:53 +0200)]
i3c: mipi-i3c-hci: Allow parent to manage runtime PM
Some platforms implement the MIPI I3C HCI Multi-Bus Instance capability,
where a single parent device hosts multiple I3C controller instances. In
such designs, the parent - not the individual child instances - may need to
coordinate runtime PM so that all controllers runtime PM callbacks are
invoked in a controlled and synchronized manner.
For example, if the parent enables IBI-wakeup when transitioning into a
low-power state, every bus instance must remain able to receive IBIs up
until that point. This requires deferring the individual controllers'
runtime suspend callbacks (which disable bus activity) until the parent
decides it is safe for all instances to suspend together.
To support this usage model:
* Export the low-level runtime PM suspend and resume helpers so that
the parent can explicitly invoke them.
* Add a new quirk, HCI_QUIRK_RPM_PARENT_MANAGED, allowing platforms to
bypass per-instance runtime PM callbacks and delegate control to the
parent device.
* Move DEFAULT_AUTOSUSPEND_DELAY_MS into the header so it can be shared
by parent-managed PM implementations.
The new quirk allows platforms with multi-bus parent-managed PM
infrastructure to correctly coordinate runtime PM across all I3C HCI
instances.
Adrian Hunter [Fri, 6 Mar 2026 08:53:35 +0000 (10:53 +0200)]
i3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended
Some I3C controllers can be automatically runtime-resumed in order to
handle in-band interrupts (IBIs), meaning that runtime suspend does not
need to be blocked when IBIs are enabled.
For example, a PCI-attached controller in a low-power state may generate
a Power Management Event (PME) when the SDA line is pulled low to signal
the START condition of an IBI. The PCI subsystem will then runtime-resume
the device, allowing the IBI to be received without requiring the
controller to remain active.
Introduce a new quirk, HCI_QUIRK_RPM_IBI_ALLOWED, so that drivers can
opt-in to this capability via driver data.