'struct regmap_config' instances are not modified in these drivers. They can be
statically defined instead of allocated and populated at run-time.
The main benefits are:
- it saves some memory at runtime
- the structures can be declared as 'const', which is always better for
structures that hold some function pointers
- the code is less verbose
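A minimal userspace sketch of the pattern described above. `struct demo_config`, its fields and the `demo_*` helpers are hypothetical stand-ins for illustration; they are not the kernel's `struct regmap_config` definition or the regmap API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct regmap_config; field names are illustrative. */
struct demo_config {
	int reg_bits;
	int val_bits;
	int (*reg_read)(void *ctx, unsigned int reg, unsigned int *val);
};

/* Hypothetical read callback: returns the register number as its value. */
int demo_reg_read(void *ctx, unsigned int reg, unsigned int *val)
{
	(void)ctx;
	assert(val != NULL);
	*val = reg;
	return 0;
}

/*
 * Defined once at build time. Being const, the structure (including its
 * function pointer) can live in read-only memory, and no allocation or
 * field-by-field setup is needed at run-time.
 */
static const struct demo_config demo_cfg = {
	.reg_bits = 16,
	.val_bits = 16,
	.reg_read = demo_reg_read,
};

/* Consumers only ever need a pointer to const. */
const struct demo_config *demo_get_config(void)
{
	return &demo_cfg;
}

/* Read one register through the callback stored in the config. */
int demo_read(const struct demo_config *cfg, unsigned int reg,
	      unsigned int *val)
{
	assert(cfg != NULL);
	return cfg->reg_read(NULL, reg, val);
}
```

A caller would pass `demo_get_config()` wherever a configuration pointer is expected, with no per-device copy of the structure.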
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 12 Jul 2025 00:50:26 +0000 (17:50 -0700)]
Merge tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- batman-adv: store hard_iface as iflink private data,
by Matthias Schiffer
* tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge:
batman-adv: store hard_iface as iflink private data
batman-adv: Start new development cycle
====================
Jakub Kicinski [Sat, 12 Jul 2025 00:33:06 +0000 (17:33 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: cleanups and preparation for live migration
Jake Keller says:
Various cleanups and preparation to the ice driver code for supporting
SR-IOV live migration.
The logic for unpacking Rx queue context data is added. This is the inverse
of the existing packing logic. Thanks to <linux/packing.h> this is trivial
to add.
Code to enable both reading and writing the Tx queue context for a queue
over a shared hardware register interface is added. Thanks to ice_adapter,
this is locked across all PFs that need to use it, preventing concurrency
issues with multiple PFs.
The RSS hash configuration requested by a VF is cached within the VF
structure. This will be used to track and restore the same configuration
during migration load.
ice_sriov_set_msix_vec_count() is updated to use pci_iov_vf_id() instead of
open-coding a worse equivalent, and now avoids rebuilding MSI-X if the
requested number of vectors matches the current one.
A new ice_get_vf_by_dev() helper function is added to simplify accessing a
VF from its PCI device structure. This will be used more heavily within the
live migration code itself.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: introduce ice_get_vf_by_dev() wrapper
ice: avoid rebuilding if MSI-X vector count is unchanged
ice: use pci_iov_vf_id() to get VF ID
ice: expose VF functions used by live migration
ice: move ice_vsi_update_l2tsel to ice_lib.c
ice: save RSS hash configuration for migration
ice: add functions to get and set Tx queue context
ice: add support for reading and unpacking Rx queue context
====================
net: ll_temac: Fix incorrect PHY node reference in debug message
In temac_probe(), the debug message intended to print the resolved
PHY node was mistakenly using the controller node temac_np
instead of the actual PHY node lp->phy_node. This patch corrects
the log to reference the correct device tree node.
====================
netdevsim: support setting a permanent address
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
This series adds support to netdevsim to set a permanent address on port
creation, and adds a test script to test setting and getting of the
different L2 address types.
selftests: net: add netdev-l2addr.sh for testing L2 address functionality
Add a new test script to the network selftests which tests getting and
setting of layer 2 addresses through netlink, including the newly added
support for setting a permaddr on netdevsim devices.
net: netdevsim: Support setting dev->perm_addr on port creation
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
To support this use case, support setting netdev->perm_addr when
creating a netdevsim port.
selftests: flip local/remote endpoints in iou-zcrx.py
The iou-zcrx selftest currently runs the server on the remote host
and the client on the local host. This commit flips the endpoints
such that server runs on localhost and client on remote.
This change brings the iou-zcrx selftest in line with the convention used by
other selftests.
Drive-by fix for a missing import exception that happens when the
network interface has fewer than 2 combined channels.
Test plan: ran iou-zcrx.py selftest between 2 physical machines
Edward Cree [Thu, 10 Jul 2025 17:32:13 +0000 (18:32 +0100)]
sfc: falcon: refactor and document ef4_ethtool_get_rxfh_fields
The code had some rather odd control flow inherited from when it was
shared with siena and ef10 before this driver was split out.
Simplify that for easier reading.
Also add a comment explaining why we return the values we do, since
some Falcon documents and datasheets confusingly mention the part
supporting 4-tuple UDP hashing.
(I couldn't find any record of exactly what was "broken" about the
original Falcon A hash, I'm just trusting that falcon_init_rx_cfg()
had a good reason for not using it.)
====================
net_sched: act: extend RCU use in dump() methods
We are trying to get away from central RTNL in favor of fine-grained
mutexes. While looking at net/sched, I found that act already uses
RCU in the fast path in most cases, and that RCU could also be used
in dump() methods.
This series is not complete and will be followed by a second one.
Eric Dumazet [Wed, 9 Jul 2025 09:01:57 +0000 (09:01 +0000)]
net_sched: act_ctinfo: use atomic64_t for three counters
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.
Use atomic64_t for these three fields; I doubt act_ctinfo is used
heavily on big SMP hosts anyway.
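A rough userspace analogue of the change, using C11 atomics in place of the kernel's atomic64_t. The structure and the `demo_*` helpers are hypothetical; only the three counter names come from the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding the three counters named above. */
struct demo_ctinfo_stats {
	_Atomic uint64_t stats_dscp_set;
	_Atomic uint64_t stats_dscp_error;
	_Atomic uint64_t stats_cpmark_set;
};

/* Lockless increment that stays correct with concurrent writers. */
void demo_stat_inc(_Atomic uint64_t *counter)
{
	assert(counter != NULL);
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Lockless read: the 64-bit value is never observed half-updated. */
uint64_t demo_stat_read(_Atomic uint64_t *counter)
{
	assert(counter != NULL);
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Relaxed ordering is enough here because the counters are pure statistics and do not order any other memory accesses.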
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 9 Jul 2025 20:59:10 +0000 (13:59 -0700)]
eth: fbnic: fix ubsan complaints about OOB accesses
UBSAN complains that we reach beyond the end of the log entry:
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/meta/fbnic/fbnic_fw_log.c:94:50
index 71 is out of range for type 'char [*]'
Call Trace:
<TASK>
ubsan_epilogue+0x5/0x2b
fbnic_fw_log_write+0x120/0x960
fbnic_fw_parse_logs+0x161/0x210
We're just taking the address of the character after the array,
so this really seems like something that should be legal.
But whatever, easy enough to silence by doing direct pointer math.
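A small sketch of the two spellings, assuming a log entry that ends in a flexible array. `struct demo_log_entry` and the `demo_*` helpers are hypothetical and are not the fbnic data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log entry with a flexible array member for the message. */
struct demo_log_entry {
	size_t len;
	char msg[];
};

/*
 * Return a pointer one past the last byte of the message. Spelling this as
 * &entry->msg[entry->len] uses an array subscript equal to the array length,
 * which a bounds sanitizer may flag; plain pointer arithmetic produces the
 * same address without a subscript.
 */
const char *demo_log_entry_end(const struct demo_log_entry *entry)
{
	assert(entry != NULL);
	return entry->msg + entry->len;
}

/* Allocate an entry holding a copy of a NUL-terminated string. */
struct demo_log_entry *demo_log_entry_new(const char *text)
{
	size_t len;
	struct demo_log_entry *entry;

	assert(text != NULL);
	len = strlen(text) + 1;
	entry = malloc(sizeof(*entry) + len);
	if (!entry)
		return NULL;
	entry->len = len;
	memcpy(entry->msg, text, len);
	return entry;
}
```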
Fixes: c2b93d6beca8 ("eth: fbnic: Create ring buffer for firmware logs")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250709205910.3107691-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
virtio_net: simplify tx queue wake condition check
Consolidate the two nested if conditions for checking tx queue wake
conditions into a single combined condition. This improves code
readability without changing functionality. Also move netif_tx_wake_queue()
into the if condition to reduce unnecessary checks for stopped queues.
William Liu [Tue, 8 Jul 2025 16:44:05 +0000 (16:44 +0000)]
selftests/tc-testing: Add tests for restrictions on netem duplication
Ensure that a duplicating netem cannot exist in a tree with other netems
in both qdisc addition and change. This is meant to prevent the soft
lockup and OOM loop scenario discussed in [1]. Also adjust an HFSC
re-entrancy test case with netem for this new restriction - KASAN
still triggers upon its failure.
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164219.875521-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
William Liu [Tue, 8 Jul 2025 16:43:26 +0000 (16:43 +0000)]
net/sched: Restrict conditions for adding duplicating netems to qdisc tree
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
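The rule reached in 4) can be modelled as a check over the whole qdisc tree. This is only an illustrative model: `struct demo_qdisc` and the `demo_*` functions are hypothetical and do not reflect how the kernel walks or represents qdiscs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_CHILDREN 4

/* Hypothetical, simplified qdisc node; not the kernel's struct Qdisc. */
struct demo_qdisc {
	bool is_netem;
	bool duplicates;	/* only meaningful when is_netem is true */
	struct demo_qdisc *children[DEMO_MAX_CHILDREN];
};

/* Count netems and duplicating netems in the subtree rooted at q. */
static void demo_count(const struct demo_qdisc *q, int *netems, int *dups)
{
	if (!q)
		return;
	if (q->is_netem) {
		(*netems)++;
		if (q->duplicates)
			(*dups)++;
	}
	for (size_t i = 0; i < DEMO_MAX_CHILDREN; i++)
		demo_count(q->children[i], netems, dups);
}

/*
 * A tree is acceptable if it has no duplicating netem, or if the
 * duplicating netem is the only netem in the whole tree.
 */
bool demo_tree_allowed(const struct demo_qdisc *root)
{
	int netems = 0, dups = 0;

	assert(root != NULL);
	demo_count(root, &netems, &dups);
	return dups == 0 || netems == 1;
}
```

Checking the whole tree rather than only the ancestral path is what covers the case where filters and actions move a re-enqueued packet onto a different path.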
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).
No conflicts.
Adjacent changes:
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
  c701574c5412 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
  b3a431fe2e39 ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
  62da647a2b20 ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
  dc66a129adf1 ("wifi: mt76: add a wrapper for wcid access with validation")
drivers/net/wireless/mediatek/mt76/mt7996/main.c
  3dd6f67c669c ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
  8989d8e90f5f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")
net/mac80211/cfg.c
  58fcb1b4287c ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
  037dc18ac3fb ("wifi: mac80211: add support for storing station S1G capabilities")
Merge tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking fixes from Jakub Kicinski
"Big chunk of fixes for WiFi, Johannes says probably the last for the
release.
The Netlink fixes (on top of the tree) restore operation of iw (WiFi
CLI) which uses a sillily small recv buffer, and is the reason for this
'emergency PR'.
The GRE multicast fix also stands out among the user-visible
regressions.
Current release - fix to a fix:
- netlink: make sure we always allow at least one skb to be queued,
even if the recvbuf is (mis)configured to be tiny
Previous releases - regressions:
- gre: fix IPv6 multicast route creation
Previous releases - always broken:
- wifi: prevent A-MSDU attacks in mesh networks
- wifi: cfg80211: fix S1G beacon head validation and detection
- wifi: mac80211:
- always clear frame buffer to prevent stack leak in cases which
hit a WARN()
- fix monitor interface in device restart
- wifi: mwifiex: discard erroneous disassoc frames on STA interface
- wifi: mt76:
- prevent null-deref in mt7925_sta_set_decap_offload()
- add missing RCU annotations, and fix sleep in atomic
- fix decapsulation offload
- fixes for scanning
- phy: microchip: improve link establishment and reset handling
- eth: mlx5e: fix race between DIM disable and net_dim()
- bnxt_en: correct DMA unmap len for XDP_REDIRECT"
* tag 'net-6.16-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
netlink: make sure we allow at least one dump skb
netlink: Fix rmem check in netlink_broadcast_deliver().
bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
bnxt_en: Flush FW trace before copying to the coredump
bnxt_en: Fix DCB ETS validation
net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
net/mlx5e: Add new prio for promiscuous mode
net/mlx5e: Fix race between DIM disable and net_dim()
net/mlx5: Reset bw_share field when changing a node's parent
can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
selftests: net: lib: fix shift count out of range
selftests: Add IPv6 multicast route generation tests for GRE devices.
gre: Fix IPv6 multicast route creation.
net: phy: microchip: limit 100M workaround to link-down events on LAN88xx
net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits
ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof
net: appletalk: Fix device refcount leak in atrtr_create()
netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()
wifi: mac80211: add the virtual monitor after reconfig complete
wifi: mac80211: always initialize sdata::key_list
...
Merge tag 'gpio-fixes-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix performance regression when setting values of multiple GPIO lines
at once
- make sure the GPIO OF xlate code doesn't end up passing an
uninitialized local variable to GPIO core
- update MAINTAINERS
* tag 'gpio-fixes-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: remove bouncing address for Nandor Han
gpio: of: initialize local variable passed to the .of_xlate() callback
gpiolib: fix performance regression when using gpio_chip_get_multiple()
Merge tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fix from Marek Szyprowski:
- small fix relevant to arm64 server and custom CMA configuration (Feng
Tang)
* tag 'dma-mapping-6.16-2025-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-contiguous: hornor the cma address limit setup by user
Jakub Kicinski [Fri, 11 Jul 2025 00:11:21 +0000 (17:11 -0700)]
netlink: make sure we allow at least one dump skb
Commit under Fixes tightened up the memory accounting for Netlink
sockets. Looks like the accounting is too strict for some existing
use cases; Marek reported issues with nl80211 / WiFi iw CLI.
To reduce the number of iterations, Netlink dumps try to allocate
messages based on the size of the buffer passed to previous
recvmsg() calls. If user space uses a larger buffer in recvmsg()
than sk_rcvbuf, we will allocate an skb we won't be able to queue.
Make sure we always allow at least one skb to be queued.
Same workaround is already present in netlink_attachskb().
Alternative would be to cap the allocation size to
rcvbuf - rmem_alloc
but as I said, the workaround is already present in other places.
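The "at least one skb" rule can be sketched as a small admission check. This is a simplified model with hypothetical names (`demo_may_queue` and its parameters); it is not the code in netlink_attachskb() or the dump path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical admission check. rmem_alloc is the memory already charged
 * to the receive queue, truesize the cost of the new buffer, and rcvbuf
 * the configured limit.
 */
bool demo_may_queue(size_t rmem_alloc, size_t truesize, size_t rcvbuf)
{
	assert(truesize > 0);

	/* An empty queue always accepts one buffer, however large. */
	if (rmem_alloc == 0)
		return true;

	/* Otherwise the buffer must fit within the configured limit. */
	return rmem_alloc + truesize <= rcvbuf;
}
```

With this rule a dump message sized after a large recvmsg() buffer can still be delivered to a socket whose receive buffer is tiny, as long as nothing else is queued.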
Jakub Kicinski [Fri, 11 Jul 2025 14:28:36 +0000 (07:28 -0700)]
Merge branch 'bnxt_en-3-bug-fixes'
Michael Chan says:
====================
bnxt_en: 3 bug fixes
The first one fixes a possible failure when setting DCB ETS.
The second one fixes the ethtool coredump (-W 2) not containing
all the FW traces. The third one fixes the DMA unmap length when
transmitting XDP_REDIRECT packets.
====================
bnxt_en: Set DMA unmap len correctly for XDP_REDIRECT
When transmitting an XDP_REDIRECT packet, call dma_unmap_len_set()
with the proper length instead of 0. This bug triggers this warning
on a system with IOMMU enabled:
bnxt_en: Flush FW trace before copying to the coredump
bnxt_fill_drv_seg_record() calls bnxt_dbg_hwrm_log_buffer_flush()
to flush the FW trace buffer. This needs to be done before we
call bnxt_copy_ctx_mem() to copy the trace data.
Without this fix, the coredump may not contain all the FW
traces.
Fixes: 3c2179e66355 ("bnxt_en: Add FW trace coredump segments to the coredump")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In bnxt_ets_validate(), the code incorrectly loops over all possible
traffic classes to check and add the ETS settings. Fix it to loop
over the configured traffic classes only.
The unconfigured traffic classes will default to TSA_ETS with 0
bandwidth. Looping over these unconfigured traffic classes may
cause the validation to fail and trigger this error message:
"rejecting ETS config starving a TC\n"
The .ieee_setets() will then fail.
Fixes: 7df4ae9fe855 ("bnxt_en: Implement DCBNL to support host-based DCBX.")
Reviewed-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250710213938.1959625-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ll_temac: Fix missing tx_pending check in ethtools_set_ringparam()
The function ll_temac_ethtools_set_ringparam() incorrectly checked
rx_pending twice, once correctly for RX and once mistakenly in place
of tx_pending. This caused tx_pending to be left unchecked against
TX_BD_NUM_MAX.
As a result, invalid TX ring sizes may have been accepted or valid
ones wrongly rejected based on the RX limit, leading to potential
misconfiguration or unexpected results.
This patch corrects the condition to properly validate tx_pending.
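A sketch of the corrected check. The limit values, `struct demo_ringparam` and `demo_check_ringparam()` are hypothetical; the driver's real limits and structures may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limits, chosen only to make the example concrete. */
#define DEMO_RX_BD_NUM_MAX 4096
#define DEMO_TX_BD_NUM_MAX 1024

/* Hypothetical subset of the ring parameters requested by user space. */
struct demo_ringparam {
	unsigned int rx_pending;
	unsigned int tx_pending;
};

/* Validate each ring size against its own limit. */
int demo_check_ringparam(const struct demo_ringparam *rp)
{
	assert(rp != NULL);

	if (rp->rx_pending > DEMO_RX_BD_NUM_MAX)
		return -EINVAL;
	/* The bug described above compared rx_pending here as well. */
	if (rp->tx_pending > DEMO_TX_BD_NUM_MAX)
		return -EINVAL;
	return 0;
}
```

With the limits assumed here, a request of rx_pending = 128 and tx_pending = 2048 is rejected; the buggy comparison would have accepted it because 128 is below the TX limit.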
Jianbo Liu [Thu, 10 Jul 2025 13:53:44 +0000 (16:53 +0300)]
net/mlx5e: Add new prio for promiscuous mode
An optimization for promiscuous mode adds a high-priority steering
table with a single catch-all rule to steer all traffic directly to
the TTC table.
However, a gap exists between the creation of this table and the
insertion of the catch-all rule. Packets arriving in this brief window
would miss because no rule had been inserted yet, unnecessarily
incrementing the 'rx_steer_missed_packets' counter, and would be dropped.
This patch resolves the issue by introducing a new prio for this
table, placing it between MLX5E_TC_PRIO and MLX5E_NIC_PRIO. By doing
so, packets arriving during the window now fall through to the next
prio (at MLX5E_NIC_PRIO) instead of being dropped.
Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1752155624-24095-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/mlx5e: Fix race between DIM disable and net_dim()
There's a race between disabling DIM and NAPI callbacks using the dim
pointer on the RQ or SQ.
If NAPI checks the DIM state bit and sees it still set, it assumes
`rq->dim` or `sq->dim` is valid. But if DIM gets disabled right after
that check, the pointer might already be set to NULL, leading to a NULL
pointer dereference in net_dim().
Fix this by calling `synchronize_net()` before freeing the DIM context.
This ensures all in-progress NAPI callbacks are finished before the
pointer is cleared.
net/mlx5: Reset bw_share field when changing a node's parent
When changing a node's parent, its scheduling element is destroyed and
re-created with bw_share 0. However, the node's bw_share field was not
updated accordingly.
Set the node's bw_share to 0 after re-creation to keep the software
state in sync with the firmware configuration.
Fixes: 9c7bbf4c3304 ("net/mlx5: Add support for setting parent of nodes")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1752155624-24095-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Jul 2025 14:07:56 +0000 (07:07 -0700)]
Merge tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-07-11
Sean Nyekjaer's patch targets the m_can driver and demotes the "msg
lost in rx" message to debug level to prevent flooding the kernel log
with error messages.
* tag 'linux-can-fixes-for-6.16-20250711' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: m_can_handle_lost_msg(): downgrade msg lost in rx message to debug level
====================
However, during review, we noticed that the patch conflicts with another
patch in netdev tree:
https://lore.kernel.org/netdev/1749651015-9668-1-git-send-email-shradhagupta@linux.microsoft.com/
As this series has no dependency with the rest of the series, we think it
is best to split out this one and send it to netdev, to avoid conflict
resolution headache later on.
Can netdev maintainers please pick it up?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nam Cao [Mon, 7 Jul 2025 08:20:16 +0000 (10:20 +0200)]
PCI: hv: Switch to msi_create_parent_irq_domain()
Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().
While doing the conversion, I noticed that hv_compose_msi_msg() is doing
more than it is supposed to (composing message). This function also
allocates and populates struct tran_int_desc, which should be done in
hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
However, I have no hardware to test such a change; therefore, I leave a TODO
note.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nam Cao [Mon, 7 Jul 2025 08:20:15 +0000 (10:20 +0200)]
irqdomain: Export irq_domain_free_irqs_top()
Export irq_domain_free_irqs_top(), making it usable for drivers compiled as
modules.
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 11 Jul 2025 01:18:40 +0000 (18:18 -0700)]
Merge tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next (v2)
The following series contains an initial small batch of Netfilter
updates for net-next:
1) Remove DCCP conntrack support, keep DCCP matches around in order to
avoid breakage when loading ruleset, add Kconfig to wrap the code
so it can be disabled by distributors.
2) Remove buggy code aiming at shrinking netlink deletion events, then
re-add it correctly in another patch. This is to prevent -stable from
picking up a fix that breaks old userspace. From Phil Sutter.
3) Add a missing WARN_ON_ONCE() to check for lockdep_commit_lock_is_held()
to uncover bugs. From Fedor Pchelkin.
* tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: adjust lockdep assertions handling
netfilter: nf_tables: Reintroduce shortened deletion notifications
netfilter: nf_tables: Drop dead code from fill_*_info routines
netfilter: conntrack: remove DCCP protocol support
====================
====================
net: ftgmac100: Add SoC reset support for RMII mode
This patch series adds support for an optional reset line to the
ftgmac100 ethernet controller, as used on Aspeed SoCs. On these SoCs,
the internal MAC reset is not sufficient to reset the RMII interface.
By providing a SoC-level reset via the device tree "resets" property,
the driver can properly reset both the MAC and RMII logic, ensuring
correct operation in RMII mode.
The series includes:
- Device tree binding update to document the new "resets" property.
- Addition of MAC1/2/3/4 reset definitions for AST2600.
- Driver changes to assert/deassert the reset line as needed.
This improves reliability and initialization of the MAC in RMII mode
on Aspeed platforms.
====================
net: ftgmac100: Add optional reset control for RMII mode on Aspeed SoCs
On Aspeed SoCs, the internal MAC reset is insufficient to fully reset the
RMII interface; only the SoC-level reset line can properly reset the RMII
logic. This patch adds support for an optional "resets" property in the
device tree, allowing the driver to assert and deassert the SoC reset line
when operating in RMII mode. This ensures the MAC and RMII interface are
correctly reset and initialized.
dt-bindings: clock: ast2600: Add reset definitions for MAC1 and MAC2
Add ASPEED_RESET_MAC1 and ASPEED_RESET_MAC2 reset definitions to
the ast2600-clock binding header. These are required for proper
reset control of the MAC1 and MAC2 ethernet controllers on the
AST2600 SoC.
In the Aspeed AST2600 design, the MAC-internal reset triggered through the MAC
register cannot fully reset the RMII interface, which may leave the RMII logic
only partially reset. Therefore, add a resets property so that the SoC-level
reset line resets the whole MAC function, including ftgmac, RGMII and RMII.
Hangbin Liu [Wed, 9 Jul 2025 09:12:44 +0000 (09:12 +0000)]
selftests: net: lib: fix shift count out of range
I got the following warning when writing other tests:
+ handle_test_result_pass 'bond 802.3ad' '(lacp_active off)'
+ local 'test_name=bond 802.3ad'
+ shift
+ local 'opt_str=(lacp_active off)'
+ shift
+ log_test_result 'bond 802.3ad' '(lacp_active off)' ' OK '
+ local 'test_name=bond 802.3ad'
+ shift
+ local 'opt_str=(lacp_active off)'
+ shift
+ local 'result= OK '
+ shift
+ local retmsg=
+ shift
/net/tools/testing/selftests/net/forwarding/../lib.sh: line 315: shift: shift count out of range
This happens because an extra shift is executed even after all arguments
have been consumed. Remove the last shift in log_test_result() to avoid
this warning.
When fixing IPv6 link-local address generation on GRE devices with
commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address
generation."), I accidentally broke the default IPv6 multicast route
creation on these GRE devices.
Fix that in patch 1, making the GRE specific code a bit closer to
the generic code used by most other network interface types.
Then extend the selftest in patch 2 to cover this case.
====================
selftests: Add IPv6 multicast route generation tests for GRE devices.
The previous patch fixes a bug that prevented the creation of the
default IPv6 multicast route (ff00::/8) for some GRE devices. Now let's
extend the GRE IPv6 selftests to cover this case.
Also, rename check_ipv6_ll_addr() to check_ipv6_device_config() and
adapt comments and script output to take into account the fact that
we're not limited to link-local address generation.
Use addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config() so that we don't just get the inet6_dev, but also
install the default ff00::/8 multicast route.
Before commit 3e6a0243ff00 ("gre: Fix again IPv6 link-local address
generation."), the multicast route was created at the end of the
function by addrconf_add_mroute(). But this code path is now only taken
in one particular case (gre devices not bound to a local IP address and
in EUI64 mode). For all other cases, the function exits early and
addrconf_add_mroute() is not called anymore.
Using addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config(), fixes the problem as it will create the default
multicast route for all gre devices. This also brings
addrconf_gre_config() a bit closer to the normal netdevice IPv6
configuration code (addrconf_dev_config()).
Cc: stable@vger.kernel.org
Fixes: 3e6a0243ff00 ("gre: Fix again IPv6 link-local address generation.")
Reported-by: Aiden Yang <ling@moedove.com>
Closes: https://lore.kernel.org/netdev/CANR=AhRM7YHHXVxJ4DmrTNMeuEOY87K2mLmo9KMed1JMr20p6g@mail.gmail.com/
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/027a923dcb550ad115e6d93ee8bb7d310378bd01.1752070620.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch series improves the reliability of the Microchip LAN88xx
PHYs, particularly in edge cases involving fixed link configurations or
forced speed modes.
Patch 1 assigns genphy_soft_reset() to the .soft_reset hook to ensure
that stale link partner advertisement (LPA) bits are properly cleared
during reconfiguration. Without this, outdated autonegotiation bits may
remain visible in some parallel detection cases.
Patch 2 restricts the 100 Mbps workaround (originally intended to handle
cable length switching) to only run when the link transitions to the
PHY_NOLINK state. This prevents repeated toggling that can confuse
autonegotiating link partners such as the Intel i350, leading to
unstable link cycles.
Both patches were tested on a LAN7850 (with integrated LAN88xx PHY)
against an Intel I350 NIC. The full test suite - autonegotiation, fixed
link, and parallel detection - passed successfully.
====================
net: phy: microchip: limit 100M workaround to link-down events on LAN88xx
Restrict the 100Mbit forced-mode workaround to link-down transitions
only, to prevent repeated link reset cycles in certain configurations.
The workaround was originally introduced to improve signal reliability
when switching cables between long and short distances. It temporarily
forces the PHY into 10 Mbps before returning to 100 Mbps.
However, when used with autonegotiating link partners (e.g., Intel i350),
executing this workaround on every link change can confuse the partner
and cause constant renegotiation loops. This results in repeated link
down/up transitions and the PHY never reaching a stable state.
Limit the workaround to only run during the PHY_NOLINK state. This ensures
it is triggered only once per link drop, avoiding disruptive toggling
while still preserving its intended effect.
Note: I am not able to reproduce the original issue that this workaround
addresses. I can only confirm that 100 Mbit mode works correctly in my
test setup. Based on code inspection, I assume the workaround aims to
reset some internal state machine or signal block by toggling speeds.
However, a PHY reset is already performed earlier in the function via
phy_init_hw(), which may achieve a similar effect. Without a reproducer,
I conservatively keep the workaround but restrict its conditions.
Fixes: e57cf3639c32 ("net: lan78xx: fix accessing the LAN7800's internal phy specific registers from the MAC driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250709130753.3994461-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: phy: microchip: Use genphy_soft_reset() to purge stale LPA bits
Enable .soft_reset for the LAN88xx PHY driver by assigning
genphy_soft_reset() to ensure that the phylib core performs a proper
soft reset during reconfiguration.
Previously, the driver left .soft_reset unimplemented, so calls to
phy_init_hw() (e.g., from lan88xx_link_change_notify()) did not fully
reset the PHY. As a result, stale contents in the Link Partner Ability
(LPA) register could persist, causing the PHY to incorrectly report
that the link partner advertised autonegotiation even when it did not.
Using genphy_soft_reset() guarantees a clean reset of the PHY and
corrects the false autoneg reporting in these scenarios.
Fixes: ccb989e4d1ef ("net: phy: microchip: Reset LAN88xx PHY to ensure clean link state on LAN7800/7850") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250709130753.3994461-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mingming Cao [Wed, 9 Jul 2025 15:33:32 +0000 (08:33 -0700)]
ibmvnic: Fix hardcoded NUM_RX_STATS/NUM_TX_STATS with dynamic sizeof
The previous hardcoded definitions of NUM_RX_STATS and
NUM_TX_STATS were not updated when new fields were added
to the ibmvnic_{rx,tx}_queue_stats structures. Specifically,
commit 2ee73c54a615 ("ibmvnic: Add stat for tx direct vs tx
batched") added a fourth TX stat, but NUM_TX_STATS remained 3,
leading to a mismatch.
This patch replaces the static defines with dynamic sizeof-based
calculations to ensure the stat arrays are correctly sized.
This fixes incorrect indexing and prevents incomplete stat
reporting in tools like ethtool.
Fixes: 2ee73c54a615 ("ibmvnic: Add stat for tx direct vs tx batched") Signed-off-by: Mingming Cao <mmc@linux.ibm.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250709153332.73892-1-mmc@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
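A minimal sketch of the sizeof-based idea, assuming every member of the stats structure is a 64-bit counter. The struct, its member names, and the helper are hypothetical and do not reproduce the ibmvnic definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats layout: every member is a 64-bit counter. */
struct toy_tx_queue_stats {
	uint64_t batched_packets;
	uint64_t direct_packets;
	uint64_t bytes;
	uint64_t dropped_packets;
};

/* Derived from the struct, so adding a counter updates the count too. */
#define TOY_NUM_TX_STATS \
	(sizeof(struct toy_tx_queue_stats) / sizeof(uint64_t))

/* Copy the counters into a flat array, as a stats dump might do. */
static size_t toy_fill_tx_stats(const struct toy_tx_queue_stats *s,
				uint64_t *out)
{
	memcpy(out, s, sizeof(*s));
	return TOY_NUM_TX_STATS;
}
```

A hardcoded count of 3 would silently drop the fourth counter here; the derived count cannot drift from the structure as long as all members keep the same type.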
net: pse-pd: pd692x0: reduce stack usage in pd692x0_setup_pi_matrix
The pd692x0_manager array in this function is really too big to fit on the
stack, though this never triggered a warning until a recent patch made
it slightly bigger:
drivers/net/pse-pd/pd692x0.c: In function 'pd692x0_setup_pi_matrix':
drivers/net/pse-pd/pd692x0.c:1210:1: error: the frame size of 1584 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
Change the function to dynamically allocate the array here.
Fixes: 359754013e6a ("net: pse-pd: pd692x0: Add support for PSE PI priority feature") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250709153210.1920125-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
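The general shape of such a change, moving a large local array to the heap, might look like the userspace sketch below. All names and sizes are hypothetical; kernel code would use kcalloc() and kfree() rather than calloc() and free().

```c
#include <stdlib.h>

#define TOY_MAX_MANAGERS 12

/* Hypothetical per-manager record, large enough to matter on a stack. */
struct toy_manager {
	int port_count;
	unsigned char port_map[128];
};

/* Returns the total number of ports set up, or -1 on error. */
static int toy_setup_matrix(int nmgr)
{
	struct toy_manager *mgrs;
	int i, ports = 0;

	if (nmgr < 0 || nmgr > TOY_MAX_MANAGERS)
		return -1;

	/* Heap allocation keeps the array out of the stack frame. */
	mgrs = calloc(TOY_MAX_MANAGERS, sizeof(*mgrs));
	if (!mgrs)
		return -1;

	for (i = 0; i < nmgr; i++) {
		mgrs[i].port_count = 8;
		ports += mgrs[i].port_count;
	}

	free(mgrs);	/* every exit path after the allocation must free */
	return ports;
}
```

The trade-off is an extra failure path (allocation can fail) in exchange for a stack frame that no longer scales with the array size.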
Kito Xu [Wed, 9 Jul 2025 03:52:51 +0000 (03:52 +0000)]
net: appletalk: Fix device refcount leak in atrtr_create()
When updating an existing route entry in atrtr_create(), the old device
reference was not being released before assigning the new device,
leading to a device refcount leak. Fix this by calling dev_put() to
release the old device reference before holding the new one.
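The reference handling described above can be illustrated with a toy counter. The types and helpers below are hypothetical stand-ins for a net device, a route entry, and the hold/put pair; they are not the AppleTalk code.

```c
/* Toy device with a plain reference counter. */
struct toy_dev {
	int refcnt;
};

/* Toy route entry that owns one reference on its device. */
struct toy_route {
	struct toy_dev *dev;
};

static void toy_dev_hold(struct toy_dev *d)
{
	if (d)
		d->refcnt++;
}

static void toy_dev_put(struct toy_dev *d)
{
	if (d)
		d->refcnt--;
}

/* Update an existing route entry to point at a new device. */
static void toy_route_set_dev(struct toy_route *rt, struct toy_dev *newdev)
{
	toy_dev_put(rt->dev);	/* release the old reference (the missing step) */
	toy_dev_hold(newdev);	/* then take a reference on the new device */
	rt->dev = newdev;
}
```

Without the put, every update of an existing entry would leave the old device's counter one too high, so it could never drop to zero.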
Jakub Kicinski [Tue, 8 Jul 2025 22:06:40 +0000 (15:06 -0700)]
selftests: drv-net: test RSS header field configuration
Test reading RXFH fields over IOCTL and netlink.
# ./tools/testing/selftests/drivers/net/hw/rss_api.py
TAP version 13
1..3
ok 1 rss_api.test_rxfh_indir_ntf
ok 2 rss_api.test_rxfh_indir_ctx_ntf
ok 3 rss_api.test_rxfh_fields
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Jakub Kicinski [Tue, 8 Jul 2025 22:06:39 +0000 (15:06 -0700)]
ethtool: rss: report which fields are configured for hashing
Implement ETHTOOL_GRXFH over Netlink. The number of flow types is
reasonable (around 20) so report all of them at once for simplicity.
Do not maintain the flow ID mapping with ioctl at the uAPI level.
This gives us a chance to clean up the confusion that comes from
RxNFC vs RxFH (flow direction vs hashing) in the ioctl.
Try to align with the names used in ethtool CLI, they seem to have
stood the test of time just fine. One annoyance is that we still
call L4 ports the weird names, but I guess they also apply to IPSec
(where they cover the SPI) so it is what it is.
Jakub Kicinski [Tue, 8 Jul 2025 22:06:38 +0000 (15:06 -0700)]
ethtool: mark ETHER_FLOW as usable for Rx hash
Looks like some drivers (ena, enetc, fbnic.. there's probably more)
consider ETHER_FLOW to be a legitimate target for flow hashing.
I'm not sure how intentional that is from the uAPI perspective
vs just an effect of ethtool IOCTL doing minimal input validation.
But Netlink will do strict validation, so we need to decide whether
we allow this use case or not. I don't see a strong reason against
it, and rejecting it would potentially regress a number of drivers.
So update the comments and flow_type_hashable().
Jakub Kicinski [Tue, 8 Jul 2025 22:06:37 +0000 (15:06 -0700)]
tools: ynl: decode enums in auto-ints
Use enum decoding on auto-ints. Looks like we only had enum
auto-ints for input values until now. Upcoming RSS work will
need this to declare an attribute with flags as a uint.
Jakub Kicinski [Tue, 8 Jul 2025 22:06:36 +0000 (15:06 -0700)]
ethtool: rss: make sure dump takes the rss lock
After commit 040cef30b5e6 ("net: ethtool: move get_rxfh callback
under the rss_lock") we're expected to take rss_lock around get.
Switch dump to using the new prep helper and move the locking into it.
Jakub Kicinski [Fri, 11 Jul 2025 00:24:21 +0000 (17:24 -0700)]
Merge tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Quite a bit more work, notably:
- mt76: firmware recovery improvements, MLO work
- iwlwifi: use embedded PNVM in (to be released) FW images
to fix compatibility issues
- cfg80211/mac80211: extended regulatory info support (6 GHz)
- cfg80211: use "faux device" for regulatory
* tag 'wireless-next-2025-07-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (48 commits)
wifi: mac80211: don't complete management TX on SAE commit
wifi: cfg80211/mac80211: implement dot11ExtendedRegInfoSupport
wifi: mac80211: send extended MLD capa/ops if AP has it
wifi: mac80211: copy first_part into HW scan
wifi: cfg80211: add a flag for the first part of a scan
wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code
wifi: cfg80211: only verify part of Extended MLD Capabilities
wifi: nl80211: make nl80211_check_scan_flags() type safe
wifi: cfg80211: hide scan internals
wifi: mac80211: fix deactivated link CSA
wifi: mac80211: add mandatory bitrate support for 6 GHz
wifi: mac80211: remove spurious blank line
wifi: mac80211: verify state before connection
wifi: mac80211: avoid weird state in error path
wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v4
wifi: iwlwifi: bump minimum API version in BZ
wifi: iwlwifi: mvm: remove unneeded argument
wifi: iwlwifi: mvm: remove MLO GTK rekey code
wifi: iwlwifi: pcie: rename iwl_pci_gen1_2_probe() argument
wifi: iwlwifi: match discrete/integrated to fix some names
...
====================
Wang Liang [Tue, 8 Jul 2025 03:33:42 +0000 (11:33 +0800)]
net: replace ND_PRINTK with dynamic debug
ND_PRINTK with val > 1 only works when ND_DEBUG was set at compile time.
Replace it with dynamic debug. Convert ND_PRINTK with val <= 1 to
net_{err,warn}_ratelimited, and convert the rest to net_dbg_ratelimited.
Jakub Kicinski [Thu, 10 Jul 2025 22:02:23 +0000 (15:02 -0700)]
Merge branch 'further-mt7988-devicetree-work'
Frank Wunderlich says:
====================
further mt7988 devicetree work
This series continues mt7988 devicetree work
- Extend cpu frequency scaling with CCI
- GPIO leds
- Basic network-support (ethernet controller + builtin switch + SFP Cages)
dependencies (I hope this list is complete and links the latest patches/series):
support for interrupt-names is optional again, as I re-added the reserved IRQs
(they are not unusable as I thought and can enable features in the future)
https://patchwork.kernel.org/project/netdevbpf/patch/20250619132125.78368-2-linux@fw-web.de/
needs change in mtk ethernet driver for the sram to be read from separate node:
https://patchwork.kernel.org/project/netdevbpf/patch/c2b9242229d06af4e468204bcf42daa1535c3a72.1751461762.git.daniel@makrotopia.org/
for SFP function (MACs currently disabled):
PCS clearance, which has been under discussion for 1.5 years and is still ongoing.
Daniel asked netdev for a way forward:
https://lore.kernel.org/netdev/aEwfME3dYisQtdCj@pidgin.makrotopia.org/
e.g. something like this (one of):
* https://patchwork.kernel.org/project/netdevbpf/patch/20250610233134.3588011-4-sean.anderson@linux.dev/ (v6)
* https://patchwork.kernel.org/project/netdevbpf/patch/20250511201250.3789083-4-ansuelsmth@gmail.com/ (v4)
* https://patchwork.kernel.org/project/netdevbpf/patch/ba4e359584a6b3bc4b3470822c42186d5b0856f9.1721910728.git.daniel@makrotopia.org/
dt-bindings: net: dsa: mediatek,mt7530: add internal mdio bus
The MT7988 built-in switch has its own MDIO bus to which the GE PHYs are
connected. Add the related property for this.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20250709111147.11843-7-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dt-bindings: net: dsa: mediatek,mt7530: add dsa-port definition for mt7988
Add a separate dsa-port binding for SoCs with an internal switch, where only
phy-mode 'internal' is valid.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20250709111147.11843-6-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MediaTek Filogic SoCs (MT798x) have dedicated MMIO-SRAM for DMA operations.
MT7981 and MT7986 currently use a static offset to the ethernet MAC register,
which will be changed in a separate patch once this approach is accepted.
Add the "sram" property to map the ethernet controller to a dedicated mmio-sram node.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-5-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In preparation for MT7988 and RSS/LRO, allow the interrupt-names
property.
This way the driver can request the interrupts by name, which is much
more readable in the driver code and the SoC's dtsi than relying on a
specific order.
Frame-engine-IRQs (fe0..3):
MT7621, MT7628: 1 FE-IRQ
MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now)
MT7981, MT7986: 4 FE-IRQs (only two used by the driver for now)
RSS/LRO IRQs (pdma0..3) exist additionally only on Filogic (MT798x), with a
count of 4. This gives 8 IRQ names in total for Filogic.
Set the boundaries for all compatibles to match their IRQ count.
Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250709111147.11843-4-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dt-bindings: net: mediatek,net: allow up to 8 IRQs
Increase the maximum IRQ count to 8 (4 FE + 4 RSS/LRO).
Frame-engine-IRQs (max 4):
MT7621, MT7628: 1 FE-IRQ
MT7622, MT7623: 3 FE-IRQs (only two used by the driver for now)
MT7981, MT7986, MT7988: 4 FE-IRQs (only two used by the driver for now)
Mediatek Filogic SoCs (mt798x) have 4 additional IRQs for RSS and/or
LRO. So MT798x have 8 IRQs in total.
MT7981 does not have an ethernet node yet.
MT7986 Ethernet node is updated with RSS/LRO IRQs in this series.
MT7988 Ethernet node is added in this series.
Jacob Keller [Wed, 18 Jun 2025 22:24:43 +0000 (15:24 -0700)]
ice: introduce ice_get_vf_by_dev() wrapper
The ice_get_vf_by_id() function is used to obtain a reference to a VF
structure based on its ID. The ice_sriov_set_msix_vec_count() function
needs to get a VF reference starting from the VF PCI device, and uses
pci_iov_vf_id() to get the VF ID. This pattern is currently uncommon in the
ice driver. However, the live migration module will introduce many more
such locations.
Add a helper wrapper ice_get_vf_by_dev() which takes the VF PCI device and
calls ice_get_vf_by_id() using pci_iov_vf_id() to get the VF ID.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Wed, 18 Jun 2025 22:24:42 +0000 (15:24 -0700)]
ice: avoid rebuilding if MSI-X vector count is unchanged
Commit 05c16687e0cc ("ice: set MSI-X vector count on VF") added support to
change the vector count for VFs as part of ice_sriov_set_msix_vec_count().
This function modifies and rebuilds the target VF with the requested number
of MSI-X vectors.
Future support for live migration will add a call to
ice_sriov_set_msix_vec_count() to ensure that a migrated VF has the proper
MSI-X vector count. In most cases, this request will be to set the MSI-X
vector count to its current value. In that case, no work is necessary.
Rather than requiring the caller to check this, update the function to
check and exit early if the vector count is already at the requested value.
This avoids an unnecessary VF rebuild.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
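The early-exit check amounts to something like the following sketch. The struct and function are hypothetical simplifications; the rebuild counter merely stands in for the real teardown and rebuild of the VF.

```c
/* Toy VF state: current vector count and a counter for rebuilds. */
struct toy_vf {
	int num_msix;
	unsigned int rebuilds;
};

/* Returns 0 on success, -1 on an invalid request. */
static int toy_set_msix_vec_count(struct toy_vf *vf, int count)
{
	if (count < 0)
		return -1;

	/* Already at the requested value: skip the expensive rebuild. */
	if (vf->num_msix == count)
		return 0;

	vf->num_msix = count;
	vf->rebuilds++;		/* stands in for rebuilding the VF */
	return 0;
}
```

Placing the check inside the function means callers such as a migration path can call it unconditionally.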
Jacob Keller [Wed, 18 Jun 2025 22:24:41 +0000 (15:24 -0700)]
ice: use pci_iov_vf_id() to get VF ID
The ice_sriov_set_msix_vec_count() function obtains the VF device ID in a
strange way: it iterates over the possible VF IDs, calls
pci_iov_virtfn_devfn() to calculate the device and function combos, and
compares them to the pdev->devfn.
This is unnecessary. The existing pci_iov_vf_id() helper already does
the reverse calculation of pci_iov_virtfn_devfn(), which is much simpler
and avoids the loop construction. Use this instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
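To illustrate why the loop is redundant, here is a simplified model of the forward mapping and its inverse. The struct and all function names are hypothetical, and the model assumes the PF and its VFs share one bus and ignores bus-number carry, so it is not the PCI core implementation.

```c
#include <stdint.h>

/* Hypothetical SR-IOV routing parameters of a PF. */
struct toy_pf {
	uint8_t devfn;	/* PF device/function number */
	uint8_t offset;	/* offset of the first VF */
	uint8_t stride;	/* distance between consecutive VFs */
	int num_vfs;
};

/* Forward mapping: VF index to devfn. */
static int toy_virtfn_devfn(const struct toy_pf *pf, int vf_id)
{
	return (pf->devfn + pf->offset + pf->stride * vf_id) & 0xff;
}

/* Loop-based lookup: scan all VF indices until the devfn matches. */
static int toy_vf_id_by_scan(const struct toy_pf *pf, uint8_t vf_devfn)
{
	int id;

	for (id = 0; id < pf->num_vfs; id++)
		if (toy_virtfn_devfn(pf, id) == vf_devfn)
			return id;
	return -1;
}

/* Direct lookup: invert the mapping arithmetically. */
static int toy_vf_id_direct(const struct toy_pf *pf, uint8_t vf_devfn)
{
	if (pf->stride == 0)
		return -1;
	return (vf_devfn - (pf->devfn + pf->offset)) / pf->stride;
}
```

Both lookups agree for every valid VF in this model; the direct form simply does one subtraction and one division instead of up to num_vfs comparisons.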
Jacob Keller [Wed, 18 Jun 2025 22:24:40 +0000 (15:24 -0700)]
ice: expose VF functions used by live migration
The live migration process will require configuring the target VF with the
data provided from the source host. A few helper functions in ice_sriov.c
and ice_virtchnl.c will be needed for this process, but are currently
static.
Expose these functions in their respective headers so that the live
migration module can use them during the migration process.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Wed, 18 Jun 2025 22:24:38 +0000 (15:24 -0700)]
ice: save RSS hash configuration for migration
The VF can program the RSS hash configuration over virtchnl. It does this
by sending a u64 bitmask which represents the current hash configuration.
It is not trivial to reverse the hardware configuration back to this hash
set for migration. Instead, save the value to the ice_vf structure when it is
modified by the VF.
The rss_hashcfg value is an 8-byte field. Make room for it in ice_vf by
re-arranging some of the existing fields. There is a 4-byte gap after the
first_vector_idx, and a 4-byte gap between max_tx_rate and vf_states. Move
first_vector_idx into the later 4-byte gap, creating an 8-byte area where
rss_hashcfg can be placed. Also move the num_msix field near min_tx_rate,
filling 2 bytes of a 3-byte hole.
The end result of these changes enables placing the rss_hashcfg field into
the structure while also saving 8 bytes in size. It looks like there are a
handful of further possible cleanups to reduce the size even more, but
those have been left for the future.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
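The effect of moving a 4-byte member into a later gap can be reproduced with a toy structure. The layouts below are invented for illustration; only a few member names echo the commit text, and this is not the real ice_vf definition. The byte counts in the comments assume an ABI where uint64_t is 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive version: the new 8-byte field is appended, both holes remain. */
struct toy_vf_naive {
	uint64_t a;
	uint32_t first_vector_idx;	/* 4-byte hole follows */
	uint64_t b;
	uint32_t max_tx_rate;		/* 4-byte hole follows */
	uint64_t vf_states;
	uint64_t rss_hashcfg;		/* new field */
};

/* Reordered version: one member moves, the new field uses the freed slot. */
struct toy_vf_reordered {
	uint64_t a;
	uint64_t rss_hashcfg;		/* new field in the freed 8-byte area */
	uint64_t b;
	uint32_t max_tx_rate;
	uint32_t first_vector_idx;	/* moved into the later 4-byte gap */
	uint64_t vf_states;
};

/* Bytes recovered by reordering (8 with 8-byte aligned uint64_t). */
static size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_vf_naive) - sizeof(struct toy_vf_reordered);
}
```

On such an ABI the naive layout occupies 48 bytes and the reordered one 40, so the new field costs nothing compared with the structure before it was added.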
Jacob Keller [Wed, 18 Jun 2025 22:24:37 +0000 (15:24 -0700)]
ice: add functions to get and set Tx queue context
The live migration driver will need to save and restore the Tx queue
context state from the hardware registers. This state contains both static
fields which do not change during Tx traffic as well as dynamic fields
which may change during Tx traffic.
Unlike the Rx context, the Tx queue context is accessed indirectly from
GLCOMM_QTX_CNTX_CTL and GLCOMM_QTX_CNTX_DATA registers. These registers are
shared by multiple PFs on the same PCIe card. Multiple PFs cannot safely
access the registers simultaneously, and there is no hardware semaphore or
logic to control access. To handle this, introduce the txq_ctx_lock to the
ice_adapter structure. This is similar to the ptp_gltsyn_time_lock. All PFs
on the same adapter share this structure, and use it to serialize access to
the registers to prevent error.
Add new functions to get and set the Tx queue context through the
GLCOMM_QTX_CNTX_CTL interface. The hardware context values are stored in
the registers using the same packed format as the Admin Queue buffer.
The hardware buffer is 40 bytes wide, as it contains an additional 18 bytes
of internal state not sent with the Admin Queue buffer. For this reason, a
separate typedef and packing function must be used. We can share the same
packed fields definitions because we never need to unpack the internal
state. This is preferred, as it ensures the internal state is zero'd when
writing into HW, and avoids issues with reading by u32 registers into a
buffer of 22 bytes in length. Thanks to the typedefs, misuse of the API
with the wrong size buffer can easily be caught at compile time.
Note reading this data from hardware is essential because the current Tx
queue context may be different from the context as initially programmed by
the driver during VF initialization. When migrating a VF we must ensure the
target VF has a context identical to that of the source VF.
Co-developed-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Yahui Cao <yahui.cao@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
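The compile-time size check that distinct typedefs provide can be sketched as follows. The type and function names are hypothetical; the 22 and 40 byte sizes come from the commit text, and the sketch assumes the 18 bytes of internal state follow the shared packed fields.

```c
#include <stdint.h>
#include <string.h>

#define TOY_TXQ_CTX_SZ      22	/* Admin Queue buffer size */
#define TOY_TXQ_CTX_FULL_SZ 40	/* hardware buffer size */

/* Distinct array typedefs: a pointer to one is not a pointer to the other. */
typedef uint8_t toy_txq_ctx_buf_t[TOY_TXQ_CTX_SZ];
typedef uint8_t toy_txq_ctx_buf_full_t[TOY_TXQ_CTX_FULL_SZ];

/* Widen an AQ-format context to the hardware format, zeroing the tail. */
static void toy_widen_txq_ctx(toy_txq_ctx_buf_t *aq,
			      toy_txq_ctx_buf_full_t *hw)
{
	memset(*hw, 0, sizeof(*hw));	/* internal state stays zero */
	memcpy(*hw, *aq, sizeof(*aq));	/* copy the shared packed fields */
}
```

Because the parameters are pointers to arrays of a fixed length, passing the 40-byte buffer where the 22-byte one is expected is an incompatible pointer type that the compiler diagnoses, rather than a silent overrun at run time.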
Jakub Kicinski [Thu, 10 Jul 2025 20:32:35 +0000 (13:32 -0700)]
Merge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel
Paolo Abeni says:
====================
virtio: introduce GSO over UDP tunnel
Some virtualized deployments use UDP tunnel pervasively and are impacted
negatively by the lack of GSO support for such kind of traffic in the
virtual NIC driver.
The virtio_net specification recently introduced support for GSO over
UDP tunnel; this series updates the virtio implementation to support
such a feature.
Currently the kernel virtio support limits the feature space to 64,
while the virtio specification allows for a larger number of features.
Specifically the GSO-over-UDP-tunnel-related virtio features use bits
65-69.
The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is set by
a define and could be easily raised in future, as needed.
This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended feature space at a later time.
The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.
New helpers are introduced to convert the UDP-tunneled skb metadata to
an extended virtio net header and vice versa. Such helpers are used by
the tun and virtio_net driver to cope with the newly supported offloads.
Tested with basic stream transfer with all the possible permutations of
host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
====================
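As an illustration of a feature space wider than 64 bits, the sketch below stores feature bits in an array of 64-bit words whose size derives from one define. The names are hypothetical and this is not the virtio API introduced by the series.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FEATURE_BITS  128	/* the limit is set by a single define */
#define TOY_FEATURE_WORDS (TOY_FEATURE_BITS / 64)

struct toy_features {
	uint64_t w[TOY_FEATURE_WORDS];
};

static bool toy_feature_valid(unsigned int bit)
{
	return bit < TOY_FEATURE_BITS;
}

/* Set one feature bit; out-of-range bits are ignored. */
static void toy_feature_set(struct toy_features *f, unsigned int bit)
{
	if (toy_feature_valid(bit))
		f->w[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Test one feature bit; out-of-range bits read as unset. */
static bool toy_feature_test(const struct toy_features *f, unsigned int bit)
{
	return toy_feature_valid(bit) &&
	       ((f->w[bit / 64] >> (bit % 64)) & 1);
}
```

Bit 65, for example, lands in the second word at bit position 1, which a single 64-bit feature mask could not represent.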
Jacob Keller [Wed, 18 Jun 2025 22:24:36 +0000 (15:24 -0700)]
ice: add support for reading and unpacking Rx queue context
In order to support live migration, the ice driver will need to read
certain data from the Rx queue context. This is stored in the hardware in a
packed format.
Since we use <linux/packing.h> for the mapping between the packed hardware
format and the unpacked structure, it is trivial to enable unpacking
support via the unpack_fields() function.
Add the ice_unpack_rxq_ctx() function based on the unpack_fields() API.
Re-use the same field definitions from the packing implementation.
Add ice_copy_rxq_ctx_from_hw() to copy the Rx queue context data from the
hardware registers.
Use these to implement ice_read_rxq_ctx() which will return the Rx queue
context to the caller in its unpacked ice_rlan_ctx struct.
This will give the migration logic access to the relevant data about the
Rx device queues. It can easily be copied to the target system as part of
the migration payload, where it will be used to configure the Rx queues.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
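The idea of reusing one set of field definitions for both directions can be shown with a small bit packer. This is a toy with a hypothetical field table and a simple little-endian bit order; it does not reproduce the <linux/packing.h> API, its bit ordering, or the real Rx queue context layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One table describes each field once; pack and unpack both walk it. */
struct toy_field {
	size_t index;		/* slot in the unpacked values[] array */
	unsigned int lsb;	/* first bit in the packed buffer */
	unsigned int width;	/* field width in bits, at most 64 */
};

static const struct toy_field toy_rxq_fields[] = {
	{ 0,  0, 14 },
	{ 1, 32, 57 },
	{ 2, 89, 13 },
};

#define TOY_NUM_FIELDS (sizeof(toy_rxq_fields) / sizeof(toy_rxq_fields[0]))
#define TOY_BUF_SZ 16	/* bytes; must cover the highest field bit */

static void toy_pack(const uint64_t *values, uint8_t *buf)
{
	memset(buf, 0, TOY_BUF_SZ);
	for (size_t i = 0; i < TOY_NUM_FIELDS; i++) {
		const struct toy_field *f = &toy_rxq_fields[i];

		for (unsigned int b = 0; b < f->width; b++) {
			unsigned int pos = f->lsb + b;

			if ((values[f->index] >> b) & 1)
				buf[pos / 8] |= (uint8_t)(1u << (pos % 8));
		}
	}
}

static void toy_unpack(const uint8_t *buf, uint64_t *values)
{
	for (size_t i = 0; i < TOY_NUM_FIELDS; i++) {
		const struct toy_field *f = &toy_rxq_fields[i];
		uint64_t v = 0;

		for (unsigned int b = 0; b < f->width; b++) {
			unsigned int pos = f->lsb + b;

			if ((buf[pos / 8] >> (pos % 8)) & 1)
				v |= UINT64_C(1) << b;
		}
		values[f->index] = v;
	}
}
```

Since both functions read the same table, adding or moving a field changes packing and unpacking together, which is why the inverse operation is cheap to add once the definitions exist.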
Merge tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth.
Current release - regressions:
- tcp: refine sk_rcvbuf increase for ooo packets
- bluetooth: fix attempting to send HCI_Disconnect to BIS handle
- rxrpc: fix over large frame size warning
- eth: bcmgenet: initialize u64 stats seq counter
Previous releases - regressions:
- tcp: correct signedness in skb remaining space calculation
- sched: abort __tc_modify_qdisc if parent class does not exist
- vsock: fix transport_{g2h,h2g} TOCTOU
- rxrpc: fix bug due to prealloc collision
- tipc: fix use-after-free in tipc_conn_close().
- bluetooth: fix not marking Broadcast Sink BIS as connected
- phy: qca808x: fix WoL issue by utilizing at8031_set_wol()
- eth: am65-cpsw-nuss: fix skb size by accounting for skb_shared_info
Previous releases - always broken:
- netlink: fix wraparounds of sk->sk_rmem_alloc.
- atm: fix infinite recursive call of clip_push().
- eth:
- stmmac: fix interrupt handling for level-triggered mode in DWC_XGMAC2
- rtsn: fix a null pointer dereference in rtsn_probe()"
* tag 'net-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
net/sched: sch_qfq: Fix null-deref in agg_dequeue
rxrpc: Fix oops due to non-existence of prealloc backlog struct
rxrpc: Fix bug due to prealloc collision
MAINTAINERS: remove myself as netronome maintainer
selftests/net: packetdrill: add tcp_ooo-before-and-after-accept.pkt
tcp: refine sk_rcvbuf increase for ooo packets
net/sched: Abort __tc_modify_qdisc if parent class does not exist
net: ethernet: ti: am65-cpsw-nuss: Fix skb size by accounting for skb_shared_info
net: thunderx: avoid direct MTU assignment after WRITE_ONCE()
selftests/tc-testing: Create test case for UAF scenario with DRR/NETEM/BLACKHOLE chain
atm: clip: Fix NULL pointer dereference in vcc_sendmsg()
atm: clip: Fix infinite recursive call of clip_push().
atm: clip: Fix memory leak of struct clip_vcc.
atm: clip: Fix potential null-ptr-deref in to_atmarpd().
net: phy: smsc: Fix link failure in forced mode with Auto-MDIX
net: phy: smsc: Force predictable MDI-X state on LAN87xx
net: phy: smsc: Fix Auto-MDIX configuration when disabled by strap
net: stmmac: Fix interrupt handling for level-triggered mode in DWC_XGMAC2
rxrpc: Fix over large frame size warning
net: airoha: Fix an error handling path in airoha_probe()
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Many patches, pretty much all of them small, that accumulated while I
was on vacation.
ARM:
- Remove the last leftovers of the ill-fated FPSIMD host state
mapping at EL2 stage-1
- Fix unexpected advertisement to the guest of unimplemented S2 base
granule sizes
- Gracefully fail initialising pKVM if the interrupt controller isn't
GICv3
- Also gracefully fail initialising pKVM if the carveout allocation
fails
- Fix the computing of the minimum MMIO range required for the host
on stage-2 fault
- Fix the generation of the GICv3 Maintenance Interrupt in nested
mode
x86:
- Reject SEV{-ES} intra-host migration if one or more vCPUs are
actively being created, so as not to create a non-SEV{-ES} vCPU in
an SEV{-ES} VM
- Use a pre-allocated, per-vCPU buffer for handling de-sparsification
of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
large" issue
- Allow out-of-range/invalid Xen event channel ports when configuring
IRQ routing, to avoid dictating a specific ioctl() ordering to
userspace
- Conditionally reschedule when setting memory attributes to avoid
soft lockups when userspace converts huge swaths of memory to/from
private
- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest
- Add a missing field in struct sev_data_snp_launch_start that
resulted in the guest-visible workarounds field being filled at the
wrong offset
- Skip non-canonical address when processing Hyper-V PV TLB flushes
to avoid VM-Fail on INVVPID
- Advertise supported TDX TDVMCALLs to userspace
- Pass SetupEventNotifyInterrupt arguments to userspace
- Fix TSC frequency underflow"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: avoid underflow when scaling TSC frequency
KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
KVM: arm64: Don't free hyp pages with pKVM on GICv2
KVM: arm64: Fix error path in init_hyp_mode()
KVM: arm64: Adjust range correctly during host stage-2 faults
KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
Documentation: KVM: Fix unexpected unindent warnings
KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
KVM: Allow CPU to reschedule while setting per-page memory attributes
KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt