git.ipfire.org Git - thirdparty/kernel/stable.git/log
5 days ago igb: fix typos in comments
Maximilian Pezzullo [Tue, 9 Jun 2026 21:35:55 +0000 (14:35 -0700)] 
igb: fix typos in comments

Fix spelling errors in code comments:
 - e1000_nvm.c: 'likley' -> 'likely'
 - e1000_mac.c: 'auto-negotitation' -> 'auto-negotiation'
 - e1000_mbx.h: 'exra' -> 'extra'
 - e1000_defines.h: 'Aserted' -> 'Asserted'

Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-15-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago e1000e: limit endianness conversion to boundary words
Agalakov Daniil [Tue, 9 Jun 2026 21:35:54 +0000 (14:35 -0700)] 
e1000e: limit endianness conversion to boundary words

[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().

The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.

Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-14-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
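
The boundary-word handling described in this commit can be illustrated with a minimal userspace sketch. It is not the driver code: le16_decode() and fill_boundary_words() are hypothetical names, and the EEPROM is modelled as a plain little-endian byte array. The point it tries to show is that only the first and last words are read and converted, and only when the write is unaligned at that end; interior words are never touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for le16_to_cpu(): decode one little-endian word. */
static uint16_t le16_decode(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Prepare the word buffer for a write of `len` bytes at byte `offset`.
 * `image` models the raw little-endian EEPROM contents (an assumption of
 * this sketch). Only boundary words that are partially covered by the
 * write are read and converted. Returns the number of words converted.
 */
size_t fill_boundary_words(const uint8_t *image, size_t offset, size_t len,
			   uint16_t *words)
{
	size_t first_word, last_word, converted = 0;

	if (len == 0)
		return 0;

	first_word = offset >> 1;
	last_word = (offset + len - 1) >> 1;

	/* Write starts in the middle of the first word: keep its low byte. */
	if (offset & 1) {
		words[0] = le16_decode(image + 2 * first_word);
		converted++;
	}

	/* Write ends in the middle of the last word: keep its high byte. */
	if ((offset + len) & 1) {
		words[last_word - first_word] =
			le16_decode(image + 2 * last_word);
		converted++;
	}

	/* Interior words are left alone; the new data overwrites them. */
	return converted;
}
```

With offset 1 and length 6 over an 8-byte image, only words 0 and 3 are filled in; words 1 and 2 keep whatever the caller left there, since the new data is expected to overwrite them.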
5 days ago e1000: limit endianness conversion to boundary words
Agalakov Daniil [Tue, 9 Jun 2026 21:35:53 +0000 (14:35 -0700)] 
e1000: limit endianness conversion to boundary words

[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().

The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.

Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago e1000e: Use __napi_schedule_irqoff()
Matt Vollrath [Tue, 9 Jun 2026 21:35:52 +0000 (14:35 -0700)] 
e1000e: Use __napi_schedule_irqoff()

The __napi_schedule_irqoff() function is intended to bypass saving and
restoring IRQ state when scheduling is requested from an IRQ handler,
where hard interrupts are already disabled. Use it in all three
interrupt handlers.

This was tested on a system with an I218-V and MSI interrupts. Because
this is an optimization, I was interested in measuring the impact, so I
added ktime_get() time measurement to e1000_intr_msi and a print of the
last sample in the watchdog task. For each test case I ran a
bi-directional iperf3 to saturate the line. With some help from awk,
here are the statistics.

49 samples each, all units ns
previous: min 678 max 1265 mean 879.429 median 806 stddev 137.188
noirq:    min 707 max 1165 mean 811.857 median 790 stddev  89.486

According to this informal comparison, the mean time to handle an
interrupt from start to finish is improved by about 8% under load.

Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Michal Cohen <michalx.cohen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-12-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
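
The "about 8%" figure in this commit follows from the reported means. A small sketch of the arithmetic, using a hypothetical helper relative_improvement() that is not part of the patch:

```c
/*
 * Fraction by which `after` improves on `before`, e.g. 0.10 for a 10%
 * reduction in mean handling time.
 */
double relative_improvement(double before, double after)
{
	return (before - after) / before;
}
```

Applied to the means (879.429 ns and 811.857 ns) this gives roughly 0.077, which rounds to about 8%. Applied to the medians (806 ns and 790 ns) it gives roughly 0.02, so the median moves less than the mean in this informal comparison.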
5 days ago igc: use napi_schedule_irqoff() instead of napi_schedule()
Daiki Harada [Tue, 9 Jun 2026 21:35:51 +0000 (14:35 -0700)] 
igc: use napi_schedule_irqoff() instead of napi_schedule()

Replace napi_schedule() with napi_schedule_irqoff() in the interrupt
handler path of the igc driver.

Tested on Intel Corporation Ethernet Controller I226-V.

Suggested-by: Kohei Enju <kohei@enjuk.jp>
Signed-off-by: Daiki Harada <daiky0325@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-11-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago igb: use napi_schedule_irqoff() instead of napi_schedule()
Daiki Harada [Tue, 9 Jun 2026 21:35:50 +0000 (14:35 -0700)] 
igb: use napi_schedule_irqoff() instead of napi_schedule()

Replace napi_schedule() with napi_schedule_irqoff() in the interrupt
handler path of the igb driver.

Tested on QEMU with igb NIC emulation (-nic user,model=igb)

Suggested-by: Kohei Enju <kohei@enjuk.jp>
Signed-off-by: Daiki Harada <daiky0325@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-10-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago e1000e: use ktime_get_real_ns() in e1000e_systim_reset()
Aleksandr Loktionov [Tue, 9 Jun 2026 21:35:49 +0000 (14:35 -0700)] 
e1000e: use ktime_get_real_ns() in e1000e_systim_reset()

Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() in e1000e_systim_reset().  Using the combined helper
avoids the unnecessary intermediate ktime_t variable and makes the
intent clearer.

Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Suggested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-9-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago igb: use ktime_get_real helpers in igb_ptp_reset()
Aleksandr Loktionov [Tue, 9 Jun 2026 21:35:48 +0000 (14:35 -0700)] 
igb: use ktime_get_real helpers in igb_ptp_reset()

Replace ktime_to_ns(ktime_get_real()) with the direct equivalent
ktime_get_real_ns() and ktime_to_timespec64(ktime_get_real()) with
ktime_get_real_ts64() in igb_ptp_reset().  Using the combined helpers
makes the intent clearer.

Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago ixgbe: e610: remove redundant assignment
Piotr Kwapulinski [Tue, 9 Jun 2026 21:35:46 +0000 (14:35 -0700)] 
ixgbe: e610: remove redundant assignment

Remove unnecessary code. No functional impact.

Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago net/intel: Replace manual array size calculation with ARRAY_SIZE
Jakub Raczynski [Tue, 9 Jun 2026 21:35:45 +0000 (14:35 -0700)] 
net/intel: Replace manual array size calculation with ARRAY_SIZE

Some places in the code still calculate array sizes manually. Enforce use
of the single ARRAY_SIZE macro throughout the code, as it makes the code a
bit more readable.
While at it, tidy the surrounding conditions by reversing the checks and
removing unnecessary casts.

Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
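
As an illustration of the pattern this commit describes, here is a minimal userspace sketch. The ARRAY_SIZE definition below is a simplified stand-in (the kernel macro additionally rejects pointers at build time), and the speeds table and speed_from_index() are hypothetical, not taken from the patch.

```c
#include <stddef.h>

/* Simplified userspace approximation of the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical lookup table, used only for this example. */
static const int speeds[] = { 10, 100, 1000, 2500 };

/*
 * Bounds check written in the positive form: no manual
 * sizeof(speeds) / sizeof(speeds[0]) and no cast on the index.
 * Returns -1 when idx is out of range.
 */
int speed_from_index(size_t idx)
{
	if (idx < ARRAY_SIZE(speeds))
		return speeds[idx];

	return -1;
}
```

Adding an entry to the table changes the bound automatically, which is the readability and maintenance benefit the commit message refers to.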
5 days ago iavf: iavf_virtchnl_completion: drop duplicate ether_addr_equal() test
Corinna Vinschen [Tue, 9 Jun 2026 21:35:44 +0000 (14:35 -0700)] 
iavf: iavf_virtchnl_completion: drop duplicate ether_addr_equal() test

This is just a simple cleanup fix.  Commit 35a2443d0910f ("iavf: Add
waiting for response from PF in set mac") introduced a duplicate
ether_addr_equal() check, so the current code tests the new MAC twice
against the former MAC.

Remove the outer ether_addr_equal() test, a remnant of commit c5c922b3e09b
("iavf: fix MAC address setting for VFs when filter is rejected").

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago ice: remove redundant checks from PTP init
Natalia Wochtman [Tue, 9 Jun 2026 21:35:43 +0000 (14:35 -0700)] 
ice: remove redundant checks from PTP init

Remove unnecessary condition checks in ice_ptp_setup_adapter() and
ice_ptp_init(). They are duplicated in ice_pf_src_tmr_owned().

Change ice_ptp_setup_adapter() to return void.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Natalia Wochtman <natalia.wochtman@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago idpf: Replace use of system_unbound_wq with system_dfl_wq
Marco Crivellari [Tue, 9 Jun 2026 21:35:42 +0000 (14:35 -0700)] 
idpf: Replace use of system_unbound_wq with system_dfl_wq

This patch continues the effort to refactor workqueue APIs, which began
with the changes introducing new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen, workqueue users must be converted to the better-named
new workqueues with no intended behaviour changes:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago Merge branch 'octeontx2-af-npc-enhancements'
Jakub Kicinski [Sat, 13 Jun 2026 23:17:01 +0000 (16:17 -0700)] 
Merge branch 'octeontx2-af-npc-enhancements'

Ratheesh Kannoth says:

====================
octeontx2-af: npc: Enhancements.

This series extends Marvell octeontx2-af support for CN20K NPC (MCAM
debuggability, allocation policy, default-rule lifetime, optional KPU
profiles from firmware files, X2/X4 MCAM keyword handling in flows and
defaults, and dynamic CN20K NPC private state), adds a devlink mechanism
for multi-value parameters, and moves devlink_nl_param_fill() temporaries
to the heap so stack usage stays reasonable once union devlink_param_value
grows (patch 3).

Patch 1 enforces a single RVU admin-function PCI device in the kernel.
On Octeon series SoCs, hardware resources such as NPC, NIX and related
blocks are global and coordinated by the AF driver; PFs and VFs request
them through AF mailbox messages.  Firmware exposes only one AF PCI
function at boot, so two AF driver instances cannot both own that state.
rvu_probe() rejects a second bind with -EBUSY, logs a warning, clears the
probe gate on early allocation failures, and aligns the driver model with
hardware so reviewers and automation can rely on exactly one bound AF.

Patch 2 improves CN20K MCAM visibility in debugfs: mcam_layout marks
enabled entries, dstats reports per-entry hit deltas (baseline updated in
software after each read; hardware counters are not cleared), and mismatch
lists enabled entries without a PF mapping.

Patch 3 allocates the per-configuration-mode union devlink_param_value
buffers and struct devlink_param_gset_ctx used by devlink_nl_param_fill()
with kcalloc()/kzalloc_obj() and funnels failures through a single cleanup
path so the netlink reply path stays safe as the union grows.

Patch 4 (Saeed) introduces DEVLINK_PARAM_TYPE_U64_ARRAY and nested
DEVLINK_ATTR_PARAM_VALUE_DATA attributes so drivers and user space can
exchange bounded u64 arrays; YAML, uapi, and netlink validation are
updated.

Patch 5 adds a runtime devlink parameter srch_order to reorder CN20K
subbank search during MCAM allocation (the param uses the u64 array type
from patch 4).

Patch 6 ties default MCAM entries to NIX LF alloc/free on CN20K, adds
NIX_LF_DONT_FREE_DFT_IDXS for PF teardown paths that must not drop default
NPC indexes while the driver still owns state, and tightens nix_lf_alloc
error propagation.

Patch 7 allows loading a custom KPU profile from /lib/firmware/kpu via
module parameter kpu_profile, with cam2 / ptype_mask wiring and helpers
that share firmware-sourced vs filesystem-sourced profile layouts.

Patch 8 makes default-rule allocation, AF flow install, and PF-side RSS,
defaults, and ethtool flows respect the active CN20K MCAM keyword width
(X2 vs X4), including X4 reference-index masking and -EOPNOTSUPP when a
flow needs X4 keys on an X2-only profile.

Patch 9 replaces file-scope npc_priv and static dstats with allocation
sized from discovered bank/subbank geometry, threads npc_priv_get()
through CN20K NPC paths, and allocates dstats via devm_kzalloc for the
debugfs helper.

Patch 1 is ordered first so later patches assume a single bound AF.
Heap-backed devlink_nl_param_fill() sits immediately before the U64 array
param work so incremental builds stay stack-safe as the union grows; the
CN20K patches keep srch_order ahead of NIX LF coordination, optional KPU
profile load from firmware files, X2/X4 handling, and the npc_priv refactor
that touches the same files heavily.
====================

Link: https://patch.msgid.link/20260609040453.711932-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:53 +0000 (09:34 +0530)] 
octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.

Replace the file-scope static npc_priv with a kcalloc'd struct filled
from hardware bank/subbank geometry at init (num_banks is no longer a
compile-time constant; drop init_done and use a non-NULL
npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
through the CN20K NPC code paths, extend teardown to kfree the root
struct on failure and in npc_cn20k_deinit, and adjust MCAM section
setup to use the discovered subbank count.

Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
and use the allocated backing store consistently when computing deltas
(including the counter rollover compare).

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-10-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:52 +0000 (09:34 +0530)] 
octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc

Default CN20K NPC rule allocation now keys off the active MCAM keyword
width: use X4 with a bank-masked reference index when the silicon uses
X4 keys, and X2 with the raw index otherwise (replacing the previous
always-X2 / eidx + 1 behaviour).

In the AF flow-install path, flows that need more than 256 key bits
query the NPC profile; if the platform is fixed to X2 entries, fail
with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
MCAM alloc.

On the PF, cache and pass the profile kw_type from npc_get_pfl_info
through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
entries for RSS/defaults and when installing ethtool flows on CN20K,
including masking the reference index for X4 slot layout.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-9-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
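
The AF flow-install decision described in this commit can be sketched as follows. This is a hedged illustration only: enum kw_type, select_kw_type() and the x2_only flag are hypothetical names, and the choice of X2 for flows of 256 key bits or fewer is an assumption of this sketch rather than something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical keyword-width type for this example. */
enum kw_type { KW_X2, KW_X4 };

/*
 * Pick the MCAM keyword width for a flow needing `key_bits` bits of key.
 * `x2_only` models a profile that is fixed to X2 entries.
 * Returns 0 and stores the width in *out, or -EOPNOTSUPP when the flow
 * needs X4 keys but the profile cannot provide them.
 */
int select_kw_type(unsigned int key_bits, bool x2_only, enum kw_type *out)
{
	if (key_bits <= 256) {
		*out = KW_X2;	/* assumption: narrow keys fit an X2 entry */
		return 0;
	}

	if (x2_only)
		return -EOPNOTSUPP;

	*out = KW_X4;
	return 0;
}
```

Failing early with -EOPNOTSUPP lets the caller report an unsupported flow instead of requesting an X4 entry that the profile cannot provide.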
5 days ago octeontx2-af: npc: Support for custom KPU profile from filesystem
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:51 +0000 (09:34 +0530)] 
octeontx2-af: npc: Support for custom KPU profile from filesystem

Flashing updated firmware on deployed devices is cumbersome. Provide a
mechanism to load a custom KPU (Key Parse Unit) profile directly from
the filesystem at module load time.

When the rvu_af module is loaded with the kpu_profile parameter, the
specified profile is read from /lib/firmware/kpu and programmed into
the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
used by filesystem-loaded profiles and support ptype/ptype_mask in
npc_config_kpucam when profile->from_fs is set.

Usage:
  1. Copy the KPU profile file to /lib/firmware/kpu.
  2. Build OCTEONTX2_AF as a module.
  3. Load: insmod rvu_af.ko kpu_profile=<profile_name>

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-8-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:50 +0000 (09:34 +0530)] 
octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle

Add NIX_LF_DONT_FREE_DFT_IDXS so the PF can send NIX LF free during hw
reinit or teardown without the AF freeing CN20K default NPC rule indexes
while the driver still owns that state (otx2_init_hw_resources and
otx2_free_hw_resources).

On CN20K, allocate default NPC rules from NIX LF alloc before
nix_interface_init, roll back with npc_cn20k_dft_rules_free on failure,
and free from NIX LF free when the new flag is not set. Tighten
rvu_mbox_handler_nix_lf_alloc error handling: use a single rc, propagate
qmem_alloc and other errors, and set -ENOMEM only when kcalloc fails
(remove the blanket -ENOMEM at the free_mem path).

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-7-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago octeontx2-af: npc: cn20k: add subbank search order control
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:49 +0000 (09:34 +0530)] 
octeontx2-af: npc: cn20k: add subbank search order control

CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation. Lower-numbered subbanks have
higher priority than higher-numbered ones.

Add a runtime devlink parameter "srch_order" to control the order in which
subbanks are searched during MCAM allocation.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-6-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
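
To make the effect of a configurable search order concrete, here is a minimal sketch of an allocator that walks the subbanks in a caller-supplied order. It is not the driver's allocator: alloc_from_subbanks(), the order array and the per-subbank free counters are hypothetical, and only the subbank count of 32 comes from the commit message.

```c
#include <errno.h>

#define NUM_SUBBANKS 32	/* from the commit message */

/*
 * Take one entry from the first subbank, in search order, that still has
 * free entries. `order` lists subbank numbers from highest to lowest
 * priority; `free_cnt` holds the free entries left in each subbank.
 * Returns the chosen subbank, -EINVAL for an out-of-range order entry,
 * or -ENOSPC when every subbank is full.
 */
int alloc_from_subbanks(const unsigned int order[NUM_SUBBANKS],
			unsigned int free_cnt[NUM_SUBBANKS])
{
	unsigned int i;

	for (i = 0; i < NUM_SUBBANKS; i++) {
		unsigned int sb = order[i];

		if (sb >= NUM_SUBBANKS)
			return -EINVAL;

		if (free_cnt[sb] > 0) {
			free_cnt[sb]--;
			return (int)sb;
		}
	}

	return -ENOSPC;
}
```

With the identity order, lower-numbered subbanks win, which matches the predefined behaviour described above; reversing the order makes the highest-numbered subbank with free entries win instead.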
5 days ago devlink: Implement devlink param multi attribute nested data values
Saeed Mahameed [Tue, 9 Jun 2026 04:04:48 +0000 (09:34 +0530)] 
devlink: Implement devlink param multi attribute nested data values

The devlink param value attribute is not defined, since devlink handles
value validation and parsing internally. This allows us to implement
multi-attribute values without breaking any policies.

Devlink param multi-attribute values are treated as dynamically sized
arrays of u64 values. By introducing a new devlink param type,
DEVLINK_PARAM_TYPE_U64_ARRAY, drivers and user space can set a variable
count of u64 values in the DEVLINK_ATTR_PARAM_VALUE_DATA attribute.

Implement get/set parsing and add to the internal value structure passed
to drivers.

This is useful for devices that need to configure a list of values for
a specific configuration.

example:
$ devlink dev param show pci/... name multi-value-param
name multi-value-param type driver-specific
values:
cmode permanent value: 0,1,2,3,4,5,6,7

$ devlink dev param set pci/... name multi-value-param \
value 4,5,6,7,0,1,2,3 cmode permanent

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-5-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
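
The comma-separated value in the example above maps naturally onto a bounded array of u64 values. The following userspace sketch shows one way such a string could be parsed; parse_u64_list() is a hypothetical helper and does not reflect how the devlink tool or the kernel netlink code actually handles the attribute.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a string such as "4,5,6,7,0,1,2,3" into `out`, accepting at most
 * `max` values. Returns 0 and stores the value count in *count, -EINVAL
 * for malformed input (empty string, empty element, stray characters,
 * overflow), or -E2BIG when the list has more than `max` entries.
 */
int parse_u64_list(const char *s, uint64_t *out, size_t max, size_t *count)
{
	size_t n = 0;

	while (*s) {
		char *end;
		unsigned long long v;

		if (n == max)
			return -E2BIG;

		/* Reject signs, whitespace and empty elements up front. */
		if (!isdigit((unsigned char)*s))
			return -EINVAL;

		errno = 0;
		v = strtoull(s, &end, 10);
		if (errno)
			return -EINVAL;

		out[n++] = (uint64_t)v;

		if (*end == ',') {
			s = end + 1;
			if (!*s)	/* trailing comma */
				return -EINVAL;
		} else if (*end == '\0') {
			s = end;
		} else {
			return -EINVAL;
		}
	}

	if (n == 0)
		return -EINVAL;

	*count = n;
	return 0;
}
```

Bounding the count up front mirrors the idea of a bounded u64 array: the caller decides how many values a parameter may carry and anything longer is rejected rather than truncated.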
5 days ago devlink: heap-allocate param fill buffers in devlink_nl_param_fill
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:47 +0000 (09:34 +0530)] 
devlink: heap-allocate param fill buffers in devlink_nl_param_fill

devlink_nl_param_fill() kept two per-configuration-mode copies of
union devlink_param_value plus a struct devlink_param_gset_ctx on the
stack while building the Netlink reply. Allocate those with kcalloc()
and kzalloc_obj() instead, and route failures through a single cleanup
path so temporary buffers are always freed.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-4-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago octeontx2-af: npc: cn20k: debugfs enhancements
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:46 +0000 (09:34 +0530)] 
octeontx2-af: npc: cn20k: debugfs enhancements

Improve MCAM visibility and field debugging for CN20K NPC.

- Extend "mcam_layout" to show enabled (+) or disabled state per entry
  so status can be verified without parsing the full "mcam_entry" dump.
- Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
  since the prior read by comparing hardware counters to a per-entry
  software baseline and advancing that baseline after each read (hardware
  counters are not cleared).
- Add "mismatch" debugfs entry: lists MCAM entries that are enabled
  but not explicitly allocated, helping diagnose allocation/field issues.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-3-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
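
The "dstats" behaviour described above (report the delta since the last read, then advance a software baseline without clearing the hardware counter) can be sketched as below. This is an illustration under stated assumptions: dstats_read_delta() is a hypothetical name, the 48-bit counter width is assumed purely for the example, and the wrap is handled with a masked subtraction, which may differ from the explicit rollover compare used in the driver.

```c
#include <stdint.h>

/* Assumed counter width for this sketch; the real width may differ. */
#define COUNTER_BITS 48
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/*
 * Return the number of hits since the previous read and advance the
 * software baseline. The hardware counter itself is never cleared.
 * Correct as long as the counter wraps at most once between reads.
 */
uint64_t dstats_read_delta(uint64_t hw_counter, uint64_t *baseline)
{
	uint64_t delta = (hw_counter - *baseline) & COUNTER_MASK;

	*baseline = hw_counter;
	return delta;
}
```

Keeping the baseline in software means several readers of other debugfs files still see the raw, monotonically increasing hardware counter, while "dstats" shows only the recent activity.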
5 days ago octeontx2-af: enforce single RVU AF probe
Ratheesh Kannoth [Tue, 9 Jun 2026 04:04:45 +0000 (09:34 +0530)] 
octeontx2-af: enforce single RVU AF probe

On Octeon series SoCs, the AF is an integrated device within the SoC, and
hardware resources such as NPC, NIX and related blocks are global and
coordinated by the AF driver.  Physical and virtual functions request those
resources via AF mailbox messages, so two AF driver instances cannot both
own that global state; firmware exposes only one AF PCI function at boot
and any further octeontx2-af PCI probe returns -EBUSY so software matches
the single-AF model.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260609040453.711932-2-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
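
A single-instance probe gate of the kind described here, including the clearing of the gate on early allocation failure mentioned in the series cover letter, can be sketched in userspace with C11 atomics. The names af_bound and af_probe() are hypothetical, and the alloc_ok flag merely simulates whether the early allocations succeed; the driver's actual mechanism may differ.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set while one AF instance is bound (hypothetical gate). */
static atomic_flag af_bound = ATOMIC_FLAG_INIT;

/*
 * Model of a probe routine that admits only one instance.
 * Returns -EBUSY for a second bind, -ENOMEM when the simulated
 * allocation fails (the gate is released again), and 0 on success.
 */
int af_probe(bool alloc_ok)
{
	/* Atomically claim the gate; a second caller sees it already set. */
	if (atomic_flag_test_and_set(&af_bound))
		return -EBUSY;

	if (!alloc_ok) {
		/* Early failure: release the gate so a retry can succeed. */
		atomic_flag_clear(&af_bound);
		return -ENOMEM;
	}

	return 0;
}
```

Claiming the gate with a single atomic test-and-set avoids a window in which two concurrent probes could both see the gate as free.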
5 days ago Merge branch 'net-stmmac-fixes-for-maximum-tx-rx-queues-to-use-by-driver'
Jakub Kicinski [Sat, 13 Jun 2026 23:00:33 +0000 (16:00 -0700)] 
Merge branch 'net-stmmac-fixes-for-maximum-tx-rx-queues-to-use-by-driver'

Jakub Raczynski says:

====================
net/stmmac: Fixes for maximum TX/RX queues to use by driver

While contributing other changes that prepare functions for new XGMAC hardware
https://lore.kernel.org/netdev/20260601162537.553512-1-j.raczynski@samsung.com/
Sashiko AI reported several issues.

All of the issues stem from wrong DTS configuration, but the kernel needs to
handle them.
====================

Link: https://patch.msgid.link/20260611113358.3379518-1-j.raczynski@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago net/stmmac: Apply MTL_MAX queue limit if config missing
Jakub Raczynski [Thu, 11 Jun 2026 11:33:58 +0000 (13:33 +0200)] 
net/stmmac: Apply MTL_MAX queue limit if config missing

When the "snps,rx-queues-to-use" or "tx-queues-to-use" property is provided in
the DTS, the current code caps queues_to_use at U8_MAX if a higher value is
given. But the actual maximum number of supported queues is set by the
MTL_MAX_RX_QUEUES and MTL_MAX_TX_QUEUES macros, which currently have a value
of 8.

The U8_MAX value is later capped to the value the core reports in its DMA
capabilities (dma_conf), but only if the core provides it. This is true for
XGMAC (dwxgmac2) and some GMAC (dwmac4) cores, but not for dwmac1000. The
capping also happens at a later stage, in stmmac_hw_init(), so during
stmmac_mtl_setup() we might parse fields outside allocated memory if
queues_to_use exceeds the MTL_MAX_ defines. For example, rx_queues_cfg is an
array of size MTL_MAX_RX_QUEUES.

Fix this by capping value to MTL_MAX during config parsing.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260611113358.3379518-3-j.raczynski@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
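
The fix amounts to clamping the parsed value before it is ever used as an array bound. A minimal sketch, assuming a hypothetical helper cap_queues_to_use(); the macro values of 8 are the ones quoted in the commit message:

```c
/* Values quoted in the commit message. */
#define MTL_MAX_RX_QUEUES 8
#define MTL_MAX_TX_QUEUES 8

/*
 * Clamp a queue count read from the device tree to the size of the
 * per-queue config arrays, so later loops cannot index past them.
 */
unsigned int cap_queues_to_use(unsigned int requested, unsigned int mtl_max)
{
	return requested > mtl_max ? mtl_max : requested;
}
```

Clamping at parse time, rather than relying on the later capability-based cap, covers cores that do not report a queue count in their DMA capabilities.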
5 days ago net/stmmac: Apply TBS config only to used queues
Jakub Raczynski [Thu, 11 Jun 2026 11:33:57 +0000 (13:33 +0200)] 
net/stmmac: Apply TBS config only to used queues

When the stmmac driver is opened, the TBS (Time-Based Scheduling) option is
enabled in the DMA config. Currently this is done for all possible TX queues,
up to the MTL_MAX_TX_QUEUES macro, but the actual number of queues in use
might differ. While setting this is generally harmless, since memory for
MTL_MAX_TX_QUEUES queues is allocated, it is incorrect because it prepares
config for unused queues.

Change this to apply the TBS config only to tx_queues_to_use queues.

Co-developed-by: Chang-Sub Lee <cs0617.lee@samsung.com>
Signed-off-by: Chang-Sub Lee <cs0617.lee@samsung.com>
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260611113358.3379518-2-j.raczynski@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
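
The change in loop bound can be sketched as follows. The struct tx_queue_cfg type and apply_tbs() are hypothetical simplifications for this example and do not mirror the driver's data structures.

```c
#include <stdbool.h>

#define MTL_MAX_TX_QUEUES 8	/* value quoted in the related stmmac commit */

/* Hypothetical per-queue config, reduced to the one flag of interest. */
struct tx_queue_cfg {
	bool tbs_en;
};

/*
 * Enable TBS only on the queues actually in use, instead of on all
 * MTL_MAX_TX_QUEUES slots. The second bound is a defensive guard for
 * this sketch in case the caller passes an oversized count.
 */
void apply_tbs(struct tx_queue_cfg cfg[MTL_MAX_TX_QUEUES],
	       unsigned int tx_queues_to_use)
{
	unsigned int i;

	for (i = 0; i < tx_queues_to_use && i < MTL_MAX_TX_QUEUES; i++)
		cfg[i].tbs_en = true;
}
```

With tx_queues_to_use set to 3, only the first three slots are marked; the remaining slots stay untouched even though memory for them exists.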
5 days ago net: airoha: Fix debugfs new-tuple display for IPv4 ROUTE entries
Wayen.Yan [Thu, 11 Jun 2026 23:09:56 +0000 (07:09 +0800)] 
net: airoha: Fix debugfs new-tuple display for IPv4 ROUTE entries

In airoha_ppe_debugfs_foe_show(), the second switch statement falls
through from PPE_PKT_TYPE_IPV4_HNAPT/DSLITE to PPE_PKT_TYPE_IPV4_ROUTE,
accessing hwe->ipv4.new_tuple for all three types. However, IPv4 ROUTE
(3-tuple) entries do not contain a valid new_tuple — this field is only
meaningful for NATted flows (HNAPT/DSLITE). For ROUTE entries, the
memory at the new_tuple offset holds routing information, not NAT data,
so displaying "new=" produces garbage output.

Display new_tuple only for HNAPT and DSLITE, and let IPV4_ROUTE fall
through to the default case.

Fixes: 3fe15c640f38 ("net: airoha: Introduce PPE debugfs support")
Link: https://lore.kernel.org/6a2b40ea.4dd82583.3a5c46.e5a2@mx.google.com
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2be54b.ef98c1b2.3c3224.2ed8@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
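
The corrected switch layout can be illustrated with a small sketch. The enum and shows_new_tuple() are hypothetical stand-ins for the PPE packet types and the debugfs show routine; only the grouping of the cases reflects the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PPE packet types named above. */
enum pkt_type {
	PKT_IPV4_HNAPT,
	PKT_IPV4_DSLITE,
	PKT_IPV4_ROUTE,
	PKT_OTHER,
};

/*
 * Decide whether the "new=" tuple should be printed for an entry.
 * Only NATted flows (HNAPT, DSLITE) carry a valid new_tuple; ROUTE
 * entries now share the default case and print nothing.
 */
bool shows_new_tuple(enum pkt_type type)
{
	switch (type) {
	case PKT_IPV4_HNAPT:
	case PKT_IPV4_DSLITE:
		return true;
	case PKT_IPV4_ROUTE:
	default:
		return false;
	}
}
```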
5 days ago net: airoha: Fix register index for Tx-fwd counter configuration
Wayen.Yan [Thu, 11 Jun 2026 23:09:13 +0000 (07:09 +0800)] 
net: airoha: Fix register index for Tx-fwd counter configuration

In airoha_qdma_init_qos_stats(), the Tx-fwd counter configuration
register uses the same index (i << 1) as the Tx-cpu counter, which
overwrites the Tx-cpu configuration. The Tx-fwd counter value register
correctly uses (i << 1) + 1, so the configuration register should use
the same index.

Fix the REG_CNTR_CFG index from (i << 1) to ((i << 1) + 1) so that
the Tx-fwd counter is properly configured instead of clobbering the
Tx-cpu counter config.

Fixes: 20bf7d07c956 ("net: airoha: Add sched ETS offload support")
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2b40e7.4dd82583.3a5c46.e566@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
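
The indexing scheme behind this fix is simple enough to show directly: each channel i owns a pair of counter slots, an even one for Tx-cpu and an odd one for Tx-fwd. The helper names below are hypothetical; the index expressions are the ones given in the commit message.

```c
/* Counter slot used for the Tx-cpu counter of channel i. */
unsigned int tx_cpu_cntr_idx(unsigned int i)
{
	return i << 1;
}

/* Counter slot used for the Tx-fwd counter of channel i. */
unsigned int tx_fwd_cntr_idx(unsigned int i)
{
	return (i << 1) + 1;
}
```

Before the fix, the Tx-fwd configuration was written at the Tx-cpu slot, so both helpers effectively returned i << 1 for the config register and the second write clobbered the first.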
5 days ago tipc: restrict socket queue dumps in enqueue tracepoints
Li Xiasong [Thu, 11 Jun 2026 13:56:47 +0000 (21:56 +0800)] 
tipc: restrict socket queue dumps in enqueue tracepoints

tipc_sk_enqueue() runs with sk->sk_lock.slock held while the socket is
owned by user context. The spinlock protects the backlog queue in this
path, but it does not serialize against the socket owner consuming or
purging sk_receive_queue.

KASAN reported:

  CPU: 14 UID: 0 PID: 1050 Comm: tipc3 Not tainted 7.1.0-rc6+ #126 PREEMPT(lazy)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
    <TASK>
    dump_stack_lvl+0x76/0xa0 lib/dump_stack.c:123
    print_report+0xce/0x5b0 mm/kasan/report.c:482
    kasan_report+0xc6/0x100 mm/kasan/report.c:597
    __asan_report_load4_noabort+0x14/0x30 mm/kasan/report_generic.c:380
    tipc_skb_dump+0x1327/0x16f0 net/tipc/trace.c:73
    tipc_list_dump+0x208/0x2e0 net/tipc/trace.c:187
    tipc_sk_dump+0xaf6/0xd60 net/tipc/socket.c:3996
    trace_event_raw_event_tipc_sk_class+0x312/0x5a0 net/tipc/trace.h:188
    tipc_sk_rcv+0xb1d/0x1d50 net/tipc/socket.c:2497
    tipc_node_xmit+0x1c3/0x1440 net/tipc/node.c:1689
    __tipc_sendmsg+0x97a/0x1440 net/tipc/socket.c:1512
    tipc_sendmsg+0x52/0x80 net/tipc/socket.c:1400
    sock_sendmsg+0x2f6/0x3e0 net/socket.c:825
    splice_to_socket+0x7f9/0x1010 fs/splice.c:884
    do_splice+0xe21/0x2330 fs/splice.c:936
    __do_splice+0x153/0x260 fs/splice.c:1431
    __x64_sys_splice+0x150/0x230 fs/splice.c:1616
    x64_sys_call+0xeb5/0x2790 arch/x86/entry/syscall_64.c:41
    do_syscall_64+0xf3/0x620 arch/x86/entry/syscall_64.c:63
    entry_SYSCALL_64_after_hwframe+0x76/0x7e arch/x86/entry/entry_64.S:130
  RIP: 0033:0x71624e8aafe2
  Code: 08 0f 85 71 3a ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66
  RSP: 002b:0000716157ffed68 EFLAGS: 00000246 ORIG_RAX: 0000000000000113
  RAX: ffffffffffffffda RBX: 0000716157fff6c0 RCX: 000071624e8aafe2
  RDX: 000000000000005f RSI: 0000000000000000 RDI: 0000000000000066
  RBP: 0000716157ffed90 R08: 0000000000008000 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffff00
  R13: 0000000000000021 R14: 0000000000000000 R15: 00007fff89799c40
    </TASK>

The TIPC_DUMP_ALL tracepoints in tipc_sk_enqueue() also dump
sk_receive_queue and can therefore dereference skbs that the socket
owner has already dequeued or freed. Restrict these dumps to
TIPC_DUMP_SK_BKLGQ, which matches the queue protected by the held
spinlock.

Keep the change limited to the enqueue path, where the unsafe queue dump
is reachable while the socket is owned by user context.

Fixes: 01e661ebfbad ("tipc: add trace_events for tipc socket")
Cc: stable@vger.kernel.org
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
Link: https://patch.msgid.link/20260611135647.3666727-1-lixiasong1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agonet: airoha: better handle MIBs for GDM ports with multiple devs attached
Lorenzo Bianconi [Thu, 11 Jun 2026 10:43:00 +0000 (12:43 +0200)] 
net: airoha: better handle MIBs for GDM ports with multiple devs attached

In the context of a GDM port that can have multiple net_devices attached
(GDM3 and GDM4), the HW counters (MIBs) are global for the GDM port.
This causes duplicated stats to be reported to the kernel for the related
net_device.
The SoC supports a split MIB feature where each counter is tracked based
on the relevant HW channel (NBQ) to account for this scenario and
provide a way to select the related counter on accessing the MIB
registers.
Enable this feature for GDM3 and GDM4 and configure the relevant HW
channel before updating the HW stats to report the correct HW counters to
the kernel for the related interface.
Move the stats struct from port to dev since HW counters are now specific
to the network device instead of the GDM port. Refactor
airoha_update_hw_stats() to take airoha_eth and airoha_gdm_port
parameters since the function operates on the entire port.
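
The effect of the split MIB feature can be illustrated with a small toy
model. This is only a sketch, not driver code: GdmPort, select_channel()
and read_dev_stats() are hypothetical names, and a single RX packet
counter stands in for the whole MIB block.

```python
from collections import defaultdict


class GdmPort:
    """Toy GDM port whose RX counter is either global or split per channel."""

    def __init__(self, split_mib):
        self.split_mib = split_mib
        self.global_rx = 0
        self.per_channel_rx = defaultdict(int)
        self.selected_channel = None

    def receive(self, channel, packets):
        # Hardware counts every packet, tagged with its HW channel.
        self.global_rx += packets
        self.per_channel_rx[channel] += packets

    def select_channel(self, channel):
        # Models configuring the relevant HW channel before a MIB read.
        self.selected_channel = channel

    def read_rx(self):
        if self.split_mib:
            return self.per_channel_rx[self.selected_channel]
        return self.global_rx


def read_dev_stats(port, channels):
    """Read one counter per net_device (one device per channel here)."""
    stats = {}
    for channel in channels:
        port.select_channel(channel)
        stats[channel] = port.read_rx()
    return stats
```

Without the split, both devices report the port-wide total; with it, each
device reports only the traffic of its own channel.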

Co-developed-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260611-airoha-eth-multi-serdes-stats-v1-1-42442ae42064@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoocteontx2-af: fix NPC mailbox codes in mbox.h
Ratheesh Kannoth [Thu, 11 Jun 2026 08:33:30 +0000 (14:03 +0530)] 
octeontx2-af: fix NPC mailbox codes in mbox.h

Several NPC mailbox command IDs in the 0x601x range were assigned out of
order. Renumber and reorder the M() definitions so each opcode matches
the stable contract expected by userspace tools and applications.

Fixes: 4e527f1e5c15 ("octeontx2-af: npc: cn20k: Add new mailboxes for CN20K silicon")
Cc: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260611083330.1652181-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodt-bindings: net: dsa: Convert lan9303.txt to yaml format
Frank Li [Wed, 10 Jun 2026 15:05:30 +0000 (11:05 -0400)] 
dt-bindings: net: dsa: Convert lan9303.txt to yaml format

Convert lan9303.txt to yaml format to fix the CHECK_DTBS warning below:
arch/arm/boot/dts/nxp/imx/imx53-kp-hsc.dtb: /soc/bus@50000000/i2c@53fec000/switch@a: failed to match any schema with compatible: ['smsc,lan9303-i2c']

Additional changes:
  - rename switch-phy to switch in example.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260610150533.515914-1-Frank.Li@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoethernet: 3c509: Improve style of pnp_device_id array terminator
Uwe Kleine-König (The Capable Hub) [Wed, 10 Jun 2026 09:46:53 +0000 (11:46 +0200)] 
ethernet: 3c509: Improve style of pnp_device_id array terminator

To match how device-id array terminators look for other device types,
drop `.id = ""` from it and let the compiler take care of zeroing the
entry.

There are no changes in the compiled drivers, only the source looks
nicer.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://patch.msgid.link/a0cd057e6a24b9d355b5e4bdfcdb812cdd1e4652.1781082923.git.u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agonet: bcmgenet: Use weighted round-robin TX DMA arbitration
Ovidiu Panait [Wed, 10 Jun 2026 08:52:38 +0000 (08:52 +0000)] 
net: bcmgenet: Use weighted round-robin TX DMA arbitration

Under heavy network traffic, we observed sporadic TX queue timeouts on the
Raspberry Pi 4. The timeouts can be reproduced by stress testing the TX
path with multiple concurrent iperf UDP streams:

    iperf3 -c <ip> -u -b0 -P16 -t60
    NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 2044 ms
    NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 2004 ms

Investigation showed that the timeouts are caused by the priority-based
arbiter. Under heavy load the highest priority queue starves the lower
priority ones, causing timeouts. The TX strict priority arbiter is not
suitable for the default use case where all the traffic gets spread
across all the TX queues.

Therefore, to fix this, switch the TX DMA arbiter to Weighted Round-Robin,
which services all queues, so they do not stall. The weights were chosen
to follow the existing priority scheme: q0 gets the smallest weight, while
q1-4 get the bulk of the TX bandwidth.
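
The difference between the two arbitration schemes can be sketched with
a toy simulation. It is not a model of the GENET hardware: the queue
order, the weights and the slot count are made-up values, and queues are
simply listed highest priority first.

```python
def strict_priority(backlog, slots):
    """Serve `slots` descriptors, always from the first non-empty queue."""
    backlog = list(backlog)
    served = [0] * len(backlog)
    for _ in range(slots):
        for queue, pending in enumerate(backlog):
            if pending > 0:
                backlog[queue] -= 1
                served[queue] += 1
                break
    return served


def weighted_round_robin(backlog, weights, slots):
    """Serve up to `weights[q]` descriptors per queue in each round.

    Weights are assumed to be positive integers.
    """
    backlog = list(backlog)
    served = [0] * len(backlog)
    while slots > 0 and any(backlog):
        for queue, weight in enumerate(weights):
            burst = min(weight, backlog[queue], slots)
            backlog[queue] -= burst
            served[queue] += burst
            slots -= burst
    return served
```

With every queue permanently backlogged, strict priority gives all slots
to the first queue, while weighted round-robin lets each queue make
progress in proportion to its weight.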

Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Link: https://patch.msgid.link/20260610085238.56300-1-ovidiu.panait.rb@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoselftests: net: add test for IPv4 devconf netlink notifications
Fernando Fernandez Mancera [Tue, 9 Jun 2026 20:45:20 +0000 (22:45 +0200)] 
selftests: net: add test for IPv4 devconf netlink notifications

Introduce a new test, `ipv4_devconf_notify`, to verify that the kernel
sends the appropriate netlink notifications when IPv4 devconf parameters
are modified.

The test depends on the newly introduced iproute2 command:

`ip link set dev <ifname> inet`

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260609204520.4670-3-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoipv4: handle devconf post-set actions on netlink updates
Fernando Fernandez Mancera [Tue, 9 Jun 2026 20:45:19 +0000 (22:45 +0200)] 
ipv4: handle devconf post-set actions on netlink updates

When IPv4 device configuration parameters are updated via netlink, the
kernel currently only updates the value. This bypasses several
post-modification actions that occur when these same parameters are
updated via sysctl, such as flushing the routing cache or emitting
RTM_NEWNETCONF notifications.

This patch addresses the inconsistency by calling the
devinet_conf_post_set() helper inside inet_set_link_af(). If a flush is
required, we defer it until the netlink attribute parsing loop
completes.

This ensures consistent behavior and side-effects for devconf changes,
regardless of whether they are initiated via sysctl or netlink.
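
The deferred-flush pattern described above can be sketched as follows.
This is a simplified illustration, not the kernel code: the attribute
names, the Device class and the rule that unchanged values trigger
nothing are all assumptions made for the example.

```python
# Hypothetical attributes; which real devconf entries need a flush is
# deliberately not modeled here.
FLUSHING_ATTRS = {"attr_a", "attr_b"}


class Device:
    def __init__(self):
        self.conf = {}
        self.notifications = []
        self.flushes = 0


def conf_post_set(dev, attr, old, new):
    """Run post-set actions; return True if a cache flush is required."""
    if old == new:  # assumption: unchanged values have no side effects
        return False
    dev.notifications.append((attr, new))
    return attr in FLUSHING_ATTRS


def set_link_af(dev, attrs):
    """Apply all attributes, then flush at most once after the loop."""
    need_flush = False
    for attr, new in attrs:
        old = dev.conf.get(attr, 0)
        dev.conf[attr] = new
        if conf_post_set(dev, attr, old, new):
            need_flush = True
    if need_flush:
        dev.flushes += 1
```

Returning a boolean from the helper instead of flushing inside it is what
allows a request that touches several flushing attributes to pay for a
single flush.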

Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260609204520.4670-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoipv4: centralize devconf sysctl handling
Fernando Fernandez Mancera [Tue, 9 Jun 2026 20:45:18 +0000 (22:45 +0200)] 
ipv4: centralize devconf sysctl handling

The logic for handling IPv4 devconf sysctls is scattered. Notification
and cache flushes are managed in devinet_conf_proc(), while a separate
ipv4_doint_and_flush() function and DEVINET_SYSCTL_FLUSHING_ENTRY macro
are used for properties that solely require a cache flush.

This patch refactors the sysctl handling by introducing a centralized
helper, devinet_conf_post_set(). This new function evaluates the changed
attribute and handles all necessary operations like triggering netlink
notifications. It returns a boolean indicating whether a routing cache
flush is required.

Note that the boolean is necessary as this function will be re-used for
netlink IPv4 devconf handling where the cache flushing must wait until
all the attributes have been processed.

Finally, this introduces a small change in behavior for
IPV4_DEVCONF_ROUTE_LOCALNET. As commit d0daebc3d622 ("ipv4: Add
interface option to enable routing of 127.0.0.0/8") intended, the cache
flush should only be performed when ROUTE_LOCALNET changes from 1 to 0.
Unfortunately, this was not the case: the attribute was implemented with
DEVINET_SYSCTL_FLUSHING_ENTRY, which made the related code in
devinet_conf_proc() dead.

IPV4_DEVCONF_FORWARDING is still being handled separately as it requires
more operations.

Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260609204520.4670-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agotcp: refine tcp_sequence() for the FIN exception
Eric Dumazet [Mon, 8 Jun 2026 15:14:52 +0000 (15:14 +0000)] 
tcp: refine tcp_sequence() for the FIN exception

Commit 0e24d17bd966 ("tcp: implement RFC 7323 window retraction
receiver requirements") removed the special FIN case that
was added in commit 1e3bb184e941 ("tcp: re-enable acceptance of
FIN packets when RWIN is 0").

If a peer sends a segment containing data and a FIN flag before
it learns about our window retraction and has a buggy TCP stack,
it might place the FIN one byte beyond what it thinks is the
right edge of the window (i.e., max_window_edge + 1).

The data portion (end_seq - th->fin) will end exactly at max_window_edge.
In this case, we will drop the packet if our receive queue is not empty,
even though the data was sent within the window we previously allowed.
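
The boundary case can be illustrated with a small sketch of the window
check. It is not the actual tcp_sequence() logic, which takes more
conditions into account (for example the state of the receive queue);
within_window() and its fin_exception flag are hypothetical and only
show how excluding the FIN from end_seq changes the comparison.

```python
MASK = 0xFFFFFFFF


def after(seq1, seq2):
    """True if seq1 is after seq2 in 32-bit sequence space."""
    return ((seq2 - seq1) & MASK) >= 0x80000000


def within_window(end_seq, fin, max_window_edge, fin_exception):
    """Check the right edge of a segment against the allowed window.

    With fin_exception set, the one sequence number consumed by the FIN
    is not counted, so only the data portion must fit in the window.
    """
    edge = end_seq
    if fin_exception and fin:
        edge = (end_seq - 1) & MASK
    return not after(edge, max_window_edge)
```

A segment whose data ends exactly at max_window_edge and whose FIN lands
one byte beyond it is rejected by the plain check and accepted once the
FIN is excluded, including across sequence number wraparound.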

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Simon Baatz <gmbnomis@gmail.com>
Link: https://patch.msgid.link/20260608151452.706822-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoMerge branch 'dpll-ice-add-generic-dpll-type-and-full-tx-reference-clock-control...
Jakub Kicinski [Sat, 13 Jun 2026 20:24:38 +0000 (13:24 -0700)] 
Merge branch 'dpll-ice-add-generic-dpll-type-and-full-tx-reference-clock-control-for-e825'

Grzegorz Nitka says:

====================
dpll/ice: Add generic DPLL type and full TX reference clock control for E825

NOTE: This series is intentionally submitted on net-next (not
intel-wired-lan) as early feedback of DPLL subsystem changes is
welcomed. In the past possible approaches were discussed in [1].

This series adds TX reference clock support for E825 devices and exposes
TX clock selection and synchronization status via the Linux DPLL
subsystem.

Here is the high-level connection diagram for E825 device:
  +------------------------------------------------------------------+
  |                                                                  |
  |                           +-----------------------------+        |
  |                           |                             |        |
  |                           |         MAC                 |        |
  |                           |+------------+-----+         |        |
  |                           ||RX/1588 |PHC|tspll<----\    |        |
+---+----+                    ||MUX     +---+-^---|    |    |        |
| E | RX >--------------------->              |   >--\ |    |        |
| T |    |    /---------------->              |   >-\| |    |        |
| H |----+    |               |+---------+----^---+ || |    |        |
| 1 | TX <----|----------------+TX MUX   < OCXO   | || |    |        |
|   |PLL |    |               ||         |--------| || |    |        |
+---+----+    |           /----+         <-ext_ref<-||-|----|------ext_ref
| E | RX >----/           |   ||         |--------+ || |    |        |
| T |    |                |   ||         <  SyncE | || |    |        |
| H |----+                |   |+-----------^------+ || |    |        |
| 2 | TX <----------------/   |            | /------||-/    |        |
|   |PLL |                    +------------|-|------||------+        |
+---+----+                              /--/ |      ||               |
| . | RX >---                           |    |      ||               |
| . |    |                   +----------|----|------||--+            |
| . |----+                   |        +-^-+--^+     ||  |            |
|   | TX <---                |        |EEC|PPS|     ||  |            |
|   |PLL |                   |        +-------+     ||  |            |
+---+----+                   |        |       <-CLK0/|  |            |
| E | RX >---                |        |  DPLL |      |  |            |
| T |    |                   |        |       <-CLK1-/  |            |
| H |----+                   |        |       |         |            |
| X | TX <---                |        |       <---SMA---<            |
|   |PLL |                   |        |       |         |            |
+---+----+                   |        |       <---GPS---<            |
  |                          |        |       |         |            |
  |                          |        |       <---...---<            |
  |                          |        |       |         |            |
  |                          |        +-------+         |            |
  |                          | External timing module   |            |
  |                          +--------------------------+            |
  +------------------------------------------------------------------+

E825 hardware contains a dedicated TX clock domain with per-port source
selection behavior that is distinct from PPS handling and from board-level
EEC distribution. TX reference clock selection is device-wide, shared
across ports, and mediated by firmware as part of link bring-up. As a
result, TX clock selection intent may differ from effective hardware
configuration, and software must verify outcome after link-up.

To support this, the series extends the DPLL core and the ice driver
incrementally. The series also introduces DPLL_TYPE_GENERIC as a broad
UAPI class for DPLL instances outside PPS/EEC categories. The intent is
to keep type naming reusable and scalable across different ASIC
topologies while preserving functional discoverability via
driver/device context and pin topology.

This follows netdev discussion guidance that UAPI type naming should avoid
location-specific or vendor-specific taxonomy, because such labels do not
scale across different ASIC designs. The function of a given DPLL instance
is already discoverable from driver/device context and pin topology, and
does not require an additional narrow type identifier in UAPI.

At the same time, a separate DPLL object is still needed for E825 TX clock
control/reporting semantics. Using DPLL_TYPE_GENERIC provides a reusable
class for devices outside PPS/EEC without overfitting UAPI naming to one
topology.

The relevant discussion is in [2].

Series content
- add a new generic DPLL type for devices outside PPS/EEC classes;
- relax DPLL pin registration rules for firmware-described shared pins
  and extend pin notifications with a source identifier;
- allow dynamic state control of SyncE reference pins where hardware
  supports it;
- add CPI infrastructure for PHY-side TX clock control on E825C;
- introduce a TX-clock DPLL device and TX reference clock pins
  (EXT_EREF0 and SYNCE) in the ice driver;
- extend the Restart Auto-Negotiation command to carry a TX reference
  clock index;
- implement hardware-backed TX reference clock switching, post-link
  verification, and TX synchronization reporting.

TXCLK pins report TX reference topology only. Actual synchronization
success is reported via DPLL lock status, updated after hardware
verification: external TX references report LOCKED, while the internal
ENET/TXCO source reports UNLOCKED.

This provides reliable TX reference selection and observability on E825
devices using standard DPLL interfaces, without conflating user intent
with effective hardware behavior.

[1] https://lore.kernel.org/netdev/20250905160333.715c34ac@kernel.org/
[2] https://lore.kernel.org/netdev/20260402230626.3826719-1-grzegorz.nitka@intel.com/
====================

Link: https://patch.msgid.link/20260607183045.1213735-1-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoice: implement E825 TX ref clock control and TXC hardware sync status
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:45 +0000 (20:30 +0200)] 
ice: implement E825 TX ref clock control and TXC hardware sync status

Build on the previously introduced TXC DPLL framework and implement
full TX reference clock control and hardware-backed synchronization
status reporting for E825 devices.

E825 firmware may accept or override TX reference clock requests based
on device-wide routing constraints and link conditions. Because the
final selection becomes visible only after a link-up event, the driver
splits the observation into two complementary signals:

  - TXCLK pin state reflects the requested TX reference clock
    (pf->ptp.port.tx_clk_req). After a link-up, the value is reconciled
    against the SERDES reference selector by
    ice_txclk_update_and_notify(); if firmware or auto-negotiation
    selected a different clock, tx_clk_req is overwritten so that pin
    state converges to the actual hardware selection.

  - TXC DPLL lock status reflects hardware synchronization:
      * LOCKED   when an external TX reference is in use
      * UNLOCKED when falling back to ENET/TXCO, or when a requested
        external reference has not (yet) been accepted by hardware.

Userspace observing only pin state therefore sees user intent, while
lock status is the authoritative indicator of whether the requested
clock is actually selected and synchronizing. This matches the DPLL
subsystem model where pin state describes topology and device lock
status describes signal quality.

TX reference selection topology:
  - External references (SYNCE, EREF0) are represented as TXCLK pins
  - The internal ENET/TXCO clock has no pin representation; when
    selected, all TXCLK pins are reported DISCONNECTED

With this change, TX reference clocks on E825 devices can be reliably
selected, observed via standard DPLL interfaces, and monitored for
effective synchronization through TXC DPLL lock status.
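
The split between pin state and lock status can be sketched with a toy
state machine. This is an illustration of the description above, not the
driver implementation: the TxClk class, its method names and the string
constants are hypothetical.

```python
ENET = "ENET"                  # internal clock, has no pin
EXTERNAL = ("SYNCE", "EREF0")  # external references exposed as TXCLK pins


class TxClk:
    def __init__(self):
        self.tx_clk_req = ENET  # what userspace asked for
        self.hw_sel = ENET      # what hardware is actually using

    def request(self, ref):
        self.tx_clk_req = ref

    def link_up(self, hw_selected):
        """Reconcile the request with the selection seen after link-up."""
        self.hw_sel = hw_selected
        if self.tx_clk_req != hw_selected:
            # Firmware or auto-negotiation chose another clock.
            self.tx_clk_req = hw_selected

    def pin_states(self):
        return {
            pin: "CONNECTED" if pin == self.tx_clk_req else "DISCONNECTED"
            for pin in EXTERNAL
        }

    def lock_status(self):
        accepted = self.hw_sel == self.tx_clk_req
        if self.hw_sel in EXTERNAL and accepted:
            return "LOCKED"
        return "UNLOCKED"
```

Between a request and the next link-up the pin already shows the intent
while the lock status stays UNLOCKED; after link-up both converge to what
hardware selected.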

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-14-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoice: add Tx reference clock index handling to AN restart command
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:44 +0000 (20:30 +0200)] 
ice: add Tx reference clock index handling to AN restart command

Extend the Restart Auto-Negotiation (AN) AdminQ command with a new
parameter allowing software to specify the Tx reference clock index to
be used during link restart.

This patch:
 - adds REFCLK field definitions to ice_aqc_restart_an
 - updates ice_aq_set_link_restart_an() to take a new refclk parameter
   and properly encode it into the command
 - keeps legacy behavior by passing REFCLK_NOCHANGE where appropriate

This prepares the driver for configurations requiring dynamic selection
of the Tx reference clock as part of the AN flow.

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-13-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoice: implement CPI support for E825C
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:43 +0000 (20:30 +0200)] 
ice: implement CPI support for E825C

Add full CPI (Converged PHY Interface) command handling required for
E825C devices. The CPI interface allows the driver to interact with
PHY-side control logic through the LM/PHY command registers, including
enabling/disabling/selection of PHY reference clock.

This patch introduces:
 - a new CPI subsystem (ice_cpi.c / ice_cpi.h) implementing the CPI
   request/acknowledge state machine, including REQ/ACK protocol,
   command execution, and response handling
 - helper functions for reading/writing PHY registers over Sideband
   Queue
 - CPI command execution API (ice_cpi_exec) and a helper for enabling or
   disabling Tx reference clocks (CPI 0xF1 opcode 'Config PHY clocking')
 - serialization of CPI transactions in the CPI core. CPI REQ/ACK is a
   multi-step handshake and must be executed atomically per PHY.
   Centralize the lock in ice_cpi_exec() and use adapter-scoped per-PHY
   mutexes, which match the hardware sharing model across PFs.
 - addition of the non-posted write opcode (wr_np) to SBQ
 - Makefile integration to build CPI support together with the PTP stack

This provides the infrastructure necessary to support PHY-side
configuration flows on E825C and is required for advanced link control
and Tx reference clock management.

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-12-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agoice: introduce TXC DPLL device and TX ref clock pin framework for E825
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:42 +0000 (20:30 +0200)] 
ice: introduce TXC DPLL device and TX ref clock pin framework for E825

E825 devices provide a dedicated TX clock (TXC) domain which may be
driven by multiple reference clock sources, including external board
references and port-derived SyncE. To support future TX clock control
and observability through the Linux DPLL subsystem, introduce a
separate TXC DPLL device (of DPLL_TYPE_GENERIC) and a framework for
representing TX reference clock inputs.

This change adds a new internal DPLL pin type (TXCLK) and registers
TX reference clock pins for E825-based devices:
- EXT_EREF0: a board-level external electrical reference
- SYNCE: a port-derived SyncE reference described via firmware nodes

The TXC DPLL device is created and managed alongside the existing
PPS and EEC DPLL instances. TXCLK pins are registered directly or
deferred via a notifier when backed by fwnode-described pins.
A per-pin attribute encodes the TX reference source associated with
each TXCLK pin.

At this stage, TXCLK pin state callbacks and TXC DPLL lock status
reporting are implemented as placeholders. Pin state getters always
return DISCONNECTED, and the TXC DPLL is initialized in the UNLOCKED
state. No hardware configuration or TX reference switching is
performed yet.

This patch establishes the structural groundwork required for
hardware-backed TX reference selection, verification, and
synchronization status reporting, which will be implemented in
subsequent patches.

Also signal dpll_init from the fwnode pin init error path so any
notifier worker already blocked on it can drain, avoiding a
flush_workqueue() deadlock during teardown.

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-11-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: allow fwnode pins to attempt state change without capability bit
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:41 +0000 (20:30 +0200)] 
dpll: allow fwnode pins to attempt state change without capability bit

Pins registered with an fwnode may have .state_on_dpll_set implemented
without advertising DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE upfront.
Requiring the bit for fwnode pins ties firmware description to driver
implementation details unnecessarily.

Relax the capability check in dpll_pin_state_set() and
dpll_pin_on_pin_state_set(): when a pin has an associated fwnode, bypass
the capability gate and let the ops layer decide, returning -EOPNOTSUPP
if .state_on_dpll_set is absent. Non-fwnode pins retain the original
strict behavior.
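
The relaxed gate can be sketched as below. This is a simplified model,
not the dpll core code: a pin is represented by a plain dict with
hypothetical keys, and the error returned for a non-fwnode pin without
the capability bit is assumed to be -EOPNOTSUPP as well.

```python
import errno


def pin_state_set(pin, state):
    """Toy version of the capability check followed by the ops call."""
    # Non-fwnode pins keep the strict capability gate.
    if not pin["has_fwnode"] and not pin["can_change"]:
        return -errno.EOPNOTSUPP
    # fwnode pins fall through and let the ops layer decide.
    setter = pin.get("state_on_dpll_set")
    if setter is None:
        return -errno.EOPNOTSUPP
    return setter(state)
```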

This is used later in the series by the SyncE_Ref output pin, which
relies on the fwnode path for state control.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-10-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: extend pin notifier with notification source ID
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:40 +0000 (20:30 +0200)] 
dpll: extend pin notifier with notification source ID

Extend the DPLL pin notification API to include a source identifier
indicating where the notification originates. This allows notifier
consumers to distinguish between notifications coming from
an associated DPLL instance, a parent pin, or the pin itself.

A new field, src_clock_id, is added to struct dpll_pin_notifier_info
and is passed through all pin-related notification paths. Callers of
dpll_pin_notify() are updated to provide a meaningful source identifier
based on their context:
  - pin registration/unregistration uses the DPLL's clock_id,
  - pin-on-pin operations use the parent pin's clock_id,
  - pin changes use the pin's own clock_id.

As introduced in commit ("dpll: allow registering FW-identified pin
with a different DPLL"), it is possible to share the same physical pin
via firmware description (fwnode) with DPLL objects from different
kernel modules. This means that a given pin can be registered multiple
times.

Drivers such as ice (E825 devices) rely on this mechanism when listening
for the event where a shared-fwnode pin appears, while avoiding reacting
to events triggered by their own registration logic.
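
How a consumer might use the source identifier can be sketched as
follows. Only src_clock_id comes from the description above; the context
names, the dict-based notification and the should_react() filter are
assumptions for illustration and do not reproduce the ice driver logic.

```python
def source_id(context, dpll_clock_id, parent_clock_id, pin_clock_id):
    """Pick the source identifier according to the notification context."""
    by_context = {
        "register": dpll_clock_id,
        "unregister": dpll_clock_id,
        "on_pin": parent_clock_id,
        "change": pin_clock_id,
    }
    return by_context[context]


def should_react(info, own_clock_id):
    """Ignore notifications caused by the listener's own registration."""
    return info["src_clock_id"] != own_clock_id
```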

This change only extends the notification metadata and does not alter
existing semantics for drivers that do not use the new field.

Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-9-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: balance create/delete notifications in __dpll_pin_(un)register
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:39 +0000 (20:30 +0200)] 
dpll: balance create/delete notifications in __dpll_pin_(un)register

__dpll_pin_register() emits dpll_pin_create_ntf() internally, but
__dpll_pin_unregister() left the matching delete to its callers. The
counts then diverge on dpll_pin_on_pin_register() rollback and on
dpll_pin_on_pin_unregister(), leaking stale notifications.

Emit dpll_pin_delete_ntf() inside __dpll_pin_unregister() and drop the
now-redundant call in dpll_pin_unregister().

Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-8-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: guard sync-pair removal on full pin unregister
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:38 +0000 (20:30 +0200)] 
dpll: guard sync-pair removal on full pin unregister

__dpll_pin_unregister() wiped the global sync-pair state on every
(dpll, ops, priv, cookie) tuple removed from a pin. When a pin is
registered multiple times and only one registration is being torn
down, this dropped sync-pair pairings still in use by the surviving
registrations.

Move dpll_pin_ref_sync_pair_del() inside the xa_empty(&pin->dpll_refs)
branch so it only runs when the last registration is gone, alongside
clearing the DPLL_REGISTERED mark.
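
The guard can be illustrated with a toy pin object. This is a sketch
under simplifying assumptions: registrations are plain set members
rather than (dpll, ops, priv, cookie) tuples, and the Pin class and its
fields are hypothetical.

```python
class Pin:
    def __init__(self):
        self.dpll_refs = set()
        self.sync_pairs = {"peer_pin"}
        self.registered = False

    def register(self, ref):
        self.dpll_refs.add(ref)
        self.registered = True

    def unregister(self, ref):
        self.dpll_refs.discard(ref)
        # Global state is dropped only once the last registration is gone.
        if not self.dpll_refs:
            self.sync_pairs.clear()
            self.registered = False
```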

Fixes: 58256a26bfb3 ("dpll: add reference sync get/set")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-7-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: emit per-dpll delete notifications in dpll_pin_on_pin_unregister()
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:37 +0000 (20:30 +0200)] 
dpll: emit per-dpll delete notifications in dpll_pin_on_pin_unregister()

dpll_pin_on_pin_register() emits a creation notification for every
parent->dpll_refs entry, but dpll_pin_on_pin_unregister() emitted only
one deletion notification outside the loop. When a pin is registered
against multiple parent dplls, userspace sees N creates but a single
delete and leaks per-dpll state.

Move dpll_pin_delete_ntf() into the loop and call it before
__dpll_pin_unregister() so the DPLL_REGISTERED mark is still set when
dpll_pin_available() is consulted.
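
The imbalance and its fix can be shown by counting events in a toy
model. The function names below are hypothetical and the actual
unregister work is elided; only the notification pattern is modeled.

```python
def on_pin_register(parent_dplls, events):
    for dpll in parent_dplls:
        events.append(("create", dpll))


def on_pin_unregister_old(parent_dplls, events):
    for dpll in parent_dplls:
        pass  # unregister from this dpll (not modeled)
    events.append(("delete", None))  # single notification outside the loop


def on_pin_unregister_new(parent_dplls, events):
    for dpll in parent_dplls:
        events.append(("delete", dpll))  # notify first, then unregister


def count(events, kind):
    return sum(1 for event, _ in events if event == kind)
```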

Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-6-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: send delete notification before unregister in on-pin rollback
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:36 +0000 (20:30 +0200)] 
dpll: send delete notification before unregister in on-pin rollback

The rollback path in dpll_pin_on_pin_register() called
__dpll_pin_unregister() before dpll_pin_delete_ntf(). When the
unregister dropped the pin's last DPLL reference it cleared the
DPLL_REGISTERED mark in dpll_pin_xa, so the subsequent
dpll_pin_event_send() failed dpll_pin_available() and aborted with
-ENODEV. As a result userspace was never notified of the rollback
deletion and remained out of sync with the kernel.

Send the delete notification first, matching the order used by
dpll_pin_unregister() and dpll_pin_on_pin_unregister().
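
Why the order matters can be sketched with a small model. It mirrors the
description above rather than the real code: the Pin class, event_send()
and the two rollback variants are hypothetical stand-ins.

```python
import errno


class Pin:
    def __init__(self, refs):
        self.refs = set(refs)
        self.registered = bool(self.refs)


def event_send(pin, event, log):
    """Refuse to notify about a pin that is no longer marked registered."""
    if not pin.registered:
        return -errno.ENODEV
    log.append(event)
    return 0


def unregister(pin, ref):
    pin.refs.discard(ref)
    if not pin.refs:
        pin.registered = False  # last reference clears the mark


def rollback_old(pin, ref, log):
    unregister(pin, ref)
    return event_send(pin, "delete", log)


def rollback_new(pin, ref, log):
    rc = event_send(pin, "delete", log)
    unregister(pin, ref)
    return rc
```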

Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-5-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days agodpll: fix stale iteration in dpll_pin_on_pin_unregister()
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:35 +0000 (20:30 +0200)] 
dpll: fix stale iteration in dpll_pin_on_pin_unregister()

Neither parent->dpll_refs nor pin->dpll_refs on its own is a correct
iteration target at unregister time:

  - pin->dpll_refs includes DPLLs the child was registered against
    via a different parent or directly; blind unregister WARNs on
    the cookie miss in dpll_xa_ref_pin_del().
  - parent->dpll_refs reflects the parent's current attachments, not
    those at child-register time. Another driver may have (un)reg'd
    the parent against additional DPLLs in the meantime, so we miss
    registrations that exist and visit DPLLs that have none.

Walk pin->dpll_refs and use dpll_pin_registration_find() to filter
to entries whose cookie is this parent. Symmetric with
dpll_pin_on_pin_register(), correct under any subsequent change to
parent->dpll_refs.
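
The filtering step can be sketched as a lookup over the child pin's own
registrations. This is a toy model: registrations are represented as a
dict mapping a DPLL name to the set of cookies it was registered with,
and dplls_to_unregister() is a hypothetical helper.

```python
def dplls_to_unregister(pin_registrations, parent):
    """Return the DPLLs where the pin was registered through `parent`."""
    return [
        dpll
        for dpll, cookies in pin_registrations.items()
        if parent in cookies
    ]
```

Registrations made directly or through a different parent are skipped,
and the result does not depend on which DPLLs the parent is attached to
at unregister time.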

Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-4-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago dpll: allow registering FW-identified pin with a different DPLL
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:34 +0000 (20:30 +0200)] 
dpll: allow registering FW-identified pin with a different DPLL

Relax the (module, clock_id) equality requirement when registering a
pin identified by firmware (pin->fwnode). Some platforms associate a
FW-described pin with a DPLL instance that differs from the pin's
(module, clock_id) tuple. For such pins, permit registration without
requiring the strict match. Non-FW pins still require equality.

Keep netlink pin module reporting/filtering safe for this relaxed
registration model by caching the module name in the pin object at
allocation time and using the cached string in netlink paths.
This avoids dereferencing pin->module after provider module teardown.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-3-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago dpll: add generic DPLL type
Grzegorz Nitka [Sun, 7 Jun 2026 18:30:33 +0000 (20:30 +0200)] 
dpll: add generic DPLL type

Add DPLL_TYPE_GENERIC to represent DPLL devices which do not fit the
existing PPS or EEC classes.

The UAPI type is intentionally generic. During netdev discussion,
maintainers pointed out that introducing identifiers tied to a specific
placement or single design does not scale across ASICs and vendors.
The role of a DPLL is already inferable from the spawning driver,
bus device, and pin topology, without encoding additional
purpose-specific taxonomy in the type name.

Using a generic type keeps the UAPI extensible and avoids premature
naming that may become incorrect as new hardware topologies are
exposed through the DPLL subsystem.

Expose the new type through UAPI and netlink specification as "generic".

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Link: https://patch.msgid.link/20260607183045.1213735-2-grzegorz.nitka@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days ago Merge tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Sat, 13 Jun 2026 20:16:38 +0000 (13:16 -0700)] 
Merge tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2026-06-12

1) Replace the open-coded manual cleanup in xfrm_add_policy() error
   path with xfrm_policy_destroy() for consistency with
   xfrm_policy_construct().
   From Deepanshu Kartikey.

2) Limit XFRMA_TFCPAD to a sensible maximum (max IP length, 64k) since
   u32 is excessive for traffic flow confidentiality padding.
   From David Ahern.

3) Add a new netlink message XFRM_MSG_MIGRATE_STATE that
   allows migrating individual IPsec SAs independently of
   their policies. The existing XFRM_MSG_MIGRATE is tightly coupled
   to policy+SA migration, lacks SPI for unique SA identification,
   and cannot express reqid changes or migrate Transport mode
   selectors. The new interface identifies the SA via SPI and mark,
   supports reqid changes, address family changes, encap removal,
   and uses an atomic create+install flow under x->lock to prevent
   SN/IV reuse during AEAD SA migration.
   From Antony Antony.

* tag 'ipsec-next-2026-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: add documentation for XFRM_MSG_MIGRATE_STATE
  xfrm: restrict netlink attributes for XFRM_MSG_MIGRATE_STATE
  xfrm: add XFRM_MSG_MIGRATE_STATE for single SA migration
  xfrm: make xfrm_dev_state_add xuo parameter const
  xfrm: extract address family and selector validation helpers
  xfrm: refactor XFRMA_MTIMER_THRESH validation into a helper
  xfrm: move encap and xuo into struct xfrm_migrate
  xfrm: add error messages to state migration
  xfrm: add state synchronization after migration
  xfrm: check family before comparing addresses in migrate
  xfrm: split xfrm_state_migrate into create and install functions
  xfrm: rename reqid in xfrm_migrate
  xfrm: fix NAT-related field inheritance in SA migration
  xfrm: allow migration from UDP encapsulated to non-encapsulated ESP
  xfrm: add extack to xfrm_init_state
  xfrm: remove redundant assignments
  xfrm: Reject excessive values for XFRMA_TFCPAD
  xfrm: cleanup error path in xfrm_add_policy()
====================

Link: https://patch.msgid.link/20260612074725.1760473-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago net: wwan: t7xx: check skb_clone in control TX
Ruoyu Wang [Fri, 12 Jun 2026 03:56:13 +0000 (11:56 +0800)] 
net: wwan: t7xx: check skb_clone in control TX

t7xx_port_ctrl_tx() clones each skb fragment before passing it to the
port transmit path. The clone is used immediately to set cloned->len, so
an skb_clone() failure results in a NULL pointer dereference.

Check the clone before using it. If previous fragments were already
queued, preserve the driver's existing partial-write behavior by
returning the number of bytes submitted so far.

Fixes: 36bd28c1cb0d ("wwan: core: Support slicing in port TX flow of WWAN subsystem")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260612035613.1192486-1-ruoyuw560@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago Merge branch 'vsock-consolidate-acceptq-accounting-into-core-helpers'
Jakub Kicinski [Sat, 13 Jun 2026 17:39:32 +0000 (10:39 -0700)] 
Merge branch 'vsock-consolidate-acceptq-accounting-into-core-helpers'

Raf Dickson says:

====================
vsock: consolidate acceptq accounting into core helpers

These patches follow up on commit c05fa14db43e
("vsock/vmci: fix sk_ack_backlog leak on failed handshake")
by consolidating sk_acceptq_added() and sk_acceptq_removed() into
the core vsock helpers so transports cannot forget them.

Link: https://lore.kernel.org/netdev/20260611021317.69362-1-rafdog35@gmail.com/
====================

Link: https://patch.msgid.link/20260612045216.105796-1-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago vsock: fold sk_acceptq_removed() into vsock_remove_pending()
Raf Dickson [Fri, 12 Jun 2026 04:52:16 +0000 (04:52 +0000)] 
vsock: fold sk_acceptq_removed() into vsock_remove_pending()

Callers of vsock_remove_pending() must also call sk_acceptq_removed()
to keep sk_ack_backlog consistent. Move the call into
vsock_remove_pending() itself to make it automatic and prevent future
callers from forgetting it.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260612045216.105796-5-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
Raf Dickson [Fri, 12 Jun 2026 04:52:15 +0000 (04:52 +0000)] 
vsock: fold sk_acceptq_added() into vsock_enqueue_accept()

virtio and hyperv call sk_acceptq_added() immediately before
vsock_enqueue_accept(). Move the call into vsock_enqueue_accept()
itself so callers cannot forget it and the accounting is consistent.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260612045216.105796-4-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago vsock: fold sk_acceptq_added() into vsock_add_pending()
Raf Dickson [Fri, 12 Jun 2026 04:52:14 +0000 (04:52 +0000)] 
vsock: fold sk_acceptq_added() into vsock_add_pending()

Move sk_acceptq_added() into vsock_add_pending() so callers cannot
forget it. vmci is the only transport using the pending list and
is updated accordingly.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260612045216.105796-3-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago vsock: introduce vsock_pending_to_accept() helper
Raf Dickson [Fri, 12 Jun 2026 04:52:13 +0000 (04:52 +0000)] 
vsock: introduce vsock_pending_to_accept() helper

Add vsock_pending_to_accept() to move a socket directly from the
pending list to the accept queue in a single operation, avoiding
the sock_put/sock_hold dance and the sk_acceptq_removed()/
sk_acceptq_added() pair that would otherwise be needed when
calling vsock_remove_pending() followed by vsock_enqueue_accept().

Use it in vmci_transport_recv_connecting_server() where a completed
handshake transitions the socket from pending to accept queue.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260612045216.105796-2-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago vsock: use sk_acceptq_is_full() helper in all transports
Raf Dickson [Fri, 12 Jun 2026 04:58:42 +0000 (04:58 +0000)] 
vsock: use sk_acceptq_is_full() helper in all transports

Replace the open-coded backlog check with sk_acceptq_is_full().
The helper uses > instead of >=, which is the correct comparison
per commit 64a146513f8f ("[NET]: Revert incorrect accept queue
backlog changes."), and adds READ_ONCE() for proper memory ordering.
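
The difference between the two comparisons can be sketched in user space
as follows. This is only an illustration: struct fake_sock and both
function names are hypothetical stand-ins for struct sock and the real
helper, and READ_ONCE() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two backlog fields of struct sock. */
struct fake_sock {
	unsigned int ack_backlog;	/* connections currently queued */
	unsigned int max_ack_backlog;	/* limit passed to listen() */
};

/* Shape of the open-coded check that the patch replaces. */
static bool acceptq_full_open_coded(const struct fake_sock *sk)
{
	assert(sk != NULL);
	return sk->ack_backlog >= sk->max_ack_backlog;
}

/* Shape of the helper-style check: only "full" once the limit is exceeded. */
static bool acceptq_full_helper_style(const struct fake_sock *sk)
{
	assert(sk != NULL);
	return sk->ack_backlog > sk->max_ack_backlog;
}
```

With a limit of 1 and one connection already queued, the >= form reports
the queue as full, while the > form still admits one more connection.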

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://patch.msgid.link/20260612045842.122207-1-rafdog35@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago net: ethernet: mtk_wed: debugfs: correct index in wed_amsdu_show()
Wentao Guan [Fri, 12 Jun 2026 06:45:01 +0000 (14:45 +0800)] 
net: ethernet: mtk_wed: debugfs: correct index in wed_amsdu_show()

WED_MON_AMSDU_ENG_CNT addresses each entry as 'base + n * offset'.
Correct the WED AMSDU entry index used in wed_amsdu_show().

Fixes: 3f3de094e8342 ("net: ethernet: mtk_wed: debugfs: add WED 3.0 debugfs entries")
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Link: https://patch.msgid.link/20260612064501.203058-1-guanwentao@uniontech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago Merge branch 'netdevsim-add-fake-ft-cls_flower-offload'
Jakub Kicinski [Sat, 13 Jun 2026 17:29:40 +0000 (10:29 -0700)] 
Merge branch 'netdevsim-add-fake-ft-cls_flower-offload'

Florian Westphal says:

====================
netdevsim: add fake FT/CLS_FLOWER offload

v2: fix up error reporting via extack
    shellcheck cleanups
    sort config toggles

1) Enable nf_tables offload control plane testing in netdevsim. Tag
   existing offload fn to allow error injection for testing rollback and abort
   logic.

2) Add nft_offload selftest to exercise the control plane and error
   unwind via fault injection.
====================

Link: https://patch.msgid.link/20260612092209.11966-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago selftests: netfilter: add phony nft_offload test
Florian Westphal [Fri, 12 Jun 2026 09:22:09 +0000 (11:22 +0200)] 
selftests: netfilter: add phony nft_offload test

... "phony", because it's not testing offloads, it tests the control
plane code.  Also test error unwind via the fault injection framework.

For a proper test, real hardware would be required, given we'd have to
check whether 'previously handed off to hardware' offload commands are
properly removed again on failure or rule flush.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260612092209.11966-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago netdevsim: tc: allow to test nf_tables offload control plane code
Florian Westphal [Fri, 12 Jun 2026 09:22:08 +0000 (11:22 +0200)] 
netdevsim: tc: allow to test nf_tables offload control plane code

The actual 'offload' is phony, all commands are ignored: this is only
useful to test control plane code.

Tag the existing callback to permit error injection to test rollback/abort
code in nf_tables.  This is also for fuzzers - the fault injection
framework allows probabilistic error insertion.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260612092209.11966-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago net: airoha: Fix error handling in airoha_ppe_flush_sram_entries()
Wayen.Yan [Fri, 12 Jun 2026 09:37:00 +0000 (17:37 +0800)] 
net: airoha: Fix error handling in airoha_ppe_flush_sram_entries()

In airoha_ppe_flush_sram_entries(), the outer "err" variable was never
updated when the inner loop variable shadowed it, causing the function
to always return 0 even when airoha_ppe_foe_commit_sram_entry() fails.

Drop the outer "err" variable and return directly on error, propagating
the error code from airoha_ppe_foe_commit_sram_entry() correctly.
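
The shadowing pattern described above can be reproduced in a few lines.
This is a sketch under stated assumptions, not the driver code:
commit_entry(), flush_buggy() and flush_fixed() are hypothetical names,
and -5 merely stands in for an errno value such as -EIO.

```c
#include <assert.h>

/* Hypothetical per-entry commit that fails at one chosen index. */
static int commit_entry(int idx, int fail_at)
{
	return idx == fail_at ? -5 : 0;
}

/* Buggy shape: the inner "err" shadows the outer one, which stays 0. */
static int flush_buggy(int n, int fail_at)
{
	int err = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int err = commit_entry(i, fail_at);	/* shadows outer err */

		if (err)
			break;
	}
	return err;	/* always the outer variable, so always 0 */
}

/* Fixed shape: no outer variable, return the failure directly. */
static int flush_fixed(int n, int fail_at)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int err = commit_entry(i, fail_at);

		if (err)
			return err;
	}
	return 0;
}
```

Building with -Wshadow makes the compiler flag the inner declaration in
flush_buggy().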

Fixes: 620d7b91aadb ("net: airoha: ppe: Flush PPE SRAM table during PPE setup")
Link: https://lore.kernel.org/netdev/6a2b40e4.4dd82583.3a5c46.e52f@mx.google.com/
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2bd37a.4034e349.1b41bb.1caf@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days ago MAINTAINERS: Update Coly Li's email address
Coly Li [Sat, 13 Jun 2026 15:04:58 +0000 (23:04 +0800)] 
MAINTAINERS: Update Coly Li's email address

I have switched to colyli@fygo.io as my current email address.

Signed-off-by: Coly Li <colyli@fygo.io>
Link: https://patch.msgid.link/20260613150458.682707-1-colyli@fygo.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago Merge tag 'core-urgent-2026-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Jun 2026 15:23:36 +0000 (08:23 -0700)] 
Merge tag 'core-urgent-2026-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobjects fix from Ingo Molnar:

 - Fix potential debugobjects deadlock on PREEMPT_RT kernels (Waiman
   Long)

* tag 'core-urgent-2026-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Don't call fill_pool() in early boot hardirq context

6 days ago Merge tag 'i2c-for-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 13 Jun 2026 15:14:17 +0000 (08:14 -0700)] 
Merge tag 'i2c-for-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "The biggest news here is that this is my last pull request as I2C
  maintainer after 13.5 years. Starting with the 7.2 cycle, Andi Shyti,
  who has helped me greatly with maintaining the host drivers for a while
  now, is taking over. Thank you, Andi, and good luck with the subsystem. I'll
  be around for help, of course.

  Technically, there are two patches which might be a tad large for this
  late in the cycle, but most of their content is explanatory comments,
  so I think they are suitable.

   - MAINTAINERS:
      - hand over I2C maintainership to Andi
      - minor updates

   - rust: fix I2cAdapter refcount double increment

   - imx: keep clock and pinctrl states consistent in runtime PM

   - imx-lpi2c: fix DMA resource leaks on PIO fallback

   - qcom-cci: fix NULL pointer dereference on remove

   - riic: fix reset refcount leak on resume_noirq error path

   - stm32f7: account for analog filter in timing computation

   - tegra:
      - fix suspend/resume handling in NOIRQ phase
      - update Tegra410 I2C timings to match hardware specs"

* tag 'i2c-for-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: mux-gpio: name correct maintainer
  MAINTAINERS: hand over I2C to Andi Shyti
  i2c: imx-lpi2c: fix resource leaks switching to devm_dma_request_chan()
  MAINTAINERS: i2c: designware: Remove inactive reviewer
  i2c: tegra: Fix NOIRQ suspend/resume
  i2c: tegra: Update Tegra410 I2C timing parameters
  i2c: qcom-cci: Fix NULL pointer dereference in cci_remove()
  i2c: stm32f7: fix timing computation ignoring i2c-analog-filter
  i2c: imx: fix clock and pinctrl state inconsistency in runtime PM
  i2c: riic: fix refcount leak in riic_i2c_resume_noirq()
  rust: i2c: fix I2cAdapter refcounts double increment

6 days ago Merge tag 'timers-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel...
Thomas Gleixner [Sat, 13 Jun 2026 14:24:29 +0000 (16:24 +0200)] 
Merge tag 'timers-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource

Pull clocksource/driver updates from Daniel Lezcano:

  - Remove the sifive,fine-ctr-bits property bindings because the
    information is redundant (Nick Hu)

  - Remove the TCIU8 interrupt bindings on Renesas because the
    documentation marks it as reserved, so it should not be described,
    and fix the conditional reset line for the RZ/{T2H,N2H} (Cosmin Tanislav)

  - Add the StarFive JHB100 clint DT bindings compatible string (Ley
    Foon Tan)

  - Extend the schema condition for interrupts to cover the D1 compatible
    variant and add the D1 hstimer support (Michal Piekos)

  - Update the ARM architected timer support to handle the ACPI GTDT v3
    format and the EL2 virtual timer, enabling Linux to use the most
    appropriate timer when running with VHE, while also fixing several
    Device Trees to accurately reflect the underlying hardware (Marc
    Zyngier)

  - Clean up and add the clocksource and the clockevent in the TI DM
    timer (Markus Schneider-Pargmann)

  - Add the multiple watchdogs support in the tegra186 and
    tegra234. Dedicate one as a kernel watchdog (Kartik Rajput)

  - Add the NXP clocksource selection for the scheduler in the Kconfig
    (Enric Balletbo i Serra)

Link: https://lore.kernel.org/all/1e55e8d6-8024-4f17-8620-ab3385465d76@oss.qualcomm.com
6 days ago posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path
WenTao Liang [Thu, 11 Jun 2026 16:17:38 +0000 (00:17 +0800)] 
posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path

In do_cpu_nanosleep(), posix_cpu_timer_create() takes a pid reference
via get_pid() and stores it in timer.it.cpu.pid. If the subsequent
posix_cpu_timer_set() call fails, the function returns immediately
without calling posix_cpu_timer_del() to release the pid reference,
causing a leak.

Fix it by calling posix_cpu_timer_del() before the unlock-and-return
on the error path, consistent with the other exit paths in the same
function.
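
The leak pattern can be modelled with a plain counter. This is a
simplified sketch, not the kernel code: struct fake_pid and all function
names are hypothetical, -22 stands in for -EINVAL, and the sleep between
arming and deleting the timer is left out.

```c
#include <assert.h>

/* Hypothetical refcounted object standing in for struct pid. */
struct fake_pid {
	int refs;
};

static void timer_create_ref(struct fake_pid *pid)
{
	pid->refs++;		/* models get_pid() at timer creation */
}

static void timer_del_ref(struct fake_pid *pid)
{
	assert(pid->refs > 0);
	pid->refs--;		/* models the put done when the timer is deleted */
}

static int timer_set(int fail)
{
	return fail ? -22 : 0;
}

/* Buggy shape: the early return skips the delete, so one ref leaks. */
static int nanosleep_buggy(struct fake_pid *pid, int fail)
{
	int err;

	timer_create_ref(pid);
	err = timer_set(fail);
	if (err)
		return err;
	timer_del_ref(pid);
	return 0;
}

/* Fixed shape: every exit path after creation drops the reference. */
static int nanosleep_fixed(struct fake_pid *pid, int fail)
{
	int err;

	timer_create_ref(pid);
	err = timer_set(fail);
	timer_del_ref(pid);
	return err;
}
```

After a failing call, the buggy variant leaves refs one higher than
before, while the fixed variant leaves it unchanged.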

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260611161738.97043-1-vulab@iscas.ac.cn
6 days ago x86/irq: Add missing 's' back to thermal event printout
Thomas Gleixner [Sat, 13 Jun 2026 13:31:03 +0000 (15:31 +0200)] 
x86/irq: Add missing 's' back to thermal event printout

The /proc/interrupts handling rework dropped an 's' in the thermal event
printout, which breaks the thermal test in the Intel LKVS suite.

Bring the important letter back.

Fixes: 2b57c69917ee ("x86/irq: Make irqstats array based")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Closes: https://lore.kernel.org/oe-lkp/202606121325.97b29701-lkp@intel.com
6 days ago time/jiffies: Register jiffies clocksource before usage
Thomas Gleixner [Tue, 9 Jun 2026 15:14:45 +0000 (17:14 +0200)] 
time/jiffies: Register jiffies clocksource before usage

Teddy reported that a XEN HVM has a long boot delay, which was bisected to
the recent enhancements to the negative motion detection. It turned out
that the jiffies clocksource is used in early boot before it is registered,
which leaves the max_delta_raw field at zero. That causes the read out to
be clamped to the max delta of 0, which means time is not making progress.

Cure it by ensuring that it is initialized before its first usage in
timekeeping_init().

Fixes: 76031d9536a0 ("clocksource: Make negative motion detection more robust")
Reported-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Teddy Astie <teddy.astie@vates.tech>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/87y0gn3fve.ffs@fw13
Closes: https://lore.kernel.org/all/1780914594.8631fc262581453bbf619ec5b2062170.19ea6c8227b000701b@vates.tech
6 days ago hwmon: tmp401: Read "ti,n-factor" as signed
Rob Herring (Arm) [Fri, 12 Jun 2026 21:53:32 +0000 (16:53 -0500)] 
hwmon: tmp401: Read "ti,n-factor" as signed

The "ti,n-factor" binding and examples allow negative correction
values. Reading it as u32 makes the helper type disagree with the
documented signed value and hides real schema mismatches.

Use the signed helper so the DT access matches the s32 value stored by
the driver.
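
Why the helper type matters can be shown on a single devicetree cell,
which is stored as a big-endian 32-bit value. The following is a
user-space sketch only: cell_to_u32() and cell_to_s32() are hypothetical
names and are not the kernel's property accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell as an unsigned value. */
static uint32_t cell_to_u32(const uint8_t cell[4])
{
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}

/* Decode the same cell as a two's-complement signed value. */
static int32_t cell_to_s32(const uint8_t cell[4])
{
	uint32_t u = cell_to_u32(cell);

	/* Avoid an implementation-defined cast for values above INT32_MAX. */
	if (u & 0x80000000u)
		return (int32_t)(u & 0x7fffffffu) - 2147483647 - 1;
	return (int32_t)u;
}
```

A correction of -5 is encoded as the bytes ff ff ff fb. Read unsigned,
that cell is 4294967291; read signed, it is the intended -5.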

Assisted-by: Codex:gpt-5-5
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20260612215332.1889497-1-robh@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
6 days ago io_uring/bpf-ops: add a separate maintainer entry
Pavel Begunkov [Fri, 12 Jun 2026 17:36:22 +0000 (18:36 +0100)] 
io_uring/bpf-ops: add a separate maintainer entry

Add a maintainer entry for io_uring bpf struct_ops related files.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/d89f3b89e77b09a18daa45476fd1a40f2ee253cd.1780930463.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago block: check bio split for unaligned bvec
Keith Busch [Fri, 12 Jun 2026 22:32:04 +0000 (15:32 -0700)] 
block: check bio split for unaligned bvec

Offsets and lengths need to be validated against the dma alignment. This
check was skipped for a sufficiently small bio with a single bvec, which
may allow an invalid request to be dispatched to the driver. Force the
validation for an unaligned bvec by taking the bio split path that
handles this condition.

Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://patch.msgid.link/20260612223205.465913-1-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago nbd: Reclassify sockets to avoid lockdep circular dependency
Eric Dumazet [Sat, 13 Jun 2026 04:26:19 +0000 (04:26 +0000)] 
nbd: Reclassify sockets to avoid lockdep circular dependency

syzbot reported a possible circular locking dependency in udp_sendmsg()
where fs_reclaim can be triggered while holding sk_lock, and fs_reclaim
can eventually depend on another sk_lock (e.g., if NBD is used for swap
or writeback and NBD uses TLS/TCP which acquires sk_lock).

Since the UDP socket and the NBD TCP/TLS socket are different, this is a
false positive. Fix this by reclassifying NBD sockets to a separate lock
class when they are added to the NBD device.

This is similar to what nvme-tcp and other network block devices do.

Fixes: ffa1e7ada456 ("block: Make request_queue lockdep splats show up earlier")
Reported-by: syzbot+607cdcf978b3e79da878@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6a2cdafe.428ffe26.258b27.0161.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260613042619.1108126-1-edumazet@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring/net: make POLL_FIRST receive side checks consistent
Jens Axboe [Sat, 30 May 2026 02:03:47 +0000 (20:03 -0600)] 
io_uring/net: make POLL_FIRST receive side checks consistent

io_recv() and io_recvzc() are the odd ones out, as they check whether
POLL_FIRST should be honored before checking if the file is a
socket. It doesn't really matter, but might as well make it consistent
across all receive and send types.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring: remove the per-ctx fallback task_work machinery
Jens Axboe [Thu, 11 Jun 2026 17:44:47 +0000 (11:44 -0600)] 
io_uring: remove the per-ctx fallback task_work machinery

With the tctx fallback running its entries directly, the per-ctx
fallback work has a single user left: moving local (DEFER_TASKRUN)
task_work entries out of a ring that is going away. Both of its call
sites are process context and don't hold ->uring_lock, the same
conditions the deferred fallback work itself ran under - so run the
entries in cancel mode right there instead, and rename the helper to
io_cancel_local_task_work() to match what it now does.

With that, ->fallback_llist, ->fallback_work, io_fallback_req_func()
and __io_fallback_tw() can all go away, along with the fallback work
flushing in the ring exit and cancel paths. Requests that get
orphaned by an exiting task now run via the tctx fallback work, which
the ring exit side implicitly waits on through the ctx refs those
requests hold.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring: run the tctx task_work fallback directly
Jens Axboe [Thu, 11 Jun 2026 17:41:25 +0000 (11:41 -0600)] 
io_uring: run the tctx task_work fallback directly

The fallback work drains the tctx queue only to redistribute the entries
into the per-ctx fallback lists, bouncing them through a second
(per-ctx) work item before they finally run. That made sense when the
producer side did the draining and could be in any context, but the
fallback work is a regular process context kworker: it can just run the
entries itself. Reuse the normal run loop - if run from the fallback
kernel thread, ts.cancel will get set, and the work terminated.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring: switch normal task_work to a mpscq
Jens Axboe [Thu, 11 Jun 2026 16:13:22 +0000 (10:13 -0600)] 
io_uring: switch normal task_work to a mpscq

Like the local task_work list, the normal (tctx) task_work list is an
llist, and hence needs the O(n) llist_reverse_order() pass before
running entries in queue order. On top of that, capped runs - sqpoll
processing IORING_TW_CAP_ENTRIES_VALUE entries at a time - need the
claimed-but-unprocessed leftovers carried in a separate retry_list,
as they can't be pushed back to the shared list.

Switch tctx->task_list to a mpscq, like what was done for the
DEFER_TASKRUN paths as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring: switch local task_work to a mpscq
Jens Axboe [Wed, 10 Jun 2026 21:19:35 +0000 (15:19 -0600)] 
io_uring: switch local task_work to a mpscq

The local (DEFER_TASKRUN) task_work list is an llist, which is LIFO
ordered, and hence __io_run_local_work() has to restore the right
running order with an O(n) llist_reverse_order() pass first. On top of
that, a batch that gets capped by max_events needs the leftover entries
parked on a separate ->retry_llist, as they can't be pushed back to the
shared list.

Switch it to the FIFO mpscq. Adds are wait-free instead of a cmpxchg
retry loop, entries are popped in queue order with no reversal pass,
capping a run simply leaves the remainder on the queue, and
->retry_llist goes away entirely. The consumer cursor, ->work_head,
lives with the rest of the ->uring_lock protected state rather than
next to the queue, so that popping entries doesn't dirty the producer
side cacheline.
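
The ordering cost of the old scheme can be illustrated with a
single-threaded sketch. It assumes a hypothetical struct item and plain
pointers in place of the kernel's atomic llist, so it only shows the LIFO
order and the O(n) reversal, not the concurrency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item; "id" records the order in which it was queued. */
struct item {
	struct item *next;
	int id;
};

/* Push at the head: the newest entry ends up first (LIFO). */
static struct item *lifo_push(struct item *head, struct item *n)
{
	assert(n != NULL);
	n->next = head;
	return n;
}

/* Walk the whole list once to restore queue order before running it. */
static struct item *reverse_order(struct item *head)
{
	struct item *prev = NULL;

	while (head) {
		struct item *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Pushing ids 1, 2, 3 leaves 3 at the head, and only the reversal pass
brings 1 back to the front. A FIFO queue hands out 1 first without that
pass.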

For low amounts of task_work, this ends up being a bit more efficient
than the existing scheme. As an example of that, doing multishot
receives for 8 clients has the following task_work overhead:

     1.02%  sock-test  [kernel.kallsyms]  [k] io_req_local_work_add
     0.88%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work_loop
     0.60%  sock-test  [kernel.kallsyms]  [k] llist_reverse_order
     0.14%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work
     2.64% at ~46Gb/sec

and after this change:

     1.08%  sock-test  [kernel.kallsyms]  [k] io_req_local_work_add
     1.03%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work
     2.11% at ~53Gb/sec

which has less overhead even though that test run was faster. For a case
of having 1024 clients on a single ring:

     2.22%  sock-test  [kernel.kallsyms]  [k] llist_reverse_order
     0.84%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work_loop
     0.42%  sock-test  [kernel.kallsyms]  [k] io_req_local_work_add
     0.02%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work
     3.50% at ~24Gb/sec

we start to see the llist reversing taking a considerable amount of
time, and the total add+run task_work overhead is around 3.5%. After
the change:

     0.90%  sock-test  [kernel.kallsyms]  [k] __io_run_local_work
     0.42%  sock-test  [kernel.kallsyms]  [k] io_req_local_work_add
     1.32% at ~26Gb/sec

most of that overhead is gone, and performance is better as well.

Caleb Sander Mateos <csander@purestorage.com> reports that it improves
the performance of a ublk 4kb workload by 4% [1], while testing v1 of
this patchset.

[1] https://lore.kernel.org/io-uring/CADUfDZr-MMYBaP-e+y9+xuRhuiunO2sBTUCmwZyd7AgT8sVtiQ@mail.gmail.com/

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days ago io_uring/mpscq: add lockless multi-producer, single-consumer FIFO queue
Jens Axboe [Wed, 10 Jun 2026 21:19:15 +0000 (15:19 -0600)] 
io_uring/mpscq: add lockless multi-producer, single-consumer FIFO queue

Local task_work is currently using llists for managing the work,
but that's a LIFO type of list. This means that running this task_work
needs to reverse the list first, to ensure fairness in running the
queued items.

Add a lockless FIFO queue, based on Dmitry Vyukov's intrusive MPSC
node-based queue algorithm, modified with an externally held consumer
cursor and conditional stub reinsertion. See comments in the header.

Producers are wait-free: a push is a single xchg() on the queue tail,
which serializes concurrent producers and defines the FIFO order, plus
a store linking the node to its predecessor. There are no cmpxchg retry
loops, and pushing is safe from any context, including hardirq.

The cost of linked list FIFO ordering is that a push publishes the node
in two steps - the xchg() makes it visible as the new tail before the
subsequent store links it into the chain that is reachable from the
head. A consumer hitting that window gets a NULL from mpscq_pop() while
mpscq_empty() reports false, and must retry later rather than treat the
queue as empty. The window is two instructions wide, but a producer can
get preempted inside it, so the consumer must not busy wait on it.

The consumer side supports a single consumer at a time, with callers
providing their own serialization. A stub node, which also defines the
empty state (tail == stub), allows the consumer to detach the final
node without racing against producer link stores: that node is only
handed out once the stub has been cmpxchg'ed back in as the tail. This
also guarantees that the previous tail returned by mpscq_push() cannot
get freed before that push has linked it, making it always valid for
comparisons.

The consumer cursor is deliberately not part of the queue struct - the
caller owns it and passes it to mpscq_pop(). This is done to keep the
consumer and producer state on separate cachelines. The cursor is written for every
popped entry, and keeping it on the same cacheline as ->tail would have
the consumer invalidating the line that producers need for every push.
Keeping it external lets the caller place it with its own consumer side
data instead.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days agoio_uring: grab RCU read lock marking task run
Jens Axboe [Fri, 12 Jun 2026 02:27:22 +0000 (20:27 -0600)] 
io_uring: grab RCU read lock marking task run

Not required right now, as io_req_local_work_add() already calls this
helper with the RCU read lock held. But in preparation for that not
being the case, grab it locally.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 days agoMerge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Paolo Abeni [Sat, 13 Jun 2026 09:50:31 +0000 (11:50 +0200)] 
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-06-09 (idpf, ixgbe, igc)

Przemyslaw adds needed padding to idpf PTP structures to match firmware
expectations.

Larysa bypasses XPS configuration on XDP queues for ixgbe.

Khai Wen corrects the offset into the packet buffer when handling frame
preemption on igc.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: skip RX timestamp header for frame preemption verification
  ixgbe: do not configure xps for XDP queues
  idpf: add padding to PTP virtchnl structures
====================

Link: https://patch.msgid.link/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 days agoocteontx2-af: npc: Fix size of entry2cntr_map
Ratheesh Kannoth [Wed, 10 Jun 2026 02:23:44 +0000 (07:53 +0530)] 
octeontx2-af: npc: Fix size of entry2cntr_map

KASAN prints the splat below. This is caused by allocating a counter for
the reserved mcam entry used for the cpt 2nd pass entry, while
mcam->entry2cntr_map is not allocated for reserved entries.

BUG: KASAN: slab-out-of-bounds in npc_map_mcam_entry_and_cntr+0xb0/0x1a0
Write of size 2 at addr ffff0001033e7ffe by task kworker/0:1/14

CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 6.1.67 #1
Hardware name: Marvell CN106XX board (DT)
Workqueue: events work_for_cpu_fn
Call trace:
 dump_backtrace.part.0+0xe4/0xf0
 show_stack+0x18/0x30
 dump_stack_lvl+0x88/0xb4
 print_report+0x154/0x458
 kasan_report+0xb8/0x194
 __asan_store2+0x7c/0xa0
 npc_map_mcam_entry_and_cntr+0xb0/0x1a0
 rvu_mbox_handler_npc_mcam_write_entry+0x268/0x280
 npc_install_flow+0x840/0xfe0
 rvu_npc_install_cpt_pass2_entry+0x138/0x190
 rvu_nix_init+0x148c/0x2880
 rvu_probe+0x1800/0x30b0
 local_pci_probe+0x78/0xe0
 work_for_cpu_fn+0x30/0x50
 process_one_work+0x4cc/0x97c
 worker_thread+0x360/0x630
 kthread+0x1a0/0x1b0
 ret_from_fork+0x10/0x20

Fixes: 55307fcb9258 ("octeontx2-af: Add mbox messages to install and delete MCAM rules")
Cc: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260610022344.969774-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 days agoselftests/bpf: Add arena direct-value one-past-end reject test
Woojin Ji [Fri, 12 Jun 2026 05:26:55 +0000 (14:26 +0900)] 
selftests/bpf: Add arena direct-value one-past-end reject test

BPF_MAP_TYPE_ARENA supports direct-value pseudo loads, but unlike array
maps its map value_size is zero and the valid direct-value range is the
arena mmap size, max_entries * PAGE_SIZE.

Commit 3ac1a467e376 ("bpf: Fix off-by-one boundary validation in arena
direct-value access") fixed arena_map_direct_value_addr() to reject an
offset exactly at the end of the arena mapping. Add a regression test
that loads a BPF_PSEUDO_MAP_VALUE with off == arena_size and verifies
that the verifier rejects it with the expected offset in the log.
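
The boundary being tested can be sketched in plain C. This is only a
model of the range check, assuming a 4096-byte page; the helper name
sketch_arena_off_valid() is hypothetical and is not the body of
arena_map_direct_value_addr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: models only the range check, not the kernel function. */
static bool sketch_arena_off_valid(uint64_t off, uint32_t max_entries,
				   uint64_t page_size)
{
	uint64_t arena_size = (uint64_t)max_entries * page_size;

	/* Valid offsets are [0, arena_size); off == arena_size is one past the end. */
	return off < arena_size;
}

/* Usage example with an assumed 4096-byte page and a two-page arena. */
static void sketch_arena_demo(void)
{
	const uint64_t page_size = 4096;
	const uint32_t max_entries = 2;
	const uint64_t arena_size = max_entries * page_size;

	assert(sketch_arena_off_valid(0, max_entries, page_size));
	assert(sketch_arena_off_valid(arena_size - 1, max_entries, page_size));
	assert(!sketch_arena_off_valid(arena_size, max_entries, page_size));
}
```

Writing the check as off <= arena_size would accept the one-past-end
offset, which is the case the regression test pins down.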

This is intentionally kept as a userspace raw-instruction test. I tried
expressing the same BPF_PSEUDO_MAP_VALUE + off == arena_size case in
verifier_arena.c with inline assembly. The only form that produces the
desired instruction bytes uses __imm_addr(arena), but that emits
R_BPF_64_NODYLD32, which the libbpf/bpftool link step rejects. Other
register, immediate, and memory constraints either fail in the BPF
backend or lower to a normal R_BPF_64_64 load followed by an ALU add,
which does not exercise arena_map_direct_value_addr() with the boundary
offset in the second ldimm64 slot.

A legacy test_verifier fixture can express the raw instruction directly,
but it needs arena map creation, mmap, and fixup plumbing in the legacy
runner. That is more intrusive than the small prog_tests raw-instruction
test.

Use the userspace raw-instruction test, following the existing selftests
pattern used for direct map-value pseudo loads, so insns[1].imm can be
set to arena_size precisely.

Assisted-by: ChatGPT:gpt-5.5
Signed-off-by: Woojin Ji <random6.xyz@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Junyoung Jang <graypanda.inzag@gmail.com>
Link: https://lore.kernel.org/r/20260612-arena-direct-value-v1-v4-1-b81b642f5277@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 days agorqspinlock: Fix order in raw_res_spin_(un)lock_irq to allow schedule
Gabriele Monaco [Wed, 10 Jun 2026 09:04:29 +0000 (11:04 +0200)] 
rqspinlock: Fix order in raw_res_spin_(un)lock_irq to allow schedule

raw_res_spin_unlock_irqrestore() calls raw_res_spin_unlock() and then
restores interrupts. This means preemption is re-enabled (as part of
raw_res_spin_unlock()) while interrupts are still disabled, so this
cannot trigger an actual preemption.
This is inconsistent with other spinlock implementations
(raw_spin_unlock_irqrestore() and bpf_res_spin_unlock_irqrestore()
itself).

Adjust the macro to ensure interrupts are enabled before enabling
preemption, allowing scheduling at that point. Make the same
modification in the error path of raw_res_spin_lock_irqsave().

Fixes: 101acd2e78b1 ("rqspinlock: Add macros for rqspinlock usage")
Cc: stable@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de> # asm-generic
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20260610090431.32427-1-gmonaco@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 days agoMerge branch 'bpf-fix-setting-retval-to-eperm-for-cgroup-hooks-not-returning-errno'
Alexei Starovoitov [Sat, 13 Jun 2026 03:33:16 +0000 (20:33 -0700)] 
Merge branch 'bpf-fix-setting-retval-to-eperm-for-cgroup-hooks-not-returning-errno'

Xu Kuohai says:

====================
bpf: Fix setting retval to -EPERM for cgroup hooks not returning errno

This series fixes the issue reported by sashiko in [1]. The issue is that,
when a cgroup BPF program exits with 0, bpf_prog_run_array_cg() sets
the hook return value to -EPERM if it is not a valid errno. This is
correct for errno-based hooks, which return 0 on success and negative
errno on failure, but wrong for void and boolean LSM hooks. Boolean
LSM hooks should only return true or false, and void LSM hooks have
no return value at all.

Fix it by skipping setting -EPERM for hooks not returning errno.

[1] https://lore.kernel.org/bpf/20260605144232.95A141F00893@smtp.kernel.org/
====================

Link: https://patch.msgid.link/20260610201724.733943-1-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 days agoselftests/bpf: Add retval test for bool and errno LSM cgroup hooks
Xu Kuohai [Wed, 10 Jun 2026 20:17:24 +0000 (20:17 +0000)] 
selftests/bpf: Add retval test for bool and errno LSM cgroup hooks

Add test to check the return value when a BPF program exits with 0 for
a boolean and an errno LSM hook.

For each hook, two BPF programs are attached. The first program returns
0 without calling bpf_set_retval() to exercise the return value translation
logic, while the second program reads the retval via bpf_get_retval().

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20260610201724.733943-3-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 days agobpf: Fix setting retval to -EPERM for cgroup hooks not returning errno
Xu Kuohai [Wed, 10 Jun 2026 20:17:23 +0000 (20:17 +0000)] 
bpf: Fix setting retval to -EPERM for cgroup hooks not returning errno

When a cgroup BPF program exits with 0, bpf_prog_run_array_cg() sets
the hook return value to -EPERM if it is not a valid errno. This is
correct for errno-based hooks, which return 0 on success and negative
errno on failure, but wrong for boolean and void LSM hooks. Boolean
LSM hooks should only return true or false, and void LSM hooks have
no return value at all.

Fix it by skipping setting -EPERM for hooks not returning errno.
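
The intended behaviour can be modelled in userspace C roughly as follows.
This is a sketch under assumptions, not the kernel change itself: the
sketch_* names and the hook-kind enum are hypothetical, and the range
test mirrors the usual "-MAX_ERRNO..-1" meaning of a valid errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_ERRNO 4095	/* assumed to match the kernel's MAX_ERRNO */

/* Hypothetical classification of what a hook returns. */
enum sketch_hook_kind { HOOK_RET_ERRNO, HOOK_RET_BOOL, HOOK_RET_VOID };

static bool sketch_is_err_value(long v)
{
	return v < 0 && v >= -SKETCH_MAX_ERRNO;
}

/*
 * prog_ret is the BPF program's exit value (0 means "reject"),
 * retval is the hook return value accumulated so far.
 */
static long sketch_fixup_retval(enum sketch_hook_kind kind,
				unsigned int prog_ret, long retval)
{
	if (prog_ret != 0)
		return retval;		/* program did not reject: leave as is */
	if (kind != HOOK_RET_ERRNO)
		return retval;		/* bool/void hooks: never force -EPERM */
	if (!sketch_is_err_value(retval))
		retval = -EPERM;	/* errno hooks: a reject must be an errno */
	return retval;
}

/* Usage example covering the three hook kinds. */
static void sketch_retval_demo(void)
{
	assert(sketch_fixup_retval(HOOK_RET_ERRNO, 0, 0) == -EPERM);
	assert(sketch_fixup_retval(HOOK_RET_ERRNO, 0, -EINVAL) == -EINVAL);
	assert(sketch_fixup_retval(HOOK_RET_BOOL, 0, 0) == 0);
	assert(sketch_fixup_retval(HOOK_RET_VOID, 0, 0) == 0);
}
```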

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20260610201724.733943-2-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 days agonet: qrtr: fix 32-bit integer overflow in qrtr_endpoint_post()
Michael Bommarito [Thu, 11 Jun 2026 12:54:55 +0000 (08:54 -0400)] 
net: qrtr: fix 32-bit integer overflow in qrtr_endpoint_post()

qrtr_endpoint_post() validates an incoming packet with

if (!size || len != ALIGN(size, 4) + hdrlen)
goto err;

where size comes from the wire. On 32-bit, size_t is 32 bits and
ALIGN(size, 4) wraps to 0 for size >= 0xfffffffd, so the check
passes and skb_put_data(skb, data + hdrlen, size) writes past the
hdrlen-sized skb and oopses the kernel. 64-bit is unaffected.

This is the 32-bit residual of ad9d24c9429e2 ("net: qrtr: fix OOB
Read in qrtr_endpoint_post"), which fixed only the 64-bit case.

Reject any size that cannot fit the buffer before the ALIGN.
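
The wraparound can be reproduced in userspace by modelling a 32-bit
size_t with uint32_t. This is a sketch only: the helper names are
hypothetical, and the guarded check shows one way to bound size before
aligning it, which may differ in detail from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit model of ALIGN(x, 4): round up to a multiple of 4. */
static uint32_t sketch_align4(uint32_t x)
{
	return (x + 3u) & ~3u;	/* wraps to 0 for x >= 0xfffffffd */
}

/* The check as quoted above, evaluated with 32-bit arithmetic. */
static bool sketch_check_unguarded(uint32_t len, uint32_t size, uint32_t hdrlen)
{
	return size && len == sketch_align4(size) + hdrlen;
}

/* Bound size by the payload room in the buffer before aligning it. */
static bool sketch_check_guarded(uint32_t len, uint32_t size, uint32_t hdrlen)
{
	if (hdrlen > len)
		return false;
	if (!size || size > len - hdrlen)
		return false;
	return len == sketch_align4(size) + hdrlen;
}

/* Usage example: a header-only buffer with a huge size from the wire. */
static void sketch_qrtr_demo(void)
{
	const uint32_t hdrlen = 32;	/* assumed header length for the demo */

	assert(sketch_align4(0xfffffffdu) == 0);
	assert(sketch_check_unguarded(hdrlen, 0xfffffffdu, hdrlen));
	assert(!sketch_check_guarded(hdrlen, 0xfffffffdu, hdrlen));
	assert(sketch_check_guarded(hdrlen + 8, 6, hdrlen));
}
```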

Fixes: ad9d24c9429e2 ("net: qrtr: fix OOB Read in qrtr_endpoint_post")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260611125455.2352279-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agonet/mlx5: Check max_macs devlink param value against max capability
Dragos Tatulea [Thu, 11 Jun 2026 13:52:30 +0000 (16:52 +0300)] 
net/mlx5: Check max_macs devlink param value against max capability

The max_macs devlink param is checked against the FW max value only at
param register time (driver load) and inside the validate callback
(devlink param set). The stored DRIVERINIT value persists across FW
resets and devlink reloads without any further checks against the max.

If the FW link type changes from Ethernet to IB and a FW reset happens,
the MAX cap for log_max_current_uc_list will become zero, but the
previously stored max_macs value remains and is unconditionally
programmed into the HCA caps in handle_hca_cap(). FW will then return a
syndrome during SET_HCA_CAP:

 mlx5_cmd_out_err:839:(pid 3831): SET_HCA_CAP(0x109) op_mod(0x0) failed,
 status bad parameter(0x3), syndrome (0x537801), err(-22)
 set_hca_cap:907:(pid 3831): handle_hca_cap failed

This results in a failure to register the RDMA device.

This patch skips programming log_max_current_uc_list when the MAX
capability is 0 (in case of IB).

Fixes: 8680a60fc1fc ("net/mlx5: Let user configure max_macs generic param")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260611135230.534513-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoMerge branch 'psp-add-support-for-dev-assoc-disassoc'
Jakub Kicinski [Sat, 13 Jun 2026 01:31:35 +0000 (18:31 -0700)] 
Merge branch 'psp-add-support-for-dev-assoc-disassoc'

Wei Wang says:

====================
psp: Add support for dev-assoc/disassoc

The main purpose of this feature is to associate virtual devices like
veth or netkit with a real PSP device, so we could provide PSP
functionality to the application running with virtual devices.

A typical deployment that works with this feature is as follows:
     Host Namespace:
     psp_dev_local  ←──physically linked──→ psp_dev_peer
  (PSP device)
       │
       │ BPF on psp_dev_local ingress: bpf_redirect_peer() to nk_guest
       │
  nk_host / veth_host
       │
       │ BPF on nk_host ingress: bpf_redirect_neigh() to psp_dev_local
       │
      Guest Namespace (netns):
       │
  nk_guest / veth_guest
  ★ PSP application runs here

      Remote Namespace (_netns):
  psp_dev_peer
  ★ PSP server application runs here

Note:
The general requirement for this feature to work:
For PSP to work correctly, the egress device at validate_xmit_skb()
time must have psp_dev matching the association's psd. Any device
stacking or traffic redirection that changes the egress device will
cause either:
1. TX validation failure (SKB_DROP_REASON_PSP_OUTPUT) - fail-safe
2. RX policy failure after tx-assoc - packets without PSP extension
   are rejected by receiver expecting encrypted traffic

Here are a few examples where this feature would not work:
- Bonding with load balancing in round-robin, XOR, 802.3ad mode across
  multiple PSP devices, or mixed PSP and non-PSP devices
- Bonding with active-backup mode might work without PSP migration for
  failover case.
- ipvlan/macvlan in bridge mode would not work given packets are
  loopbacked locally without going through the PSP device.
====================

Link: https://patch.msgid.link/20260608233118.2694144-1-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: psp: add dev-get, no-nsid, and cleanup tests
Wei Wang [Mon, 8 Jun 2026 23:31:18 +0000 (16:31 -0700)] 
selftests/net: psp: add dev-get, no-nsid, and cleanup tests

Add the following 3 tests:

- _psp_dev_get_check_netkit_psp_assoc: verifies dev-get output in both
  host and guest namespaces, checking assoc-list, by-association flag,
  and nsid values
- _dev_assoc_no_nsid: tests dev-assoc and dev-disassoc without the nsid
  attribute, verifying ifindex lookup in the caller's namespace
- _psp_dev_assoc_cleanup_on_netkit_del: verifies that deleting the
  associated netkit interface properly cleans up the assoc-list, using
  a disposable netkit pair to avoid disturbing the shared environment

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-11-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: psp: add cross-namespace notification tests
Wei Wang [Mon, 8 Jun 2026 23:31:17 +0000 (16:31 -0700)] 
selftests/net: psp: add cross-namespace notification tests

Add tests that verify PSP notifications are delivered to listeners in
associated namespaces:

- _key_rotation_notify_multi_ns_netkit: triggers key rotation and
  verifies the notification is received in both main and guest namespaces
- _dev_change_notify_multi_ns_netkit: triggers dev_set and verifies the
  dev_change notification is received in both namespaces

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-10-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: psp: add dev-assoc data path test
Wei Wang [Mon, 8 Jun 2026 23:31:16 +0000 (16:31 -0700)] 
selftests/net: psp: add dev-assoc data path test

Add _assoc_check_list() test that associates nk_guest with the PSP
device and verifies the assoc-list is correctly populated.

Add _data_basic_send_netkit_psp_assoc() which tests PSP data send
through a netkit interface associated with a PSP device. The test
associates nk_guest with the PSP device, then sends PSP-encrypted
traffic from the guest namespace.

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-9-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: psp: support PSP in NetDrvContEnv infrastructure
Wei Wang [Mon, 8 Jun 2026 23:31:15 +0000 (16:31 -0700)] 
selftests/net: psp: support PSP in NetDrvContEnv infrastructure

Add infrastructure to support PSP tests across network namespaces
using NetDrvContEnv with netkit pairs. This enables testing PSP device
association, where a non-PSP-capable device (e.g. netkit) in a guest
namespace is associated with a real PSP device in the host namespace,
allowing the guest to perform PSP encryption/decryption through the
host's PSP hardware.

The topology is:
  Host NS:  psp_dev_local <---> nk_host
                |                  |
                |                  | (netkit pair)
                |                  |
  Remote NS: psp_dev_peer      Guest NS: nk_guest
             (responder)             (PSP tests)

env.py:
- nk_guest_ifindex is queried after moving the device into the guest
  namespace, so tests can use it directly for dev-assoc

psp.py:
- PSP device lookup supports container environments where the PSP
  device is on the physical interface, not the test interface
- Association helpers handle dev-assoc/dev-disassoc with defer-based
  cleanup to prevent state leaks on test assertion failures
- main() tries NetDrvContEnv with primary_rx_redirect and falls back
  to NetDrvEpEnv, so existing tests continue to work without the
  container environment

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-8-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: rename _nk_host_ifname to nk_host_ifname
Wei Wang [Mon, 8 Jun 2026 23:31:14 +0000 (16:31 -0700)] 
selftests/net: rename _nk_host_ifname to nk_host_ifname

Rename _nk_host_ifname to nk_host_ifname in NetDrvContEnv to make it
a public attribute, matching the nk_guest_ifname rename. Tests that
access the host-side netkit interface name (e.g. for cleanup after
deleting the netkit pair) no longer trigger pylint protected-access
warnings.

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-7-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: add _find_bpf_obj() to search hw/ for BPF objects
Wei Wang [Mon, 8 Jun 2026 23:31:13 +0000 (16:31 -0700)] 
selftests/net: add _find_bpf_obj() to search hw/ for BPF objects

Add _find_bpf_obj() helper to NetDrvContEnv that searches the test
directory first, then falls back to the hw/ subdirectory. This allows
tests outside drivers/net/hw/ (e.g. psp.py in drivers/net/) to find
BPF objects built in the hw/ directory.

Update _attach_bpf() and _attach_primary_rx_redirect_bpf() to use
_find_bpf_obj() for BPF object discovery.

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-6-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days agoselftests/net: psp: refactor test builders to use ksft_variants
Wei Wang [Mon, 8 Jun 2026 23:31:12 +0000 (16:31 -0700)] 
selftests/net: psp: refactor test builders to use ksft_variants

Replace the manual psp_ip_ver_test_builder() and ipver_test_builder()
functions with @ksft_variants decorators for data_basic_send and
data_mss_adjust. This is a pure refactor with no behavior change.

Signed-off-by: Wei Wang <weibunny@fb.com>
Link: https://patch.msgid.link/20260608233118.2694144-5-weibunny.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>