thirdparty/kernel/stable.git
3 weeks agoRDMA/ionic: Fix memory leak of admin q_wr
Abhijit Gangurde [Wed, 24 Sep 2025 14:21:23 +0000 (19:51 +0530)] 
RDMA/ionic: Fix memory leak of admin q_wr

The admin queue work request buffer, aq->q_wr, is allocated via kcalloc in
__ionic_create_rdma_adminq. However, it was not being freed in the
corresponding teardown function __ionic_destroy_rdma_adminq. This results
in a memory leak.  Fix this leak by adding the missing kfree(aq->q_wr) in
the destruction path.
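
A minimal userspace sketch of the create/destroy pairing described above. It uses calloc/free in place of kcalloc/kfree, and struct adminq, adminq_create and adminq_destroy are hypothetical stand-ins, not the driver's code; only the q_wr member name comes from the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the admin queue; only q_wr mirrors the commit. */
struct adminq {
	int *q_wr;	/* per-slot work request bookkeeping */
	size_t depth;
};

/* Create path: allocate the queue and its zeroed work request array. */
struct adminq *adminq_create(size_t depth)
{
	struct adminq *aq;

	assert(depth > 0);

	aq = calloc(1, sizeof(*aq));
	if (!aq)
		return NULL;

	aq->q_wr = calloc(depth, sizeof(*aq->q_wr));
	if (!aq->q_wr) {
		free(aq);
		return NULL;
	}
	aq->depth = depth;
	return aq;
}

/* Destroy path: release everything the create path allocated. */
void adminq_destroy(struct adminq *aq)
{
	if (!aq)
		return;
	free(aq->q_wr);	/* the release that a leaking teardown would omit */
	free(aq);
}
```

Keeping the teardown as a mirror image of the setup makes a missing release easier to spot in review.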

Fixes: f3bdbd42702c ("RDMA/ionic: Create device queues to support admin operations")
Link: https://patch.msgid.link/r/20250924142123.18344-1-abhijit.gangurde@amd.com
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/siw: Always report immediate post SQ errors
Bernard Metzler [Tue, 23 Sep 2025 14:45:36 +0000 (16:45 +0200)] 
RDMA/siw: Always report immediate post SQ errors

In siw_post_send(), any immediate error encountered during processing of
the work request list must be reported to the caller, even if previous
work requests in that list were just accepted and added to the send queue.

Not reporting those errors confuses the caller, which would wait
indefinitely for completion of the failing, and potentially subsequently
aborted, work requests.

This fixes a case where immediate errors were overwritten by subsequent
code in siw_post_send().
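
The rule can be sketched in userspace as follows. This is a simplified model, not siw code: struct work_req, struct send_queue, post_send, MAX_SGE and SQ_DEPTH are all hypothetical names and limits chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SGE  4	/* assumed per-request limit, illustration only */
#define SQ_DEPTH 2	/* assumed queue depth, illustration only */

/* Hypothetical, simplified work request list entry. */
struct work_req {
	struct work_req *next;
	int num_sge;
};

struct send_queue {
	int used;
};

/*
 * Queue requests until one fails. The first immediate error is returned
 * and *bad_wr points at the failing request, even if earlier requests in
 * the same list were already accepted.
 */
int post_send(struct send_queue *sq, struct work_req *wr,
	      struct work_req **bad_wr)
{
	int rv = 0;

	assert(sq && bad_wr);
	*bad_wr = NULL;

	for (; wr; wr = wr->next) {
		if (wr->num_sge > MAX_SGE) {
			rv = -EINVAL;
			break;
		}
		if (sq->used == SQ_DEPTH) {
			rv = -ENOMEM;
			break;
		}
		sq->used++;
	}
	if (rv)
		*bad_wr = wr;

	/* Any later step (such as kicking transmission) must leave rv alone. */
	return rv;
}
```

With a list of three requests where the second is invalid, the first is queued and the call still returns -EINVAL with *bad_wr set to the second request.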

Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Link: https://patch.msgid.link/r/20250923144536.103825-1-bernard.metzler@linux.dev
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/bnxt_re: improve clarity in ALLOC_PAGE handler
Alok Tiwari [Wed, 24 Sep 2025 11:01:27 +0000 (04:01 -0700)] 
RDMA/bnxt_re: improve clarity in ALLOC_PAGE handler

Update uverbs_copy_to call to use sizeof(dpi) instead of sizeof(length)
when copying the device page index (DPI) back to user space. Both dpi
and length are declared as u32, so this change has no functional impact
but makes the code clearer.

Link: https://patch.msgid.link/r/20250924110130.340195-1-alok.a.tiwari@oracle.com
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/irdma: Remove unused struct irdma_cq fields
Jacob Moroni [Tue, 23 Sep 2025 14:21:28 +0000 (14:21 +0000)] 
RDMA/irdma: Remove unused struct irdma_cq fields

These fields were set but not used anywhere, so remove them.

Link: https://patch.msgid.link/r/20250923142128.943240-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/irdma: Fix positive vs negative error codes in irdma_post_send()
Dan Carpenter [Tue, 23 Sep 2025 11:20:45 +0000 (14:20 +0300)] 
RDMA/irdma: Fix positive vs negative error codes in irdma_post_send()

This code accidentally returns positive EINVAL instead of negative
-EINVAL.  Some of the callers treat positive returns as success.
Add the missing '-' char.
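
A small userspace sketch of why the sign matters. The function names and the OP_* values are hypothetical; the only convention taken from the commit is that callers may treat any non-negative return as success.

```c
#include <assert.h>
#include <errno.h>

static_assert(EINVAL > 0, "errno constants are positive");

enum { OP_SEND, OP_ATOMIC };	/* hypothetical opcodes */

/* Buggy variant: returns the positive errno constant. */
int validate_wr_buggy(int opcode, int atomics_supported)
{
	if (opcode == OP_ATOMIC && !atomics_supported)
		return EINVAL;
	return 0;
}

/* Fixed variant: returns a negative errno, as kernel callers expect. */
int validate_wr_fixed(int opcode, int atomics_supported)
{
	if (opcode == OP_ATOMIC && !atomics_supported)
		return -EINVAL;
	return 0;
}

/* A caller that only treats negative values as errors. */
int caller_sees_failure(int ret)
{
	return ret < 0;
}
```

Here caller_sees_failure() misses the error from the buggy variant and catches it from the fixed one.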

Fixes: a24a29c8747f ("RDMA/irdma: Add Atomic Operations support")
Link: https://patch.msgid.link/r/aNKCjcD6Nab1jWEV@stanley.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/bnxt_re: Remove non-statistics counters from hw_counters
Anantha Prabhu [Tue, 23 Sep 2025 06:26:57 +0000 (11:56 +0530)] 
RDMA/bnxt_re: Remove non-statistics counters from hw_counters

Remove non-statistics counters from the RDMA hw_counters framework.
The removed data includes:

- Active resource counts (ACTIVE_PD, ACTIVE_QP, etc.)
- Resource watermarks (WATERMARK_PD, WATERMARK_QP, etc.)
- Operational counters (RESIZE_CQ_CNT)
- DB pacing metrics (PACING_RESCHED, PACING_CMPL, etc.)

This change ensures hw_counters contains only true performance
and error statistics.

Link: https://patch.msgid.link/r/20250923062657.981487-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
3 weeks agoRDMA/bnxt_re: Add debugfs info entry for device and resource information
Anantha Prabhu [Tue, 23 Sep 2025 06:26:56 +0000 (11:56 +0530)] 
RDMA/bnxt_re: Add debugfs info entry for device and resource information

Add a new debugfs info entry that displays device information and
non-statistics data using the seq_file interface. This entry shows:

- Resource watermarks (peak usage tracking)
- Operational counters (CQ resize count)
- Doorbell pacing information

Link: https://patch.msgid.link/r/20250923062657.981487-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 weeks agoRDMA/bnxt_re: Fix incorrect errno used in function comments
Alok Tiwari [Sun, 21 Sep 2025 08:18:48 +0000 (01:18 -0700)] 
RDMA/bnxt_re: Fix incorrect errno used in function comments

The function comments in qplib_rcfw.c mention -ETIMEOUT as a
possible return value. However, the correct errno is -ETIMEDOUT.

Update the comments to reflect the proper return value to avoid
confusion for developers and users referring to the code.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250921081854.1059094-1-alok.a.tiwari@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA: Use %pe format specifier for error pointers
Leon Romanovsky [Thu, 18 Sep 2025 17:53:41 +0000 (20:53 +0300)] 
RDMA: Use %pe format specifier for error pointers

Convert error logging throughout the RDMA subsystem to use
the %pe format specifier instead of PTR_ERR() with integer
format specifiers.

Link: https://patch.msgid.link/e81ec02df1e474be20417fb62e779776e3f47a50.1758217936.git.leon@kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
4 weeks agoRDMA/ionic: Use ether_addr_copy instead of memcpy
Abhijit Gangurde [Fri, 19 Sep 2025 12:13:01 +0000 (17:43 +0530)] 
RDMA/ionic: Use ether_addr_copy instead of memcpy

The Ethernet header in the ib_ud_header structure splits the MAC
address into a 4-byte high part and a 2-byte low part. When the 4-byte
high part is passed to memcpy for a full MAC copy, the compiler flags
it as an overflow. The copy is nevertheless safe because the 2-byte low
part directly follows the high part in the structure.
To avoid the memcpy warning, use ether_addr_copy to copy
the MAC address.

In function ‘fortify_memcpy_chk’,
    inlined from ‘ionic_set_ah_attr.isra’ at drivers/infiniband/hw/ionic/ionic_controlpath.c:609:3:
./include/linux/fortify-string.h:580:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
  580 |                         __read_overflow2_field(q_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:287: drivers/infiniband/hw/ionic/ionic_controlpath.o] Error 1
make[5]: *** [scripts/Makefile.build:556: drivers/infiniband/hw/ionic] Error 2
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [scripts/Makefile.build:556: drivers/infiniband/hw] Error 2
make[3]: *** [scripts/Makefile.build:556: drivers/infiniband] Error 2
make[2]: *** [scripts/Makefile.build:556: drivers] Error 2
make[1]: *** [/tmp/tmp53nb1nwr/Makefile:2011: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2

Fixes: e8521822c733 ("RDMA/ionic: Register device ops for control path")
Reported-by: Leon Romanovsky <leon@kernel.org>
Closes: https://lore.kernel.org/lkml/20250918180750.GA135135@unreal/
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250919121301.1113759-2-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/ionic: Fix build failure on SPARC due to xchg() operand size
Abhijit Gangurde [Fri, 19 Sep 2025 12:13:00 +0000 (17:43 +0530)] 
RDMA/ionic: Fix build failure on SPARC due to xchg() operand size

xchg() is used to safely handle the event queue arming.
However, on SPARC xchg() operates only on 4-byte variables.
Change the variable type from bool to int.
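
A userspace analogue using C11 atomics, where atomic_exchange() plays the role of the kernel's xchg(). struct event_queue, eq_arm and eq_disarm are hypothetical names, and the 4-byte int is an assumption about the build target.

```c
#include <assert.h>
#include <stdatomic.h>

static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

/* Hypothetical event queue with an int-sized armed flag instead of a bool. */
struct event_queue {
	atomic_int armed;
};

/* Returns 1 if this call armed the queue, 0 if it was already armed. */
int eq_arm(struct event_queue *eq)
{
	return atomic_exchange(&eq->armed, 1) == 0;
}

/* Returns 1 if this call disarmed the queue, 0 if it was not armed. */
int eq_disarm(struct event_queue *eq)
{
	return atomic_exchange(&eq->armed, 0) == 1;
}
```

The exchange returns the previous value, so exactly one of several concurrent callers observes the transition.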

Unverified Error/Warning (likely false positive, kindly check if interested):

    ERROR: modpost: "__xchg_called_with_bad_pointer" [drivers/infiniband/hw/ionic/ionic_rdma.ko] undefined!

Error/Warning ids grouped by kconfigs:

recent_errors
`-- sparc-allmodconfig
    `-- ERROR:__xchg_called_with_bad_pointer-drivers-infiniband-hw-ionic-ionic_rdma.ko-undefined

Fixes: f3bdbd42702c ("RDMA/ionic: Create device queues to support admin operations")
Reported-by: Leon Romanovsky <leon@kernel.org>
Closes: https://lore.kernel.org/lkml/20250918180750.GA135135@unreal/
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250919121301.1113759-1-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/rxe: Fix race in do_task() when draining
Gui-Dong Han [Fri, 19 Sep 2025 02:52:12 +0000 (02:52 +0000)] 
RDMA/rxe: Fix race in do_task() when draining

When do_task() exhausts its iteration budget (!ret), it sets the state
to TASK_STATE_IDLE to reschedule, without a secondary check on the
current task->state. This can overwrite the TASK_STATE_DRAINING state
set by a concurrent call to rxe_cleanup_task() or rxe_disable_task().

While state changes are protected by a spinlock, both rxe_cleanup_task()
and rxe_disable_task() release the lock while waiting for the task to
finish draining in the while(!is_done(task)) loop. The race occurs if
do_task() hits its iteration limit and acquires the lock in this window.
The cleanup logic may then proceed while the task incorrectly
reschedules itself, leading to a potential use-after-free.

This bug was introduced during the migration from tasklets to workqueues,
where the special handling for the draining case was lost.

Fix this by restoring the original pre-migration behavior. If the state is
TASK_STATE_DRAINING when iterations are exhausted, set cont to 1 to
force a new loop iteration. This allows the task to finish its work, so
that a subsequent iteration can reach the switch statement and correctly
transition the state to TASK_STATE_DRAINED, stopping the task as intended.
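
The decision taken when the iteration budget runs out can be modelled as below. This is a simplified single-threaded sketch, not the rxe source: the real code runs under the task spinlock, struct decision and on_budget_exhausted are hypothetical, and TASK_STATE_BUSY is an assumed name for the running state.

```c
#include <assert.h>

/* Simplified state set; TASK_STATE_BUSY is an assumption of this sketch. */
enum task_state {
	TASK_STATE_IDLE,
	TASK_STATE_BUSY,
	TASK_STATE_DRAINING,
	TASK_STATE_DRAINED,
};

struct task {
	enum task_state state;
};

/* Hypothetical result type: reschedule later, or continue looping now. */
struct decision {
	int resched;
	int cont;
};

/* Called when the iteration budget is exhausted but work remains. */
struct decision on_budget_exhausted(struct task *task)
{
	struct decision d = { 0, 0 };

	assert(task);
	if (task->state != TASK_STATE_DRAINING) {
		/* Normal case: yield and let the task be queued again. */
		task->state = TASK_STATE_IDLE;
		d.resched = 1;
	} else {
		/* Draining: keep DRAINING intact and run another loop. */
		d.cont = 1;
	}
	return d;
}
```

The key property is that the DRAINING state is never overwritten with IDLE, so a waiter in the cleanup path is not left racing with a rescheduled task.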

Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20250919025212.1682087-1-hanguidong02@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoIB/sa: Fix sa_local_svc_timeout_ms read race
Vlad Dumitrescu [Tue, 16 Sep 2025 16:31:12 +0000 (19:31 +0300)] 
IB/sa: Fix sa_local_svc_timeout_ms read race

When computing the delta, the sa_local_svc_timeout_ms is read without
ib_nl_request_lock held. Though unlikely in practice, this can cause
a race condition if multiple local service threads are managing the
timeout.

Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916163112.98414-1-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoIB/ipoib: Ignore L3 master device
Vlad Dumitrescu [Tue, 16 Sep 2025 11:11:03 +0000 (14:11 +0300)] 
IB/ipoib: Ignore L3 master device

Currently, all master upper netdevices (e.g., bond, VRF) are treated
equally.

When a VRF netdevice is used over an IPoIB netdevice, the expected
netdev resolution is on the lower IPoIB device which has the IP address
assigned to it and not the VRF device.

The rdma_cm module (CMA) tries to match incoming requests to a
particular netdevice. When successful, it also validates that the return
path points to the same device by performing a routing table lookup.
Currently, the former would resolve to the VRF netdevice, while the
latter to the correct lower IPoIB netdevice, leading to failure in
rdma_cm.

Improve this by ignoring the VRF master netdevice, if it exists, and
instead return the lower IPoIB device.

Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-5-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/core: Use route entry flag to decide on loopback traffic
Parav Pandit [Tue, 16 Sep 2025 11:11:02 +0000 (14:11 +0300)] 
RDMA/core: Use route entry flag to decide on loopback traffic

addr_resolve() considers a destination to be local if the next-hop
device of the resolved route for the destination is the loopback
netdevice.

This fails when the source and destination IP addresses belong to
a netdev enslaved to a VRF netdev. In this case the next-hop device
is the VRF itself:

 $ ip link add name myvrf up type vrf table 100
 $ ip link set ens2f0np0 master myvrf up
 $ ip addr add 192.168.1.1/24 dev ens2f0np0
 $ ip route get 192.168.1.1 oif myvrf
 local 192.168.1.1 dev myvrf table 100 src 192.168.1.1 uid 0
    cache <local>

This results in packets being generated with an incorrect destination
MAC of the VRF netdevice and ib_write_bw failing with timeout.

Solve this by determining if a destination is local or not based on
the resolved route's type rather than based on its next-hop netdevice
loopback flag.

This allows loopback traffic to be resolved uniformly, with and without
VRF configurations.
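
The difference between the two checks can be illustrated with a toy model. All types and names here (struct netdev, struct route, ROUTE_LOCAL, the two predicates) are hypothetical and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical route classification. */
enum route_type { ROUTE_UNICAST, ROUTE_LOCAL };

struct netdev {
	bool is_loopback;	/* stands in for a loopback device flag */
};

struct route {
	enum route_type type;
	const struct netdev *next_hop;
};

/* Old rule: local only if the next-hop device is the loopback device. */
bool is_local_by_device(const struct route *rt)
{
	assert(rt && rt->next_hop);
	return rt->next_hop->is_loopback;
}

/* New rule: local if the resolved route itself is a local route. */
bool is_local_by_route(const struct route *rt)
{
	assert(rt);
	return rt->type == ROUTE_LOCAL;
}
```

For a local route whose next hop is a VRF device, the device-based rule reports the destination as remote while the route-based rule reports it as local.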

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-4-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/core: Resolve MAC of next-hop device without ARP support
Parav Pandit [Tue, 16 Sep 2025 11:11:01 +0000 (14:11 +0300)] 
RDMA/core: Resolve MAC of next-hop device without ARP support

Currently, if the next-hop netdevice does not support ARP resolution,
the destination MAC address is silently set to zero without reporting
an error. This leads to incorrect behavior and may result in packet
transmission failures.

Fix this by deferring MAC resolution to the IP stack via neighbour
lookup, allowing proper resolution or error reporting as appropriate.

Fixes: 7025fcd36bd6 ("IB: address translation to map IP toIB addresses (GIDs)")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-3-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/core: Squash a single user static function
Parav Pandit [Tue, 16 Sep 2025 11:11:00 +0000 (14:11 +0300)] 
RDMA/core: Squash a single user static function

To reduce dependencies on IFF_LOOPBACK in the route and neighbour
resolution steps, squash the static function into its single caller and
simplify the code.
Until now, the network field was set even when neighbour resolution
failed. With this change, dev_addr output fields are valid only when
resolution is successful.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250916111103.84069-2-edwards@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Update Kconfig
Tatyana Nikolova [Wed, 27 Aug 2025 15:25:45 +0000 (10:25 -0500)] 
RDMA/irdma: Update Kconfig

Update Kconfig to add dependency on idpf module and
add IPU E2000 to the list of supported devices.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-17-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices
Shiraz Saleem [Wed, 27 Aug 2025 15:25:44 +0000 (10:25 -0500)] 
RDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices

Enhance the CQE error and flush handling specific to GEN3 devices.
Unlike GEN1/2 devices, which depend on software to generate completions
in error, GEN3 devices leverage firmware to generate CQEs in error for
all WQEs posted after a QP moves to an error state.

Key changes include:
- Updating the CQ poll logic to properly advance the CQ head in the
event of a flush CQE.
- Updating the flush logic for GEN3 to pass error WQE idx
for SQ on an AE to flush out unprocessed WQEs in error.
- Isolating the decoding of AE to flush codes into a separate routine
irdma_ae_to_qp_err_code. This routine can now be leveraged to
flush error CQEs on an AE and when error CQE is received for SRQ.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-16-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add Atomic Operations support
Faisal Latif [Wed, 27 Aug 2025 15:25:43 +0000 (10:25 -0500)] 
RDMA/irdma: Add Atomic Operations support

Extend irdma to support atomic operations, namely Compare and Swap and
Fetch and Add, for GEN3 devices.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-15-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Restrict Memory Window and CQE Timestamping to GEN3
Shiraz Saleem [Wed, 27 Aug 2025 15:25:42 +0000 (10:25 -0500)] 
RDMA/irdma: Restrict Memory Window and CQE Timestamping to GEN3

With the deprecation of Memory Window and Timestamping support in GEN2,
move these features to be exclusive to GEN3. This iteration supports
only Type2 Memory Windows. Additionally, it includes the reporting of
the timestamp mask and Host Channel Adapter (HCA) core clock frequency
via the query device verb.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-14-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add SRQ support
Faisal Latif [Wed, 27 Aug 2025 15:25:41 +0000 (10:25 -0500)] 
RDMA/irdma: Add SRQ support

Implement verb API and UAPI changes to support SRQ functionality in GEN3
devices.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-13-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Support 64-byte CQEs and GEN3 CQE opcode decoding
Shiraz Saleem [Wed, 27 Aug 2025 15:25:40 +0000 (10:25 -0500)] 
RDMA/irdma: Support 64-byte CQEs and GEN3 CQE opcode decoding

Introduce support for 64-byte CQEs in GEN3 devices. Additionally,
implement GEN3-specific CQE opcode decoding.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-12-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add support for V2 HMC resource management scheme
Vinoth Kumar Chandra Mohan [Wed, 27 Aug 2025 15:25:39 +0000 (10:25 -0500)] 
RDMA/irdma: Add support for V2 HMC resource management scheme

HMC resource initialization is updated to support the V1 or V2 approach,
depending on the FW capability. In the V2 approach, the driver receives
the count of assigned HMC resources and verifies that they fit in the
given local memory. If they do not fit, the driver load fails.

Signed-off-by: Vinoth Kumar Chandra Mohan <vinoth.kumar.chandra.mohan@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-11-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Extend QP context programming for GEN3
Shiraz Saleem [Wed, 27 Aug 2025 15:25:38 +0000 (10:25 -0500)] 
RDMA/irdma: Extend QP context programming for GEN3

Extend the QP context structure with support for new fields
specific to GEN3 hardware capabilities.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-10-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add GEN3 virtual QP1 support
Shiraz Saleem [Wed, 27 Aug 2025 15:25:37 +0000 (10:25 -0500)] 
RDMA/irdma: Add GEN3 virtual QP1 support

Add a new RDMA virtual channel op during QP1 creation that allows the
Control Plane (CP) to virtualize a regular QP as QP1 on non-default
RDMA capable vPorts. Additionally, the CP will return the Qsets to use
on the ib_device of the vPort.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-9-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Introduce GEN3 vPort driver support
Mustafa Ismail [Wed, 27 Aug 2025 15:25:36 +0000 (10:25 -0500)] 
RDMA/irdma: Introduce GEN3 vPort driver support

In the IPU model, a function can host one or more logical network
endpoints called vPorts. Each vPort may be associated with either a
physical or an internal communication port, and can be RDMA capable. A
vPort features a netdev and, if RDMA capable, must have an associated
ib_dev.

This change introduces a GEN3 auxiliary vPort driver responsible for
registering a verbs device for every RDMA-capable vPort. Additionally,
the UAPI is updated to prevent the binding of GEN3 devices to older
user-space providers.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-8-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add GEN3 HW statistics support
Krzysztof Czurylo [Wed, 27 Aug 2025 15:25:35 +0000 (10:25 -0500)] 
RDMA/irdma: Add GEN3 HW statistics support

Plug into the unified HW statistics framework by adding a hardware
statistics map array for GEN3, defining the HW-specific width and
location for each counter in the statistics buffer.

Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-7-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add GEN3 support for AEQ and CEQ
Shiraz Saleem [Wed, 27 Aug 2025 15:25:34 +0000 (10:25 -0500)] 
RDMA/irdma: Add GEN3 support for AEQ and CEQ

Extend support for GEN3 devices by programming the necessary hardware
IRQ registers and the updated descriptor fields for the Asynchronous
Event Queue (AEQ) and Completion Event Queue (CEQ). Introduce a RDMA
virtual channel operation with the Control Plane (CP) to associate
interrupt vectors appropriately with AEQ and CEQ. Add new Asynchronous
Event (AE) definitions specific to GEN3.

Additionally, refactor the AEQ and CEQ setup into the irdma_ctrl_init_hw
device control initialization routine.

This completes the PCI device level initialization for RDMA in the core
driver.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-6-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add GEN3 CQP support with deferred completions
Krzysztof Czurylo [Wed, 27 Aug 2025 15:25:33 +0000 (10:25 -0500)] 
RDMA/irdma: Add GEN3 CQP support with deferred completions

GEN3 introduces asynchronous handling of Control QP (CQP) operations to
minimize head-of-line blocking. Create the CQP using the updated GEN3-
specific descriptor fields and implement the necessary support for this
deferred completion mechanism.

Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-5-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Discover and set up GEN3 hardware register layout
Christopher Bednarz [Wed, 27 Aug 2025 15:25:32 +0000 (10:25 -0500)] 
RDMA/irdma: Discover and set up GEN3 hardware register layout

Discover the hardware register layout for GEN3 devices through an RDMA
virtual channel operation with the Control Plane (CP). Set up the
corresponding hardware attributes specific to GEN3 devices.

Signed-off-by: Christopher Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-4-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Add GEN3 core driver support
Mustafa Ismail [Wed, 27 Aug 2025 15:25:31 +0000 (10:25 -0500)] 
RDMA/irdma: Add GEN3 core driver support

Introduce support for the GEN3 auxiliary core driver, which is
responsible for initializing PCI-level RDMA resources.

Facilitate host-driver communication with the device's Control Plane (CP)
to discover capabilities and perform privileged operations through an
RDMA-specific messaging interface built atop the IDPF mailbox and virtual
channel protocol.

Establish the RDMA virtual channel message interface and incorporate
operations to retrieve the hardware version and discover capabilities
from the CP.

Additionally, set up the RDMA MMIO regions and initialize the RF structure.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-3-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/irdma: Refactor GEN2 auxiliary driver
Mustafa Ismail [Wed, 27 Aug 2025 15:25:30 +0000 (10:25 -0500)] 
RDMA/irdma: Refactor GEN2 auxiliary driver

Refactor the irdma auxiliary driver and associated interfaces out of main.c
into a standalone GEN2-specific source file and rename it as the gen_2 driver.

This is in preparation for adding GEN3 auxiliary drivers. Each HW
generation will have its own gen-specific interface file.

Additionally, move the Address Handle hash table and associated locks
under rf struct. This will allow GEN3 code to migrate to use it easily.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20250827152545.2056-2-tatyana.e.nikolova@intel.com
Tested-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/mana_ib: Extend modify QP
Shiraz Saleem [Mon, 15 Sep 2025 07:59:32 +0000 (00:59 -0700)] 
RDMA/mana_ib: Extend modify QP

Extend modify QP to support further attributes: local_ack_timeout, UD qkey,
rate_limit, qp_access_flags, flow_label, max_rd_atomic.

Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1757923172-4475-1-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
4 weeks agoRDMA/cm: Rate limit destroy CM ID timeout error message
Håkon Bugge [Fri, 12 Sep 2025 10:05:20 +0000 (12:05 +0200)] 
RDMA/cm: Rate limit destroy CM ID timeout error message

When the destroy CM ID timeout kicks in, it typically fires in a storm
that floods the log. Hence, change pr_err() to
pr_err_ratelimited() in cm_destroy_id_wait_timeout().
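
The idea behind rate-limited logging can be sketched with a fixed-window limiter. This is a simplified userspace model and not the kernel's implementation: struct ratelimit and ratelimit_allow are hypothetical, and time is passed in by the caller in arbitrary units.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-window limiter state. */
struct ratelimit {
	long interval;	/* window length, in caller-defined time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed so far */
};

/* Returns true if a message may be emitted at time 'now'. */
bool ratelimit_allow(struct ratelimit *rs, long now)
{
	assert(rs && rs->interval > 0);

	/* Open a fresh window once the current one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A caller would wrap each log statement in a check of ratelimit_allow(), so a storm of identical errors produces at most 'burst' lines per window.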

Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Link: https://patch.msgid.link/20250912100525.531102-1-haakon.bugge@oracle.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Avoid GID level QoS update from the driver
Shravya KN [Mon, 8 Sep 2025 09:45:16 +0000 (15:15 +0530)] 
RDMA/bnxt_re: Avoid GID level QoS update from the driver

The driver inserts a VLAN header into RoCE packets when the
traffic was untagged by modifying the existing GID entries.
This has caused the firmware to enforce only VLAN-based
priority mappings, ignoring other valid priority configurations
set via APP TLVs (e.g., DSCP selectors).

The driver now supports selecting the service level (VLAN ID) and
traffic class (DSCP) during modify_qp, so there is no need to override
the priority through the GID update method. Remove the code that
handles that operation.

Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Link: https://patch.msgid.link/20250908094516.18222-3-kalesh-anakkur.purayil@broadcom.com
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Update sysfs entries with appropriate data
Anantha Prabhu [Mon, 8 Sep 2025 09:45:15 +0000 (15:15 +0530)] 
RDMA/bnxt_re: Update sysfs entries with appropriate data

Update the existing sysfs entries with correct data to align the
behavior with our OOB driver.
Add a "board_id" sysfs entry that provides the
VPD part number, if it exists.

Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250908094516.18222-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Add Makefile/Kconfig to kernel build environment
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:06 +0000 (11:46 +0530)] 
RDMA/ionic: Add Makefile/Kconfig to kernel build environment

Add ionic to the kernel build environment.

Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-15-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Implement device stats ops
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:05 +0000 (11:46 +0530)] 
RDMA/ionic: Implement device stats ops

Implement device stats operations for hw stats and qp stats.

Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-14-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Register device ops for miscellaneous functionality
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:04 +0000 (11:46 +0530)] 
RDMA/ionic: Register device ops for miscellaneous functionality

Implement ibdev ops for device and port information.

Co-developed-by: Andrew Boyer <andrew.boyer@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-13-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Register device ops for datapath
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:03 +0000 (11:46 +0530)] 
RDMA/ionic: Register device ops for datapath

Implement device supported verb APIs for datapath.

Co-developed-by: Andrew Boyer <andrew.boyer@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-12-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Register device ops for control path
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:02 +0000 (11:46 +0530)] 
RDMA/ionic: Register device ops for control path

Implement device supported verb APIs for control path.

Co-developed-by: Andrew Boyer <andrew.boyer@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-11-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Create device queues to support admin operations
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:01 +0000 (11:46 +0530)] 
RDMA/ionic: Create device queues to support admin operations

Set up RDMA admin queues using the device command exposed over the
auxiliary device and manage these queues using an IDA.

Co-developed-by: Andrew Boyer <andrew.boyer@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-10-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/ionic: Register auxiliary module for ionic ethernet adapter
Abhijit Gangurde [Wed, 3 Sep 2025 06:16:00 +0000 (11:46 +0530)] 
RDMA/ionic: Register auxiliary module for ionic ethernet adapter

Register auxiliary module to create ibdevice for ionic
ethernet adapter.

Co-developed-by: Andrew Boyer <andrew.boyer@amd.com>
Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-9-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA: Add IONIC to rdma_driver_id definition
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:59 +0000 (11:45 +0530)] 
RDMA: Add IONIC to rdma_driver_id definition

Define RDMA_DRIVER_IONIC in enum rdma_driver_id.

Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-8-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Provide doorbell and CMB region information
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:58 +0000 (11:45 +0530)] 
net: ionic: Provide doorbell and CMB region information

The RDMA device needs information about the controller memory BAR (CMB)
and doorbell capabilities to share with the user context. Discover CMB
regions and express doorbell capabilities on device init.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Co-developed-by: Pablo Cascón <pablo.cascon@amd.com>
Signed-off-by: Pablo Cascón <pablo.cascon@amd.com>
Co-developed-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Allen Hubbe <allen.hubbe@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-7-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Provide interrupt allocation support for the RDMA driver
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:57 +0000 (11:45 +0530)] 
net: ionic: Provide interrupt allocation support for the RDMA driver

The RDMA driver needs an interrupt for an event queue. Export a
function from the net driver to allocate an interrupt.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-6-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Provide RDMA reset support for the RDMA driver
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:56 +0000 (11:45 +0530)] 
net: ionic: Provide RDMA reset support for the RDMA driver

The Ethernet driver holds the privilege to execute the device commands.
Export the function to execute RDMA reset command for use by RDMA driver.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-5-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Export the APIs from net driver to support device commands
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:55 +0000 (11:45 +0530)] 
net: ionic: Export the APIs from net driver to support device commands

RDMA driver needs to establish admin queues to support admin operations.
Export the APIs to send device commands for the RDMA driver.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-4-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Update LIF identity with additional RDMA capabilities
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:54 +0000 (11:45 +0530)] 
net: ionic: Update LIF identity with additional RDMA capabilities

Firmware sends the RDMA capability in the response to the LIF_IDENTIFY
device command. Update the LIF identity with the additional RDMA
capabilities used by driver and firmware.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-3-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agonet: ionic: Create an auxiliary device for rdma driver
Abhijit Gangurde [Wed, 3 Sep 2025 06:15:53 +0000 (11:45 +0530)] 
net: ionic: Create an auxiliary device for rdma driver

To support RDMA capable ethernet device, create an auxiliary device in
the ionic Ethernet driver. The RDMA device is modeled as an auxiliary
device to the Ethernet device.

Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com>
Link: https://patch.msgid.link/20250903061606.4139957-2-abhijit.gangurde@amd.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Call strscpy() with correct size argument
Thorsten Blum [Mon, 1 Sep 2025 15:00:39 +0000 (17:00 +0200)] 
RDMA/bnxt_re: Call strscpy() with correct size argument

In bnxt_re_register_ib(), strscpy() is called with the length of the
source string rather than the size of the destination buffer.

This is fine as long as the destination buffer is larger than the source
string, but we should still use the destination buffer size instead to
call strscpy() as intended. And since 'node_desc' has a fixed size, we
can safely omit the size argument and let strscpy() infer it using
sizeof().
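
The point about sizing the copy by the destination can be sketched in
userspace C. This is only an illustration: bounded_copy() is a hypothetical
stand-in written for this example, not the kernel's strscpy(), and its -1
return on truncation is an assumption of the sketch (the kernel helper
reports -E2BIG).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bounded string copy. Copies at most
 * size - 1 bytes from src and always NUL-terminates dst. Returns the
 * number of bytes copied, or -1 if src had to be truncated.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		/* Source does not fit: copy what fits and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	/* Source fits: copy it including its terminating NUL. */
	memcpy(dst, src, len + 1);
	return (long)len;
}
```

Calling bounded_copy(dst, src, sizeof(dst)) bounds the copy by what the
destination can actually hold, which is the property a size derived from
the source string cannot guarantee.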

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250901150038.227036-2-thorsten.blum@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/core: fix "truely"->"truly"
Xichao Zhao [Wed, 27 Aug 2025 12:00:07 +0000 (20:00 +0800)] 
RDMA/core: fix "truely"->"truly"

Trivial fix to spelling mistake in comment text.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Link: https://patch.msgid.link/20250827120007.489496-1-zhao.xichao@vivo.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/rdmavt: Use int type to store negative error codes
Qianfeng Rong [Tue, 26 Aug 2025 15:05:56 +0000 (23:05 +0800)] 
RDMA/rdmavt: Use int type to store negative error codes

Change 'ret' from u32 to int in alloc_qpn() to store -EINVAL, and remove
the 'bail' label as it simply returns 'ret'.

Storing negative error codes in a u32 causes no runtime issues, but it
is ugly and misleading to read. The type change itself has no runtime
impact.
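
As a rough userspace illustration of why an unsigned type is a poor fit for
error codes (the helper names below are made up for this sketch and do not
come from rdmavt), the same -EINVAL keeps its sign when widened from an int
but turns into a large positive number when widened from a 32-bit unsigned
value:

```c
#include <errno.h>
#include <stdint.h>

/* Error code held in a signed int: the sign survives widening. */
static int64_t widen_int_error(void)
{
	int ret = -EINVAL;

	return ret;
}

/* Error code held in a 32-bit unsigned: it wraps to 2^32 - EINVAL. */
static int64_t widen_u32_error(void)
{
	uint32_t ret = (uint32_t)-EINVAL;

	return ret;
}
```

Code that only ever returns the value as an int may never notice the
difference, which is consistent with the message's claim that there is no
runtime issue here.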

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Link: https://patch.msgid.link/20250826150556.541440-1-rongqianfeng@vivo.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/mlx5: Fix page size bitmap calculation for KSM mode
Edward Srouji [Sun, 24 Aug 2025 14:48:39 +0000 (17:48 +0300)] 
RDMA/mlx5: Fix page size bitmap calculation for KSM mode

When using KSM (Key Scatter-gather Memory) access mode, the HW requires
the IOVA to be aligned to the selected page size.
Without this alignment, the HW may not function correctly.

Currently, mlx5_umem_mkc_find_best_pgsz() does not filter out page sizes
that would result in misaligned IOVAs for KSM mode. This can lead to
selecting page sizes that are incompatible with the given IOVA.

Fix this by filtering the page size bitmap when in KSM mode, keeping
only page sizes to which the IOVA is aligned.
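
The filtering idea can be sketched in plain C as follows. This is not the
mlx5 code: the function name is hypothetical, and the sketch assumes that
bit k of the bitmap means "a page size of 2^k bytes is supported".

```c
#include <stdint.h>

/*
 * Keep only the page sizes (bit k set == 2^k bytes, an assumption of
 * this sketch) to which iova is aligned.
 */
static uint64_t filter_pgsz_bitmap_for_iova(uint64_t pgsz_bitmap,
					    uint64_t iova)
{
	uint64_t lowest;

	/* Zero is aligned to every power-of-two page size. */
	if (iova == 0)
		return pgsz_bitmap;

	/* Lowest set bit: the largest power of two that divides iova. */
	lowest = iova & (~iova + 1);

	/* Keep that bit and every smaller page size. */
	return pgsz_bitmap & ((lowest << 1) - 1);
}
```

For example, with 4 KiB, 64 KiB and 2 MiB supported, an IOVA of 0x10000
keeps only the 4 KiB and 64 KiB entries.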

Fixes: fcfb03597b7d ("RDMA/mlx5: Align mkc page size capability check to PRM")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20250824144839.154717-1-edwards@nvidia.com
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Remove unnecessary condition checks
Kalesh AP [Fri, 22 Aug 2025 04:08:01 +0000 (09:38 +0530)] 
RDMA/bnxt_re: Remove unnecessary condition checks

The checks for "rdev" and "en_dev" pointer validity always
return false.

Remove them.

Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-11-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Use firmware provided message timeout value
Saravanan Vajravel [Fri, 22 Aug 2025 04:08:00 +0000 (09:38 +0530)] 
RDMA/bnxt_re: Use firmware provided message timeout value

Before this patch, we used a hardcoded value of 500 msec as the default
value for L2 firmware message response timeout. With this commit,
the driver uses the timeout value provided by the firmware.

As part of this change, move bnxt_re_query_hwrm_intf_version() to
bnxt_re_setup_chip_ctx() so that the timeout value is queried before
sending the first command.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com>
Co-developed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-10-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Initialize fw with roce_mirror support
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:59 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Initialize fw with roce_mirror support

- Check FW capability for roce_mirror support.
- Initialize FW with roce_mirror support.
- When modifying QP, use unique GID for sgid in case of RawEth QP.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Anantha Prabhu <anantha.prabhu@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-9-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Add support for flow create/destroy
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:58 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Add support for flow create/destroy

- Added support for create_flow and destroy_flow verbs. These
  verbs are used on RawEth QP to add a specific flow action.
- To support TCP dump on RoCE, added IB_FLOW_ATTR_SNIFFER
  attribute.
- In create_flow verb, driver allocates mirror_vnic and configures it
  with RawEth QP. Once this is done, driver will enable mirroring.
- In destroy_flow, driver will disable mirroring and free the mirror
  vnic.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-8-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Add support for mirror vnic
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:57 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Add support for mirror vnic

Added below support:
- Querying the pre-reserved mirror_vnic_id
- Allocating/freeing mirror_vnic
- Configuring mirror vnic to associate it with raw qp

These functions will be used in the subsequent patch in this series.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-7-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Add support for unique GID
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:56 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Add support for unique GID

- RawEth QP requires unique GID so that per function stats_ctx
  is not polluted by packets mirrored to RoCE vnic.
- Added support to add unique GID when RawEth type QP is created.
- Added support to destroy unique GID when RawEth type QP is
  destroyed.
- Allocated exclusive stats_ctx to use for RawEth type QP.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-6-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Refactor stats context memory allocation
Kalesh AP [Fri, 22 Aug 2025 04:07:55 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Refactor stats context memory allocation

Moved the stats context allocation logic to a new function.
The stats context memory allocation code has been moved from
bnxt_qplib_alloc_hwctx() to the newly added bnxt_re_get_stats_ctx()
function. Also, the code to send the firmware command has been moved.

This patch is in preparation for other patches in this series.
No functional changes are intended.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-5-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Refactor hw context memory allocation
Kalesh AP [Fri, 22 Aug 2025 04:07:54 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Refactor hw context memory allocation

This patch is in preparation for other patches in this series.
No functional changes are intended.

1. Rename bnxt_qplib_alloc_ctx() to bnxt_qplib_alloc_hwctx().
2. Rename bnxt_qplib_free_ctx() to bnxt_qplib_free_hwctx().
3. Reduce the number of arguments of bnxt_qplib_alloc_hwctx()
   by moving a check outside of it.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-4-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Add data structures for RoCE mirror support
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:53 +0000 (09:37 +0530)] 
RDMA/bnxt_re: Add data structures for RoCE mirror support

Added data structures required for supporting mirroring on
RoCE device.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agobnxt_en: Enhance stats context reservation logic
Saravanan Vajravel [Fri, 22 Aug 2025 04:07:52 +0000 (09:37 +0530)] 
bnxt_en: Enhance stats context reservation logic

When the firmware advertises that the device is capable of supporting
port mirroring on RoCE device, reserve one additional stat_ctx.
To support port mirroring feature, RDMA driver allocates one stat_ctx
for exclusive use in RawEth QP.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250822040801.776196-2-kalesh-anakkur.purayil@broadcom.com
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Enhance a log message when bnxt_re_register_netdev fails
Kalesh AP [Thu, 14 Aug 2025 11:25:55 +0000 (16:55 +0530)] 
RDMA/bnxt_re: Enhance a log message when bnxt_re_register_netdev fails

Make an error log message more user friendly.
When bnxt_re_register_netdev() fails, the current
log does not convey much information.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-10-kalesh-anakkur.purayil@broadcom.com
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Delete always true SGID table check
Kalesh AP [Thu, 14 Aug 2025 11:25:54 +0000 (16:55 +0530)] 
RDMA/bnxt_re: Delete always true SGID table check

The "sgid_tbl" inside "rdev->qplib_res" is a static memory.
Hence, the check always returns true.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-9-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Report udp source port for flow_label in bnxt_re_query_qp
Abhishek Mohapatra [Thu, 14 Aug 2025 11:25:53 +0000 (16:55 +0530)] 
RDMA/bnxt_re: Report udp source port for flow_label in bnxt_re_query_qp

The firmware doesn't capture the flow_label. Therefore the value
that's always returned by qplib_qp->ah.flow_label is 0 whenever
a qp is created. And as per IB spec, udp source port can be reported
for flow_label. Hence reported udp source port for flow_label in
bnxt_re_query_qp by populating the value of qplib_qp->udp_sport
into qp_attr->ah_attr.grh.flow_label.

Signed-off-by: Abhishek Mohapatra <abhishek.mohapatra@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-8-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: RoCE related hardware counters update
Vasuthevan Maheswaran [Thu, 14 Aug 2025 11:25:52 +0000 (16:55 +0530)] 
RDMA/bnxt_re: RoCE related hardware counters update

Support for new hardware counters added, and existing hardware
counters have been modified according to the design documents
for compatibility with open-source monitoring agents.

Signed-off-by: Vasuthevan Maheswaran <vasuthevan.maheswaran@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-7-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Optimize bnxt_qplib_get_dev_attr function
Damodharam Ammepalli [Thu, 14 Aug 2025 11:25:51 +0000 (16:55 +0530)] 
RDMA/bnxt_re: Optimize bnxt_qplib_get_dev_attr function

Optimize bnxt_qplib_get_dev_attr() by separating out query_version, which
uses the creq notification method to the host. Because the firmware
serializes the cmdq, added response latency might be observed with heavy
multi-threaded RDMA applications.

This patch separates the version_query logic out of the device attribute
query and calls it only during RDMA driver init.

Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-6-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: RoCE Driver Dynamic Debug for HWRM's
Chenna Arnoori [Thu, 14 Aug 2025 11:25:49 +0000 (16:55 +0530)] 
RDMA/bnxt_re: RoCE Driver Dynamic Debug for HWRM's

Add Linux kernel dynamic debug prints to RoCE HWRMs.
Dump request and response buffers for the RoCE HWRMs using
print_hex_dump_bytes() as part of kernel dynamic debug.

Signed-off-by: Chenna Arnoori <chenna.arnoori@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-4-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
5 weeks agoRDMA/bnxt_re: Show srq_limit in fill_res_srq_entry hook
Kashyap Desai [Thu, 14 Aug 2025 11:25:48 +0000 (16:55 +0530)] 
RDMA/bnxt_re: Show srq_limit in fill_res_srq_entry hook

Added srq_limit in rdma show resource srq hook.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250814112555.221665-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
7 weeks agoRDMA/erdma: Use vcalloc() instead of vzalloc()
Qianfeng Rong [Thu, 21 Aug 2025 07:22:09 +0000 (15:22 +0800)] 
RDMA/erdma: Use vcalloc() instead of vzalloc()

Replace vzalloc() with vcalloc() in vmalloc_to_dma_addrs().  As noted in
the kernel documentation [1], open-coded multiplication in allocator
arguments is discouraged because it can lead to integer overflow.

Use vcalloc() to gain built-in overflow protection, making memory
allocation safer when calculating allocation size compared to explicit
multiplication.

[1]: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
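
A minimal userspace sketch of the overflow check that an array-style
allocator performs is shown below. checked_zalloc_array() is a hypothetical
helper for illustration only; it is not vcalloc() and uses calloc() from
the C library instead of vmalloc memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a zeroed array of n elements of the
 * given size, refusing the request if n * size would overflow size_t.
 */
static void *checked_zalloc_array(size_t n, size_t size)
{
	/* n * size overflows exactly when n > SIZE_MAX / size. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return calloc(n, size);
}
```

With an open-coded n * size passed to a single-argument allocator, the
same oversized request would wrap around and could yield a buffer much
smaller than the caller expects.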

Link: https://patch.msgid.link/r/20250821072209.510348-1-rongqianfeng@vivo.com
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
7 weeks agoRDMA/mlx5: Fix vport loopback forcing for MPV device
Patrisious Haddad [Wed, 13 Aug 2025 12:41:19 +0000 (15:41 +0300)] 
RDMA/mlx5: Fix vport loopback forcing for MPV device

Previously, loopback for MPV was supposed to be permanently enabled;
however, other driver flows were able to override that configuration and
disable it.

Add a force_lb parameter indicating that loopback should always be
enabled, which prevents all other driver flows from disabling it.

Fixes: a9a9e68954f2 ("RDMA/mlx5: Fix vport loopback for MPV device")
Link: https://patch.msgid.link/r/cfc6b1f0f99f8100b087483cc14da6025317f901.1755088808.git.leon@kernel.org
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
7 weeks agoRDMA/mlx5: Better estimate max_qp_wr to reflect WQE count
Or Har-Toov [Wed, 13 Aug 2025 12:39:56 +0000 (15:39 +0300)] 
RDMA/mlx5: Better estimate max_qp_wr to reflect WQE count

The mlx5 driver currently derives max_qp_wr directly from the
log_max_qp_sz HCA capability:

    props->max_qp_wr = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);

However, this value represents the number of WQEs in units of Basic
Blocks (see MLX5_SEND_WQE_BB), not actual number of WQEs.  Since the size
of a WQE can vary depending on transport type and features (e.g., atomic
operations, UMR, LSO), the actual number of WQEs can be significantly
smaller than the WQEBB count suggests.

This patch introduces a conservative estimation of the worst-case WQE size
— considering largest segments possible with 1 SGE and no inline data or
special features. It uses this to derive a more accurate max_qp_wr value.
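
The arithmetic behind such an estimate can be sketched as follows. This is
an illustration under stated assumptions, not the driver's calculation: the
function name is hypothetical, the 64-byte basic block is assumed to match
MLX5_SEND_WQE_BB, and the worst-case WQE size is supplied by the caller
rather than derived from segment sizes.

```c
#include <stdint.h>

/* Bytes per basic block; assumed to match MLX5_SEND_WQE_BB. */
#define SKETCH_SEND_WQE_BB 64u

/*
 * Convert a queue size given as log2 of the basic-block count into a
 * conservative count of work requests of worst_case_wqe_bytes each.
 */
static uint32_t estimate_max_qp_wr(unsigned int log_max_qp_sz,
				   uint32_t worst_case_wqe_bytes)
{
	uint64_t total_bytes;

	if (worst_case_wqe_bytes == 0 || log_max_qp_sz >= 32)
		return 0;

	total_bytes = ((uint64_t)1 << log_max_qp_sz) * SKETCH_SEND_WQE_BB;

	/* Round down: a partially fitting WQE does not count. */
	return (uint32_t)(total_bytes / worst_case_wqe_bytes);
}
```

With log_max_qp_sz = 15, a WQE that occupies exactly one basic block
yields 32768 work requests, while a hypothetical 192-byte worst case
yields 10922.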

Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling")
Link: https://patch.msgid.link/r/7d992c9831c997ed5c33d30973406dc2dcaf5e89.1755088725.git.leon@kernel.org
Reported-by: Chuck Lever <cel@kernel.org>
Closes: https://lore.kernel.org/all/20250506142202.GJ2260621@ziepe.ca/
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
7 weeks agoRDMA/mlx5: Enable Data-Direct with Relaxed Ordering
Yishai Hadas [Wed, 13 Aug 2025 12:36:01 +0000 (15:36 +0300)] 
RDMA/mlx5: Enable Data-Direct with Relaxed Ordering

Relaxed Ordering can improve performance in certain scenarios.

Enable it in the Data-Direct use case as well.

Link: https://patch.msgid.link/r/1221dcdda8061ba5f6bc3519044083c7438b257e.1755088503.git.leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@Nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
7 weeks agoRDMA/efa: Extend admin timeout error print
Michael Margolin [Thu, 3 Jul 2025 18:23:14 +0000 (18:23 +0000)] 
RDMA/efa: Extend admin timeout error print

Add command id to the printed message for additional debug information.

Link: https://patch.msgid.link/r/20250703182314.16442-1-mrgolin@amazon.com
Reviewed-by: Yonatan Nachum <ynachum@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2 months agoIB/hfi1: Use for_each_online_cpu() instead of for_each_cpu()
Fushuai Wang [Mon, 11 Aug 2025 06:25:34 +0000 (14:25 +0800)] 
IB/hfi1: Use for_each_online_cpu() instead of for_each_cpu()

Replace the open-coded for_each_cpu(cpu, cpu_online_mask) loop with the
more readable and equivalent for_each_online_cpu(cpu) macro.

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Link: https://patch.msgid.link/20250811062534.1041-1-wangfushuai@baidu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/mana_ib: Drain send wrs of GSI QP
Konstantin Taranov [Tue, 29 Jul 2025 09:00:18 +0000 (02:00 -0700)] 
RDMA/mana_ib: Drain send wrs of GSI QP

Drain send WRs of the GSI QP on device removal.

In rare servicing scenarios, the hardware may delete the
state of the GSI QP, preventing it from generating CQEs
for pending send WRs. Since WRs submitted to the GSI QP
hold CM resources, the device cannot be removed until
those WRs are completed. This patch marks all pending
send WRs as failed, allowing the GSI QP to release the CM
resources and enabling safe device removal.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1753779618-23629-1-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/erdma: Use dma_map_page to map scatter MTT buffer
Boshi Yu [Fri, 25 Jul 2025 05:53:54 +0000 (13:53 +0800)] 
RDMA/erdma: Use dma_map_page to map scatter MTT buffer

Each high-level indirect MTT entry is assumed to point to exactly one page
of the low-level MTT buffer, but dma_map_sg may merge contiguous physical
pages when mapping. To avoid extra overhead from splitting merged regions,
use dma_map_page to map the scatter MTT buffer page by page.

Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com>
Link: https://patch.msgid.link/20250725055410.67520-2-boshiyu@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/ucma: Support write an event into a CM
Mark Zhang [Mon, 30 Jun 2025 10:52:35 +0000 (13:52 +0300)] 
RDMA/ucma: Support write an event into a CM

Enable user-space to inject an event into a CM through its event
channel. Two new events are added and supported: RDMA_CM_EVENT_USER and
RDMA_CM_EVENT_INTERNAL. With these 2 events a new event parameter "arg"
is supported, which is passed from sender to receiver transparently.

With this feature an application is able to write an event into a CM
channel with a new user-space rdmacm API. For example thread T1 could
write an event with the API:
    rdma_write_cm_event(cm_id, RDMA_CM_EVENT_USER, status, arg);
and thread T2 could receive the event with rdma_get_cm_event().

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/fdf49d0b17a45933c5d8c1d90605c9447d9a3c73.1751279794.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/ucma: Support query resolved service records
Mark Zhang [Mon, 30 Jun 2025 10:52:34 +0000 (13:52 +0300)] 
RDMA/ucma: Support query resolved service records

Enable user-space to query resolved service records through a ucma
command when a RDMA_CM_EVENT_ADDRINFO_RESOLVED event is received.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/1090ee7c00c3f8058c4f9e7557de983504a16715.1751279794.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months agoRDMA/cma: Support IB service record resolution
Mark Zhang [Mon, 30 Jun 2025 10:52:33 +0000 (13:52 +0300)] 
RDMA/cma: Support IB service record resolution

Add new UCMA command and the corresponding CMA implementation. Userspace
can send this command to request service resolution based on service
name or ID.

On a successful resolution, one or multiple service records are
returned; the first one is used as the destination address by default.

Two new CM events are added and returned to caller accordingly:
  - RDMA_CM_EVENT_ADDRINFO_RESOLVED: Resolve succeeded;
  - RDMA_CM_EVENT_ADDRINFO_ERROR:  Resolve failed.

Internally two new CM states are added:
  - RDMA_CM_ADDRINFO_QUERY: CM is in the process of IB service
    resolution;
  - RDMA_CM_ADDRINFO_RESOLVED: CM has finished the resolve process.

With these new states, besides the existing state transitions, 2 new
flows are supported:
 1. The default address is used:
    RDMA_CM_ADDR_BOUND ->
      RDMA_CM_ADDRINFO_QUERY ->
        RDMA_CM_ADDRINFO_RESOLVED ->
          RDMA_CM_ROUTE_QUERY

 2. To use a different address:
    RDMA_CM_ADDR_BOUND ->
      RDMA_CM_ADDRINFO_QUERY ->
        RDMA_CM_ADDRINFO_RESOLVED ->
          RDMA_CM_ADDR_QUERY ->
            RDMA_CM_ADDR_RESOLVED ->
              RDMA_CM_ROUTE_QUERY

In the second case, resolve_addrinfo returns multiple records, and the user
can call rdma_resolve_addr() with a record other than the first.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/b6e82ad75522a13b5efe4ff86da0e465aab04cc2.1751279794.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
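The two flows listed in the RDMA/cma commit above can be sketched as a small transition check. This is an illustration only, not the kernel's code: the enum, its shortened state names, and the helpers cm_transition_allowed() and cm_path_allowed() are hypothetical, and only the transitions named in the commit message are modelled (the pre-existing transitions are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-alone subset of CM states; names follow the commit message,
 * but this enum is an assumption, not the kernel's definition. */
enum cm_state {
	CM_ADDR_BOUND,
	CM_ADDRINFO_QUERY,
	CM_ADDRINFO_RESOLVED,
	CM_ADDR_QUERY,
	CM_ADDR_RESOLVED,
	CM_ROUTE_QUERY
};

/* Hypothetical helper: true if "from -> to" appears in one of the two
 * flows described in the commit message. */
static bool cm_transition_allowed(enum cm_state from, enum cm_state to)
{
	switch (from) {
	case CM_ADDR_BOUND:
		return to == CM_ADDRINFO_QUERY;
	case CM_ADDRINFO_QUERY:
		return to == CM_ADDRINFO_RESOLVED;
	case CM_ADDRINFO_RESOLVED:
		/* Flow 1 uses the default (first) record and goes straight
		 * to route query; flow 2 resolves a different address. */
		return to == CM_ROUTE_QUERY || to == CM_ADDR_QUERY;
	case CM_ADDR_QUERY:
		return to == CM_ADDR_RESOLVED;
	case CM_ADDR_RESOLVED:
		return to == CM_ROUTE_QUERY;
	default:
		return false;
	}
}

/* Hypothetical helper: check every consecutive pair in a path of n states. */
static bool cm_path_allowed(const enum cm_state *path, size_t n)
{
	assert(path != NULL);
	for (size_t i = 0; i + 1 < n; i++)
		if (!cm_transition_allowed(path[i], path[i + 1]))
			return false;
	return true;
}
```

Passing the four states of flow 1 or the six states of flow 2 to cm_path_allowed() returns true, while a path that skips RDMA_CM_ADDRINFO_QUERY is rejected by this sketch.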
2 months ago RDMA/sa_query: Support IB service records resolution
Mark Zhang [Mon, 30 Jun 2025 10:52:32 +0000 (13:52 +0300)] 
RDMA/sa_query: Support IB service records resolution

Add an SA query API ib_sa_service_rec_get() to support building and
sending SA query MADs that ask for service records with a specific
name or ID, and receiving and parsing responses from the SM.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/9af6c82f3a3a9d975115a33235fb4ffc7c8edb21.1751279793.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months ago RDMA/sa_query: Add RMPP support for SA queries
Mark Zhang [Mon, 30 Jun 2025 10:52:31 +0000 (13:52 +0300)] 
RDMA/sa_query: Add RMPP support for SA queries

Register the GSI MAD agent with RMPP support and add an rmpp_callback for
SA queries. This is needed for querying more than one service record
in one query.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/81dbcb48682e1838dc40f381cdcc0dc63f25f0f1.1751279793.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2 months ago Linux 6.17-rc1 v6.17-rc1
Linus Torvalds [Sun, 10 Aug 2025 16:41:16 +0000 (19:41 +0300)] 
Linux 6.17-rc1

2 months ago Merge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Aug 2025 06:02:36 +0000 (09:02 +0300)] 
Merge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
 "tools/power turbostat: version 2025.09.09

   - Probe and display L3 Cache topology

   - Add ability to average an added counter (useful for pre-integrated
     "counters", such as Watts)

   - Break the limit of 64 built-in counters

   - Assorted bug fixes and minor feature tweaks"

* tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.09.09
  tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
  tools/power turbostat: standardize PER_THREAD_PARAMS
  tools/power turbostat: Fix DMR support
  tools/power turbostat: add format "average" for external attributes
  tools/power turbostat: delete GET_PKG()
  tools/power turbostat: probe and display L3 cache topology
  tools/power turbostat: Support more than 64 built-in-counters
  tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
  tools/power turbostat: Fix bogus SysWatt for forked program
  tools/power turbostat: Handle cap_get_proc() ENOSYS
  tools/power turbostat: Fix build with musl
  tools/power turbostat: verify arguments to params --show and --hide
  tools/power turbostat: regression fix: --show C1E%

2 months ago Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:51:37 +0000 (08:51 +0300)] 
Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fixes from Borislav Petkov:

 - Remove an obsolete comment and fix spelling

* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Remove obsolete comment from takedown_cpu()
  smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment

2 months ago Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:46:47 +0000 (08:46 +0300)] 
Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a wrong ioremap size in mvebu-gicp

 - Remove yet another compile-test case for a driver which needs an
   additional dependency

 - Fix a lock inversion scenario in the IRQ unit test suite

 - Remove an impossible flag situation in gic-v5

 - Do not iounmap resources in gic-v5 which are managed by devm

 - Make sure stale, left-over interrupts in mvebu-gicp are cleared on
   driver init

 - Fix a reference counting mishap in msi-lib

 - Fix a dereference-before-null-ptr-check case in the riscv-imsic
   irqchip driver

* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mvebu-gicp: Use resource_size() for ioremap()
  irqchip: Build IMX_MU_MSI only on ARM
  genirq/test: Resolve irq lock inversion warnings
  irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
  irqchip/gic-v5: iwb: Fix iounmap probe failure path
  irqchip/mvebu-gicp: Clear pending interrupts on init
  irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
  irqchip/riscv-imsic: Don't dereference before NULL pointer check

2 months ago Merge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:15:32 +0000 (08:15 +0300)] 
Merge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix an interrupt vector setup race which leads to a non-functioning
   device

 - Add new Intel CPU models *and* a family: 0x12. Finally. Yippie! :-)

* tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Plug vector setup race
  x86/cpu: Add new Intel CPU model numbers for Wildcatlake and Novalake

2 months ago Merge tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:11:39 +0000 (08:11 +0300)] 
Merge tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

 - Prevent a futex hash leak due to different mm lifetimes

* tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Move futex cleanup to __mmdrop()

2 months ago tools/power turbostat: version 2025.09.09
Len Brown [Sun, 10 Aug 2025 01:08:26 +0000 (21:08 -0400)] 
tools/power turbostat: version 2025.09.09

Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks

Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
Len Brown [Sat, 9 Aug 2025 20:31:31 +0000 (16:31 -0400)] 
tools/power turbostat: Handle non-root legacy-uncore sysfs permissions

/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.

Non-root turbostat users see complaints in this scenario.

Fail probe of the interface if we can't read current_freq_khz.

Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
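The probe rule in the commit above (treat the interface as absent when current_freq_khz cannot be read) might look roughly like the following. This is a minimal sketch and not turbostat's actual code: the helper name probe_readable_attr() is hypothetical, and the caller is assumed to pass the full path to the attribute.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the attribute at "path" can be opened
 * and a numeric value can be read from it, otherwise 0. */
static int probe_readable_attr(const char *path)
{
	FILE *fp;
	unsigned long long val;
	int ok;

	assert(path != NULL);

	fp = fopen(path, "r");
	if (!fp)
		return 0;	/* e.g. EACCES when a non-root user lacks read permission */

	/* Opening can succeed while the read itself fails, so test the read too. */
	ok = (fscanf(fp, "%llu", &val) == 1);
	fclose(fp);
	return ok;
}
```

A caller would skip the whole package_X_die_Y interface when this returns 0 for its current_freq_khz file, rather than reporting an error on every later read.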
2 months ago tools/power turbostat: standardize PER_THREAD_PARAMS
Len Brown [Fri, 8 Aug 2025 23:30:07 +0000 (19:30 -0400)] 
tools/power turbostat: standardize PER_THREAD_PARAMS

Use a macro for PER_THREAD_PARAMS to make adding a parameter later clearer.

No functional change.

Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: Fix DMR support
Zhang Rui [Wed, 11 Jun 2025 06:50:26 +0000 (14:50 +0800)] 
tools/power turbostat: Fix DMR support

Along with the RAPL MSRs, more MSRs are gone on DMR, including the
PLR (Perf Limit Reasons) and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from the TPMI-based Intel Speed Select Technology feature.

Remove access to these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.

Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: add format "average" for external attributes
Michael Hebenstreit [Fri, 8 Aug 2025 19:57:53 +0000 (15:57 -0400)] 
tools/power turbostat: add format "average" for external attributes

External attributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data.

Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: delete GET_PKG()
Len Brown [Tue, 22 Jul 2025 04:17:04 +0000 (00:17 -0400)] 
tools/power turbostat: delete GET_PKG()

pkg_base[pkg_id] is a simple array of structure pointers;
let the compiler treat it that way.

Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: probe and display L3 cache topology
Len Brown [Tue, 15 Jul 2025 03:33:55 +0000 (23:33 -0400)] 
tools/power turbostat: probe and display L3 cache topology

Signed-off-by: Len Brown <len.brown@intel.com>
2 months ago tools/power turbostat: Support more than 64 built-in-counters
Len Brown [Sat, 12 Jul 2025 20:16:56 +0000 (16:16 -0400)] 
tools/power turbostat: Support more than 64 built-in-counters

We have outgrown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the CPU_SET(3) macros to break this barrier.

Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...

Signed-off-by: Len Brown <len.brown@intel.com>
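To illustrate the idea in the commit above, the glibc CPU_SET(3) macros can serve as a general-purpose bitmap wider than 64 bits. This is a sketch under stated assumptions, not turbostat's implementation: struct counter_set, MAX_COUNTERS, and the counter_set_*() helpers are hypothetical names, and the counter IDs are arbitrary.

```c
#define _GNU_SOURCE	/* needed for the CPU_*_S macros in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define MAX_COUNTERS 128	/* hypothetical limit; more than fits in 64 bits */

/* Hypothetical wrapper: a dynamically sized cpu_set_t used as a bitmap
 * of enabled counter IDs rather than CPU numbers. */
struct counter_set {
	cpu_set_t *bits;
	size_t size;	/* allocation size in bytes, as the _S macros expect */
};

static int counter_set_init(struct counter_set *cs)
{
	cs->bits = CPU_ALLOC(MAX_COUNTERS);
	if (!cs->bits)
		return -1;
	cs->size = CPU_ALLOC_SIZE(MAX_COUNTERS);
	CPU_ZERO_S(cs->size, cs->bits);
	return 0;
}

static void counter_set_enable(struct counter_set *cs, int id)
{
	assert(id >= 0 && id < MAX_COUNTERS);
	CPU_SET_S(id, cs->size, cs->bits);
}

static int counter_set_is_enabled(const struct counter_set *cs, int id)
{
	return CPU_ISSET_S(id, cs->size, cs->bits);
}

static void counter_set_free(struct counter_set *cs)
{
	CPU_FREE(cs->bits);
	cs->bits = NULL;
}
```

Because each counter is now a bit index instead of a mask value, two counters can no longer be combined with a bitwise OR into one constant, which is consistent with the commit splitting the Joules and Watts counters into separate entries.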
2 months ago tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
Len Brown [Mon, 23 Jun 2025 20:24:25 +0000 (13:24 -0700)] 
tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns

Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.

Signed-off-by: Len Brown <len.brown@intel.com>