]> git.ipfire.org Git - thirdparty/kernel/stable.git/commit
IB/cm: use rwlock for MAD agent lock
authorJacob Moroni <jmoroni@google.com>
Thu, 20 Feb 2025 17:56:12 +0000 (17:56 +0000)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Fri, 27 Jun 2025 10:07:09 +0000 (11:07 +0100)
commitbbaee4242d70c7b9c1d232df8870b4d9bf3994cd
tree7668c9c45b3f893b80b07bdb13b12ee24a9a6b18
parentb0974ed82e6ad5ff246fd90a5b14f3e7be4f2924
IB/cm: use rwlock for MAD agent lock

[ Upstream commit 4dab26bed543584577b64b36aadb8b5b165bf44f ]

In workloads where there are many processes establishing connections using
RDMA CM in parallel (large scale MPI), there can be heavy contention for
mad_agent_lock in cm_alloc_msg.

This contention can occur while inside of a spin_lock_irq region, leading
to interrupts being disabled for extended durations on many
cores. Furthermore, it leads to the serialization of rdma_create_ah calls,
which has negative performance impacts for NICs which are capable of
processing multiple address handle creations in parallel.

The end result is the machine becoming unresponsive, hung task warnings,
netdev TX timeouts, etc.

Since the lock appears to be only for protection from cm_remove_one, it
can be changed to a rwlock to resolve these issues.

Reproducer:

Server:
  for i in $(seq 1 512); do
    ucmatose -c 32 -p $((i + 5000)) &
  done

Client:
  for i in $(seq 1 512); do
    ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 &
  done

Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock")
Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/infiniband/core/cm.c