git.ipfire.org Git - thirdparty/kernel/linux.git/log

RDMA/rxe: Fix race condition in QP timer handlers

I encontered the following warning:
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
  libsha1 [last unloaded: ip6_udp_tunnel]
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G         C          6.19.0-rc5-64k-v8+ #37 PREEMPT
Tainted: [C]=CRAP
Hardware name: Raspberry Pi 4 Model B Rev 1.2
Call trace:
  rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P)
  retransmit_timer+0x130/0x188 [rdma_rxe]
  call_timer_fn+0x68/0x4d0
  __run_timers+0x630/0x888
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400
...
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400

The issue is caused by a race condition between retransmit_timer() and
rxe_destroy_qp, leading to the Queue Pair's (QP) reference count dropping
to zero during timer handler execution.

It seems this warning is harmless because rxe_qp_do_cleanup() will flush
all pending timers and requests.

Example of flow causing the issue:

CPU0                                   CPU1
retransmit_timer() {
    spin_lock_irqsave
                           rxe_destroy_qp()
                            __rxe_cleanup()
                              __rxe_put() // qp->ref_count decrease to 0
                            rxe_qp_do_cleanup() {
    if (qp->valid) {
        rxe_sched_task() {
            WARN_ON(rxe_read(task->qp) <= 0);
        }
    }
    spin_unlock_irqrestore
}
                              spin_lock_irqsave
                              qp->valid = 0
                              spin_unlock_irqrestore
                            }

Ensure the QP's reference count is maintained and its validity is checked
within the timer callbacks by adding calls to rxe_get(qp) and corresponding
rxe_put(qp) after use.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Fixes: d94671632572 ("RDMA/rxe: Rewrite rxe_task.c")
Link: https://patch.msgid.link/20260120074437.623018-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mana_ib: Add device‑memory support

Introduce a basic DM implementation that enables creating and
registering device memory, and using the associated memory keys
for networking operations.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/20260127082649.429018-1-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Fix memory leak in GET_DATA_DIRECT_SYSFS_PATH handler

The UVERBS_HANDLER(MLX5_IB_METHOD_GET_DATA_DIRECT_SYSFS_PATH) function
allocates memory for the device path using kobject_get_path(). If the
length of the device path exceeds the output buffer length, the function
returns -ENOSPC but does not free the allocated memory, resulting in a
memory leak.

Add a kfree() call to the error path to ensure the allocated memory is
properly freed.

Compile tested only. Issue found using a prototype static analysis tool
and code review.

Fixes: ec7ad6530909 ("RDMA/mlx5: Introduce GET_DATA_DIRECT_SYSFS_PATH ioctl")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Link: https://patch.msgid.link/20260126074801.627898-1-zilin@seu.edu.cn
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/uverbs: Validate wqe_size before using it in ib_uverbs_post_send

ib_uverbs_post_send() uses cmd.wqe_size from userspace without any
validation before passing it to kmalloc() and using the allocated
buffer as struct ib_uverbs_send_wr.

If a user provides a small wqe_size value (e.g., 1), kmalloc() will
succeed, but subsequent accesses to user_wr->opcode, user_wr->num_sge,
and other fields will read beyond the allocated buffer, resulting in
an out-of-bounds read from kernel heap memory. This could potentially
leak sensitive kernel information to userspace.

Additionally, providing an excessively large wqe_size can trigger a
WARNING in the memory allocation path, as reported by syzkaller.

This is inconsistent with ib_uverbs_unmarshall_recv() which properly
validates that wqe_size >= sizeof(struct ib_uverbs_recv_wr) before
proceeding.

Add the same validation for ib_uverbs_post_send() to ensure wqe_size
is at least sizeof(struct ib_uverbs_send_wr).

Fixes: c3bea3d2dc53 ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()")
Signed-off-by: Yi Liu <liuy22@mails.tsinghua.edu.cn>
Link: https://patch.msgid.link/20260122142900.2356276-2-liuy22@mails.tsinghua.edu.cn
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/irdma: Use CQ ID for CEQE context

The hardware allows for an opaque CQ context field to be carried
over into CEQEs for the CQ. Previously, a pointer to the CQ was used
for this context. In the normal CQ destroy flow, the CEQ ring is
scrubbed to remove any preexisting CEQEs for the CQ that may not have
been processed yet so that the CQ structure is not dereferenced in the
CEQ ISR after the CQ has been freed.

However, in some cases, it is possible for a CEQE to be in flight in
HW even after the CQ destroy command completion is received, so it
could be missed during the scrub.

To protect against this, we can take advantage of the CQ table that
already exists and use the CQ ID for this context rather than a CQ
pointer.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260120212546.1893076-2-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/irdma: Add enum defs for reserved CQs/QPs

Added definitions for the special reserved CQs and QPs.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260120212546.1893076-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rxe: Fix iova-to-va conversion for MR page sizes != PAGE_SIZE

The current implementation incorrectly handles memory regions (MRs) with
page sizes different from the system PAGE_SIZE. The core issue is that
rxe_set_page() is called with mr->page_size step increments, but the
page_list stores individual struct page pointers, each representing
PAGE_SIZE of memory.

ib_sg_to_page() has ensured that when i>=1 either
a) SG[i-1].dma_end and SG[i].dma_addr are contiguous
or
b) SG[i-1].dma_end and SG[i].dma_addr are mr->page_size aligned.

This leads to incorrect iova-to-va conversion in scenarios:

1) page_size < PAGE_SIZE (e.g., MR: 4K, system: 64K):
   ibmr->iova = 0x181800
   sg[0]: dma_addr=0x181800, len=0x800
   sg[1]: dma_addr=0x173000, len=0x1000

   Access iova = 0x181800 + 0x810 = 0x182010
   Expected VA: 0x173010 (second SG, offset 0x10)
   Before fix:
     - index = (0x182010 >> 12) - (0x181800 >> 12) = 1
     - page_offset = 0x182010 & 0xFFF = 0x10
     - xarray[1] stores system page base 0x170000
     - Resulting VA: 0x170000 + 0x10 = 0x170010 (wrong)

2) page_size > PAGE_SIZE (e.g., MR: 64K, system: 4K):
   ibmr->iova = 0x18f800
   sg[0]: dma_addr=0x18f800, len=0x800
   sg[1]: dma_addr=0x170000, len=0x1000

   Access iova = 0x18f800 + 0x810 = 0x190010
   Expected VA: 0x170010 (second SG, offset 0x10)
   Before fix:
     - index = (0x190010 >> 16) - (0x18f800 >> 16) = 1
     - page_offset = 0x190010 & 0xFFFF = 0x10
     - xarray[1] stores system page for dma_addr 0x170000
     - Resulting VA: system page of 0x170000 + 0x10 = 0x170010 (wrong)

Yi Zhang reported a kernel panic[1] years ago related to this defect.

Solution:
1. Replace xarray with pre-allocated rxe_mr_page array for sequential
   indexing (all MR page indices are contiguous)
2. Each rxe_mr_page stores both struct page* and offset within the
   system page
3. Handle MR page_size != PAGE_SIZE relationships:
   - page_size > PAGE_SIZE: Split MR pages into multiple system pages
   - page_size <= PAGE_SIZE: Store offset within system page
4. Add boundary checks and compatibility validation

This ensures correct iova-to-va conversion regardless of MR page size
and system PAGE_SIZE relationship, while improving performance through
array-based sequential access.

Tests on 4K and 64K PAGE_SIZE hosts:
- rdma-core/pytests
  $ ./build/bin/run_tests.py  --dev eth0_rxe
- blktest:
  $ TIMEOUT=30 QUICK_RUN=1 USE_RXE=1 NVMET_TRTYPES=rdma ./check nvme srp rnbd

[1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

Fixes: 592627ccbdff ("RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20260116032753.2574363-1-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rxe: Remove unused page_offset member

In rxe_map_mr_sg(), the `page_offset` member of the `rxe_mr` struct
was initialized based on `ibmr.iova`, which will be updated inside
ib_sg_to_pages() later.

Consequently, the value assigned to `page_offset` was incorrect. However,
since `page_offset` was never utilized throughout the code, it can be safely
removed to clean up the codebase and avoid future confusion.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20260116032833.2574627-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/mlx5: Fix port speed query for representors

When querying speed information for a representor in switchdev mode,
the code previously used the first device in the eswitch, which may not
match the device that actually owns the representor. In setups such as
multi-port eswitch or LAG, this led to incorrect port attributes being
reported.

Fix this by retrieving the correct core device from the representor's
eswitch before querying its port attributes.

Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode")
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260115-port-speed-query-fix-v2-1-3bde6a3c78e7@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Fix UMR hang in LAG error state unload

During firmware reset in LAG mode, a race condition causes the driver
to hang indefinitely while waiting for UMR completion during device
unload. See [1].

In LAG mode the bond device is only registered on the master, so it
never sees sys_error events from the slave.
During firmware reset this causes UMR waits to hang forever on unload
as the slave is dead but the master hasn't entered error state yet, so
UMR posts succeed but completions never arrive.

Fix this by adding a sys_error notifier that gets registered before
MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device().
This ensures error events reach the bond device throughout teardown.

[1]
Call Trace:
__schedule+0x2bd/0x760
schedule+0x37/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.6+0x2b5/0x4a0
__mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib]
? __xa_erase+0x4a/0xa0
? _cond_resched+0x15/0x30
? wait_for_completion+0x31/0x100
ib_dereg_mr_user+0x48/0xc0 [ib_core]
? rdmacg_uncharge_hierarchy+0xa0/0x100
destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x37/0x150 [ib_uverbs]
__uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs]
uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs]
ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs]
remove_client_context+0x8b/0xd0 [ib_core]
disable_device+0x8c/0x130 [ib_core]
__ib_unregister_device+0x10d/0x180 [ib_core]
ib_unregister_device+0x21/0x30 [ib_core]
__mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib]
auxiliary_bus_remove+0x1e/0x30
device_release_driver_internal+0x103/0x1f0
bus_remove_device+0xf7/0x170
device_del+0x181/0x410
mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core]
mlx5_disable_lag+0x253/0x260 [mlx5_core]
mlx5_lag_disable_change+0x89/0xc0 [mlx5_core]
mlx5_eswitch_disable+0x67/0xa0 [mlx5_core]
mlx5_unload+0x15/0xd0 [mlx5_core]
mlx5_unload_one+0x71/0xc0 [mlx5_core]
mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core]
process_one_work+0x1a7/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x116/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x22/0x40

Fixes: ede132a5cf55 ("RDMA/mlx5: Move events notifier registration to be after device registration")
Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260113-umr-hand-lag-fix-v1-1-3dc476e00cd9@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mana_ib: Take CQ type from the device type

Get CQ type from the used gdma device. The MANA_IB_CREATE_RNIC_CQ
flag is ignored. It was used in older kernel versions where
the mana_ib was shared between ethernet and rnic.

Fixes: d4293f96ce0b ("RDMA/mana_ib: unify mana_ib functions to support any gdma device")
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/20260115093625.177306-1-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/iwcm: Fix workqueue list corruption by removing work_list

The commit e1168f0 ("RDMA/iwcm: Simplify cm_event_handler()")
changed the work submission logic to unconditionally call
queue_work() with the expectation that queue_work() would
have no effect if work was already pending. The problem is
that a free list of struct iwcm_work is used (for which
struct work_struct is embedded), so each call to queue_work()
is basically unique and therefore does indeed queue the work.

This causes a problem in the work handler which walks the work_list
until it's empty to process entries. This means that a single
run of the work handler could process item N+1 and release it
back to the free list while the actual workqueue entry is still
queued. It could then get reused (INIT_WORK...) and lead to
list corruption in the workqueue logic.

Fix this by just removing the work_list. The workqueue already
does this for us.

This fixes the following error that was observed when stress
testing with ucmatose on an Intel E830 in iWARP mode:

[  151.465780] list_del corruption. next->prev should be ffff9f0915c69c08, but was ffff9f0a1116be08. (next=ffff9f0a15b11c08)
[  151.466639] ------------[ cut here ]------------
[  151.466986] kernel BUG at lib/list_debug.c:67!
[  151.467349] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[  151.467753] CPU: 14 UID: 0 PID: 2306 Comm: kworker/u64:18 Not tainted 6.19.0-rc4+ #1 PREEMPT(voluntary)
[  151.468466] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  151.469192] Workqueue:  0x0 (iw_cm_wq)
[  151.469478] RIP: 0010:__list_del_entry_valid_or_report+0xf0/0x100
[  151.469942] Code: c7 58 5f 4c b2 e8 10 50 aa ff 0f 0b 48 89 ef e8 36 57 cb ff 48 8b 55 08 48 89 e9 48 89 de 48 c7 c7 a8 5f 4c b2 e8 f0 4f aa ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90
[  151.471323] RSP: 0000:ffffb15644e7bd68 EFLAGS: 00010046
[  151.471712] RAX: 000000000000006d RBX: ffff9f0915c69c08 RCX: 0000000000000027
[  151.472243] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f0a37d9c600
[  151.472768] RBP: ffff9f0a15b11c08 R08: 0000000000000000 R09: c0000000ffff7fff
[  151.473294] R10: 0000000000000001 R11: ffffb15644e7bba8 R12: ffff9f092339ee68
[  151.473817] R13: ffff9f0900059c28 R14: ffff9f092339ee78 R15: 0000000000000000
[  151.474344] FS:  0000000000000000(0000) GS:ffff9f0a847b5000(0000) knlGS:0000000000000000
[  151.474934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.475362] CR2: 0000559e233a9088 CR3: 000000020296b004 CR4: 0000000000770ef0
[  151.475895] PKRU: 55555554
[  151.476118] Call Trace:
[  151.476331]  <TASK>
[  151.476497]  move_linked_works+0x49/0xa0
[  151.476792]  __pwq_activate_work.isra.46+0x2f/0xa0
[  151.477151]  pwq_dec_nr_in_flight+0x1e0/0x2f0
[  151.477479]  process_scheduled_works+0x1c8/0x410
[  151.477823]  worker_thread+0x125/0x260
[  151.478108]  ? __pfx_worker_thread+0x10/0x10
[  151.478430]  kthread+0xfe/0x240
[  151.478671]  ? __pfx_kthread+0x10/0x10
[  151.478955]  ? __pfx_kthread+0x10/0x10
[  151.479240]  ret_from_fork+0x208/0x270
[  151.479523]  ? __pfx_kthread+0x10/0x10
[  151.479806]  ret_from_fork_asm+0x1a/0x30
[  151.480103]  </TASK>

Fixes: e1168f09b331 ("RDMA/iwcm: Simplify cm_event_handler()")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260112020006.1352438-1-jmoroni@google.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rxe: Fix double free in rxe_srq_from_init

In rxe_srq_from_init(), the queue pointer 'q' is assigned to
'srq->rq.queue' before copying the SRQ number to user space.
If copy_to_user() fails, the function calls rxe_queue_cleanup()
to free the queue, but leaves the now-invalid pointer in
'srq->rq.queue'.

The caller of rxe_srq_from_init() (rxe_create_srq) eventually
calls rxe_srq_cleanup() upon receiving the error, which triggers
a second rxe_queue_cleanup() on the same memory, leading to a
double free.

The call trace looks like this:
   kmem_cache_free+0x.../0x...
   rxe_queue_cleanup+0x1a/0x30 [rdma_rxe]
   rxe_srq_cleanup+0x42/0x60 [rdma_rxe]
   rxe_elem_release+0x31/0x70 [rdma_rxe]
   rxe_create_srq+0x12b/0x1a0 [rdma_rxe]
   ib_create_srq_user+0x9a/0x150 [ib_core]

Fix this by moving 'srq->rq.queue = q' after copy_to_user.

Fixes: aae0484e15f0 ("IB/rxe: avoid srq memory leak")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Link: https://patch.msgid.link/20260112015412.29458-1-jiashengjiangcool@gmail.com
Reviewed-by: Zhu Yanjun <yanjun.Zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Support drain SQ and RQ

Some ULPs, e.g. rpcrdma, rely on drain_qp() to ensure all outstanding
requests are completed before releasing related memory. If drain_qp()
fails, ULPs may release memory directly, and in-flight WRs may later be
flushed after the memory is freed, potentially leading to UAF.

drain_qp() failures can happen when HW enters an error state or is
reset. Add support to drain SQ and RQ in such cases by posting a
fake WR during reset, so the driver can process all remaining WRs in
sequence and generate corresponding completions.

Always invoke comp_handler() in drain process to ensure completions
are not lost under concurrency (e.g. concurrent post_send() and
reset, or QPs created during reset). If the CQ is already processed,
cancel any already scheduled comp_handler() to avoid concurrency
issues.

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20260108113032.856306-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/irdma: Remove fixed 1 ms delay during AH wait loop

The AH CQP command wait loop executes in an atomic context and was
using a fixed 1 ms delay. Since many AH create commands can complete
much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay.

Also, use the timeout value indicated during the capability exchange
rather than a hard-coded value.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260105180550.2907858-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/irdma: Remove redundant dma_wmb() before writel()

A dma_wmb() is not necessary before a writel() because writel()
already has an even stronger store barrier. A dma_wmb() is only
required to order writes to consistent/DMA memory whereas the
barrier in writel() is specified to order writes to DMA memory as
well as MMIO.

Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260103172517.2088895-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-srv: Fix error print in process_info_req()

rtrs_srv_change_state() returns bool (true on success) therefore
there is no reason to print error when it fails as it always will
be 0.

Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-11-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-clt: For conn rejection use actual err number

When the connection establishment request is rejected from the server
side, then the actual error number sent back should be used.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-10-haris.iqbal@ionos.com
Reviewed-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs: Extend log message when a port fails

Add HCA name and port of this HCA.
This would help with analysing and debugging the logs.

The logs would looks something like this,

rtrs_server L2516: Handling event: port error (10).
HCA name: mlx4_0, port num: 2
rtrs_client L3326: Handling event: port error (10).
HCA name: mlx4_0, port num: 1

Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-9-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-srv: Rate-limit I/O path error logging

Excessive error logging is making it difficult to identify the root
cause of issues. Implement rate limiting to improve log clarity.

Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-8-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-srv: Add check and closure for possible zombie paths

During several network incidents, a number of RTRS paths for a session
went through disconnect and reconnect phase. However, some of those did
not auto-reconnect successfully. Instead they failed with the following
logs,

On client,
kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28
  (consumer defined), rtrs errno -104
kernel: rtrs_client L2698: <sess-name>: init_conns() failed: err=-104
  path=gid:<gid1>@gid:<gid2> [mlx4_0:1]

On server, (log a)
kernel: ibtrs_server L1868: <>: Connection already exists: 0

When the misbehaving path was removed, and add_path was called to re-add
the path, the log on client side changed to, (log b)
kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28
  (consumer defined), rtrs errno -17

There was no log on the server side for this, which is expected since
there is no logging in that path,
if (unlikely(__is_path_w_addr_exists(srv, &cm_id->route.addr))) {
err = -EEXIST;
goto err;

Because of the following check on server side,
if (unlikely(sess->state != IBTRS_SRV_CONNECTING)) {
ibtrs_err(s, "Session in wrong state: %s\n",

.. we know that the path in (log a) was in CONNECTING state.

The above state of the path persists for as long as we leave the session
be. This means that the path is in some zombie state, probably waiting
for the info_req packet to arrive, which never does.

The changes in this commits does 2 things.

1) Add logs at places where we see the errors happening. The logs would
shed more light at the state and lifetime of such zombie paths.

2) Close such zombie sessions, only if they are in CONNECTING state, and
after an inactivity period of 30 seconds.
  i) The state check prevents closure of paths which are CONNECTED.
Also, from the above logs and code, we already know that the path could
only be on CONNECTING state, so we play safe and narrow our impact surface
area by closing only CONNECTING paths.
  ii) The inactivity period is to allow requests for other cid to finish
processing, or for any stray packets to arrive/fail.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-7-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-clt: Remove unused members in rtrs_clt_io_req

Remove unused members from rtrs_clt_io_req.

Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-6-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs: Improve error logging for RDMA cm events

The member variable status in the struct rdma_cm_event is used for both
linux errors and the errors definded in rdma stack.

Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-5-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs: Add optional support for IB_MR_TYPE_SG_GAPS

Support IB_MR_TYPE_SG_GAPS, which has less limitations
than standard IB_MR_TYPE_MEM_REG, a few ULP support this.

Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-4-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs: Add error description to the logs

Print error description instead of the error number.

Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-3-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/rtrs-srv: fix SG mapping

This fixes the following error on the server side:

RTRS server session allocation failed: -EINVAL

caused by the caller of the `ib_dma_map_sg()`, which does not expect
less mapped entries, than requested, which is in the order of things
and can be easily reproduced on the machine with enabled IOMMU.

The fix is to treat any positive number of mapped sg entries as a
successful mapping and cache DMA addresses by traversing modified
SG table.

Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality")
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com>
Link: https://patch.msgid.link/20260107161517.56357-2-haris.iqbal@ionos.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/ocrdma: Remove unused OCRDMA_UVERBS definition

The OCRDMA_UVERBS() macro is unused, so remove it to clean up the code.

Link: https://patch.msgid.link/20260104-ib-core-misc-v1-6-00367f77f3a8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

RDMA/qedr: Remove unused defines

Perform basic cleanup by removing unused defines from qedr.h.

Link: https://patch.msgid.link/20260104-ib-core-misc-v1-5-00367f77f3a8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

RDMA/mlx5: Avoid direct access to DMA device pointer

The dma_device field is marked as internal and must not be accessed by
drivers or ULPs. Remove all direct mlx5 references to this field.

Link: https://patch.msgid.link/20260104-ib-core-misc-v1-4-00367f77f3a8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

RDMA/mlx5: Fix ucaps init error flow

In mlx5_ib_stage_caps_init(), if mlx5_ib_init_ucaps() fails after
mlx5_ib_init_var_table() succeeds, the VAR bitmap is leaked since
the function returns without cleanup.

Thus, cleanup the var table bitmap in case of error of initializing
ucaps before exiting, preventing the leak above.

Fixes: cf7174e8982f ("RDMA/mlx5: Create UCAP char devices for supported device capabilities")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://patch.msgid.link/20260104-ib-core-misc-v1-3-00367f77f3a8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/core: Avoid exporting module local functions and remove not-used ones

Some of the functions are local to the module and some are not used
starting from commit 36783dec8d79 ("RDMA/rxe: Delete deprecated module
parameters interface"). Delete and avoid exporting them.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/20260104-ib-core-misc-v1-2-00367f77f3a8@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/umem: Remove redundant DMABUF ops check

ib_umem_dmabuf_get_with_dma_device() is an in-kernel function and does
not require a defensive check for the .move_notify callback. All current
callers guarantee that this callback is always present.

Link: https://patch.msgid.link/20260104-ib-core-misc-v1-1-00367f77f3a8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

RDMA/mlx5: Implement query_port_speed callback

Implement the query_port_speed callback for mlx5 driver to support
querying effective port bandwidth.

For LAG configurations, query the aggregated speed from the LAG layer
or from the modified vport max_tx_speed.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Raise async event on device speed change

Raise IB_EVENT_DEVICE_SPEED_CHANGE whenever the speed of one of the
device's ports changes. Usually all ports of the device changes
together.

This ensures user applications and upper-layer software are immediately
notified when bandwidth changes, improving traffic management in dynamic
environments. This is especially useful for vports which are part of a
LAG configuration, to know if the effective speed of the LAG was
changed.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/core: Add query_port_speed verb

Add new ibv_query_port_speed() verb to enable applications to query
the effective bandwidth of a port.

This verb is particularly useful when the speed is not a multiplication
of IB speed and width where width is 2^n.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/core: Refactor rate_show to use ib_port_attr_to_rate()

Update sysfs rate_show() to rely on ib_port_attr_to_speed_info() for
converting IB port speed and width attributes to data rate and speed
string.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/core: Add helper to convert port attributes to data rate

Introduce ib_port_attr_to_rate() to compute the data rate in 100 Mbps
units (deci-Gb/sec) from a port's active_speed and active_width
attributes. This generic helper removes duplicated speed-to-rate
calculations, which are used by sysfs and the upcoming new verb.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/core: Add async event on device speed change

Add IB_EVENT_DEVICE_SPEED_CHANGE for notifying user applications on
device's ports speed changes.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

Support effective VF bandwidth query in LAG mode

Currently, mlx5 driver exposes only the parent function's speed to VFs,
providing no way to query the actual effective bandwidth in LAG and
MPESW configurations. This limitation prevents userspace and
upper-layer software from obtaining accurate bandwidth information,
which impacts traffic scheduling decisions.

This series addresses this by:

1. Adding mlx5 internal logic to calculate and propagate the effective
   aggregated LAG speed to all attached vports. The vport speeds are
   dynamically updated when LAG member link states change.

2. Extending RDMA core with a new ib_query_port_speed() verb and an
   IB_EVENT_DEVICE_SPEED_CHANGE async event. These interfaces expose
   the effective port speed to userspace, supporting speeds that are
   not expressible as IB speed * width (where width is 2^n).

This series enables userspace applications to query the effective
port speed and receive notifications on speed changes in real-time.
In LAG configurations, each mlx5 port reports the aggregated bandwidth
of all active LAG members.

Signed-off-by: Leon Romanovsky <leon@kernel.org>

net/mlx5: Add support for querying bond speed

Add mlx5_lag_query_bond_speed() to query the aggregated speed of
lag configurations with a bond device.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

net/mlx5: Handle port and vport speed change events in MPESW

Add port change event handling logic for MPESW LAG mode, ensuring
VFs are updated when the speed of LAG physical ports changes.
This triggers a speed update workflow when relevant port state changes
occur, enabling consistent and accurate reporting of VF bandwidth.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

net/mlx5: Propagate LAG effective max_tx_speed to vports

Currently, vports report only their parent's uplink speed, which in LAG
setups does not reflect the true aggregated bandwidth. This makes it
hard for upper-layer software to optimize load balancing decisions
based on accurate bandwidth information.

Fix the issue by calculating the possible maximum speed of a LAG as
the sum of speeds of all active uplinks that are part of the LAG.
Propagate this effective max speed to vports associated with the LAG
whenever a relevant event occurs, such as physical port link state
changes or LAG creation/modification.

With this change, upper-layer components receive accurate bandwidth
information corresponding to the active members of the LAG and can
make better load balancing decisions.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

net/mlx5: Add max_tx_speed and its CAP bit to IFC

Introduce the max_tx_speed field to the query and modify_vport_state
structures.

Add the esw_vport_state_max_tx_speed capability bit, indicating
the firmware support modifying the max_tx_speed field via the
MODIFY_VPORT_STATE command.

Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Notify ULP of remaining soft-WCs during reset

During a reset, software-generated WCs cannot be reported via
interrupts. This may cause the ULP to miss some WCs.

To avoid this, add check in the CQ arm process: if a hardware reset
has occurred and there are still unreported soft-WCs, notify the ULP
to handle the remaining WCs, thereby preventing any loss of completions.

Fixes: 626903e9355b ("RDMA/hns: Add support for reporting wc as software mode")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20260104064057.1582216-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Fix RoCEv1 failure due to DSCP

DSCP is not supported in RoCEv1, but get_dscp() is still called. If
get_dscp() returns an error, it'll eventually cause create_ah to fail
even when using RoCEv1.

Correct the return value and avoid calling get_dscp() when using
RoCEv1.

Fixes: ee20cc17e9d8 ("RDMA/hns: Support DSCP")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20260104064057.1582216-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Return actual error code instead of fixed EINVAL

query_cqc() and query_mpt() may return various error codes in
different cases. Return actual error code instead of fixed EINVAL.

Fixes: f2b070f36d1b ("RDMA/hns: Support CQ's restrack raw ops for hns driver")
Fixes: 3d67e7e236ad ("RDMA/hns: Support MR's restrack raw ops for hns driver")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20260104064057.1582216-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Fix WQ_MEM_RECLAIM warning

When sunrpc is used, if a reset triggered, our wq may lead the
following trace:

workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma]
is flushing !WQ_MEM_RECLAIM hns_roce_irq_workq:flush_work_handle
[hns_roce_hw_v2]
WARNING: CPU: 0 PID: 8250 at kernel/workqueue.c:2644 check_flush_dependency+0xe0/0x144
Call trace:
  check_flush_dependency+0xe0/0x144
  start_flush_work.constprop.0+0x1d0/0x2f0
  __flush_work.isra.0+0x40/0xb0
  flush_work+0x14/0x30
  hns_roce_v2_destroy_qp+0xac/0x1e0 [hns_roce_hw_v2]
  ib_destroy_qp_user+0x9c/0x2b4
  rdma_destroy_qp+0x34/0xb0
  rpcrdma_ep_destroy+0x28/0xcc [rpcrdma]
  rpcrdma_ep_put+0x74/0xb4 [rpcrdma]
  rpcrdma_xprt_disconnect+0x1d8/0x260 [rpcrdma]
  xprt_rdma_connect_worker+0xc0/0x120 [rpcrdma]
  process_one_work+0x1cc/0x4d0
  worker_thread+0x154/0x414
  kthread+0x104/0x144
  ret_from_fork+0x10/0x18

Since QP destruction frees memory, this wq should have the WQ_MEM_RECLAIM.

Fixes: ffd541d45726 ("RDMA/hns: Add the workqueue framework for flush cqe handler")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20260104064057.1582216-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/cache: update gid cache on client reregister event

Some HCAs (e.g: ConnectX4) do not trigger a IB_EVENT_GID_CHANGE on
subnet prefix update from SM (PortInfo).

Since the commit d58c23c92548 ("IB/core: Only update PKEY and GID caches
on respective events"), the GID cache is updated exclusively on
IB_EVENT_GID_CHANGE. If this event is not emitted, the subnet prefix in the
IPoIB interface’s hardware address remains set to its default value
(0xfe80000000000000).

Then rdma_bind_addr() failed because it relies on hardware address to
find the port GID (subnet_prefix + port GUID).

This patch fixes this issue by updating the GID cache on
IB_EVENT_CLIENT_REREGISTER event (emitted on PortInfo::ClientReregister=1).

Fixes: d58c23c92548 ("IB/core: Only update PKEY and GID caches on respective events")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Link: https://patch.msgid.link/aVUfsO58QIDn5bGX@eaujamesFR0130
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

RDMA/hns: Introduce limit_bank mode with better performance

In limit_bank mode, QPs/CQs are restricted to using half of the banks.
HW concentrates resources on these banks, thereby improving performance
compared to the default mode.

Switch between limit_bank mode and default mode by setting the cap
flag in FW. Since the number of QPs and CQs will be halved, this mode
is suitable for scenarios where fewer QPs and CQs are required.

Signed-off-by: Lianfa Weng <wenglianfa@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20251230154911.3397584-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

Linux 6.19-rc3

Merge tag 'usb-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes, and bunch of reverts for 6.19-rc3.
  Included in here are:

   - reverts of some typec ucsi driver changes that had a lot of
     regression reports after -rc1. Let's just revert it all for now and
     it will come back in a way that is better tested.

   - other typec bugfixes

   - usb-storage quirk fixups

   - dwc3 driver fix

   - other minor USB fixes for reported problems.

  All of these have passed 0-day testing and individual testing"

* tag 'usb-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
  Revert "usb: typec: ucsi: Update UCSI structure to have message in and message out fields"
  Revert "usb: typec: ucsi: Add support for message out data structure"
  Revert "usb: typec: ucsi: Enable debugfs for message_out data structure"
  Revert "usb: typec: ucsi: Add support for SET_PDOS command"
  Revert "usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common"
  Revert "usb: typec: ucsi: Get connector status after enable notifications"
  usb: ohci-nxp: clean up probe error labels
  usb: gadget: lpc32xx_udc: clean up probe error labels
  usb: ohci-nxp: fix device leak on probe failure
  usb: phy: isp1301: fix non-OF device reference imbalance
  usb: gadget: lpc32xx_udc: fix clock imbalance in error path
  usb: typec: ucsi: Get connector status after enable notifications
  usb: usb-storage: Maintain minimal modifications to the bcdDevice range.
  usb: dwc3: of-simple: fix clock resource leak in dwc3_of_simple_probe
  usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common
  USB: lpc32xx_udc: Fix error handling in probe
  usb: typec: altmodes/displayport: Drop the device reference in dp_altmode_probe()
  usb: phy: fsl-usb: Fix use-after-free in delayed work during device removal
  usb: renesas_usbhs: Fix a resource leak in usbhs_pipe_malloc()
  usb: typec: ucsi: huawei-gaokin: add DRM dependency
  ...

Merge tag 'tty-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for some reported issues.
  Included in here are:

   - serial sysfs fwnode fix that was much reported

   - sh-sci driver fix

   - serial device init bugfix

   - 8250 bugfix

   - xilinx_uartps bugfix

  All of these have passed 0-day testing and individual testing"

* tag 'tty-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: xilinx_uartps: fix rs485 delay_rts_after_send
  serial: sh-sci: Check that the DMA cookie is valid
  serial: core: Fix serial device initialization
  serial: 8250: longson: Fix NULL vs IS_ERR() bug in probe
  serial: core: Restore sysfs fwnode information

Merge tag 'firewire-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"A fix for PCI driver for Texas Instruments PCILyx series.

  The driver had a bug where it allocated a DMA-coherent buffer of 16 KB
  but released it using PAGE_SIZE. This disproportion was reported in
  2020, but the fix was never merged. It is finally resolved"

* tag 'firewire-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: nosy: Fix dma_free_coherent() size

Merge tag 'riscv-for-linus-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Paul Walmsley:
"Nothing exotic here; these are the cleanup and new ISA extension
  probing patches (not including CFI):

   - Add probing and userspace reporting support for the standard RISC-V
     ISA extensions Zilsd and Zclsd, which implement load/store dual
     instructions on RV32

   - Abstract the register saving code in setup_sigcontext() so it can
     be used for stateful RISC-V ISA extensions beyond the vector
     extension

   - Add the SBI extension ID and some initial data structure
     definitions for the RISC-V standard SBI debug trigger extension

   - Clean up some code slightly: change some page table functions to
     avoid atomic operations oinn !SMP and to avoid unnecessary casts to
     atomic_long_t; and use the existing RISCV_FULL_BARRIER macro in
     place of some open-coded 'fence rw,rw' instructions"

* tag 'riscv-for-linus-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Add SBI debug trigger extension and function ids
  riscv/atomic.h: use RISCV_FULL_BARRIER in _arch_atomic* function.
  riscv: hwprobe: export Zilsd and Zclsd ISA extensions
  riscv: add ISA extension parsing for Zilsd and Zclsd
  dt-bindings: riscv: add Zilsd and Zclsd extension descriptions
  riscv: mm: use xchg() on non-atomic_long_t variables, not atomic_long_xchg()
  riscv: mm: ptep_get_and_clear(): avoid atomic ops when !CONFIG_SMP
  riscv: mm: pmdp_huge_get_and_clear(): avoid atomic ops when !CONFIG_SMP
  riscv: signal: abstract header saving for setup_sigcontext

Merge tag 'powerpc-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

- Fix for kexec warning due to SMT disable or partial SMT enabled

- Handle font bitmap pointer with reloc_offset to fix boot crash

- Fix to enable cpuidle state for Power11

- Couple of misc fixes

Thanks to Aboorva Devarajan, Aditya Bodkhe, Cedar Maxwell, Christian
Zigotzky, Christophe Leroy, Christophe Leroy (CS GROUP), Finn Thain,
Gopi Krishna Menon, Guenter Roeck, Jan Stancek, Joe Lawrence, Josh
Poimboeuf, Justin M. Forbes, Madadi Vineeth Reddy, Naveen N Rao (AMD),
Nysal Jan K.A., Sachin P Bappalige, Samir M, Sourabh Jain, Srikar
Dronamraju, and Stan Johnson

* tag 'powerpc-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Restore disabling of interrupts at interrupt/syscall exit
  powerpc/powernv: Enable cpuidle state detection for POWER11
  powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()
  powerpc/tools: drop `-o pipefail` in gcc check scripts
  selftests/powerpc/pmu/: Add check_extended_reg_test to .gitignore
  powerpc/kexec: Enable SMT before waking offline CPUs

Merge tag 'spi-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"We've got more fixes here for the Cadence QSPI controller, this time
  fixing some issues that come up when working with slower flashes on
  some platforms plus a general race condition.

  We also add support for the Allwinner A523, this is just some new
  compatibles"

* tag 'spi-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cadence-quadspi: Improve CQSPI_SLOW_SRAM quirk if flash is slow
  spi: cadence-quadspi: Prevent lost complete() call during indirect read
  spi: sun6i: Support A523's SPI controllers
  spi: dt-bindings: sun6i: Add compatibles for A523's SPI controllers

Merge tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of fixes from Thomas, making the UAPI headers more robustly
  correct and ensuring they are covered by checkpatch, and one from
  Andreas fixing an update for a change to the DT bindings that I missed
  was requested during bindings review in the newly added fp9931 driver"

* tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fp9931: fix regulator node pointer
  regulator: Add UAPI headers to MAINTAINERS
  regulator: uapi: Use UAPI integer type

Merge tag 'drm-fixes-2025-12-27' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Post overeating fixes, only msm for this week has anything, so quiet
  as expected.

  msm:
   - GPU:
      - Fix crash on a7xx GPUs not supporting IFPC
      - Fix perfcntr use with IFPC
      - Concurrent binning fix
   - DPU:
      - Fixed DSC and SSPP fetching issues
      - Switched to scnprint instead of snprintf
      - Added missing NULL checks in pingpong code"

* tag 'drm-fixes-2025-12-27' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
  drm/msm: Replace unsafe snprintf usage with scnprintf
  drm/msm/dpu: Add missing NULL pointer check for pingpong interface
  Revert "drm/msm/dpu: Enable quad-pipe for DSC and dual-DSI case"
  Revert "drm/msm/dpu: support plane splitting in quad-pipe case"
  drm/msm: msm_iommu.c: fix all kernel-doc warnings
  drm/msm: msm_gpu.h: fix all kernel-doc warnings
  drm/msm: msm_gem_vma.c: fix all kernel-doc warnings
  drm/msm: msm_fence.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_wb.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_vbif.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_top.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_sspp.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_pingpong.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_merge3d.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_lm.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_intf.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_dspp.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_dsc.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_cwb.h: fix all kernel-doc warnings
  drm/msm/dpu: dpu_hw_ctl.h: fix all kernel-doc warnings
  ...

Merge tag 'drm-msm-fixes-2025-12-26' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.19:

GPU:
- Fix crash on a7xx GPUs not supporting IFPC
- Fix perfcntr use with IFPC
- Concurrent binning fix

DPU:
- Fixed DSC and SSPP fetching issues
- Switched to scnprint instead of snprintf
- Added missing NULL checks in pingpong code

Also documentation fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <rob.clark@oss.qualcomm.com>
Link: https://patch.msgid.link/CACSVV01jcLLChsFtmqc4VDNoQ2ic2q+d86n3wdoSUdmW6xaSdQ@mail.gmail.com

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Three HBA driver and one upper level driver (sg) fix.

  The sg change is the largest, but that results mostly from moving code
  to avoid the described race condition"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Add ufshcd_update_evt_hist() for UFS suspend error
  scsi: sg: Fix occasional bogus elapsed time that exceeds timeout
  scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow
  scsi: scsi_debug: Fix atomic write enable module param description

Merge tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fix from Steve French:

- Fix potential memory leak

* tag 'v6.19-rc2-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix memory and information leak in smb3_reconfigure()

Merge tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

- Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA

- Remove unnecessary (and hence incorrect) endian conversion in the
   Rust PCI driver sample code

- Fix memory leak in the unwind path of debugfs_change_name()

- Support non-const struct software_node pointers in
   SOFTWARE_NODE_REFERENCE(), after introducing _Generic()

- Avoid NULL pointer dereference in the unwind path of
   simple_xattrs_free()

* tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  fs/kernfs: null-ptr deref in simple_xattrs_free()
  software node: Also support referencing non-constant software nodes
  debugfs: Fix memleak in debugfs_change_name().
  samples: rust: fix endianness issue in rust_driver_pci
  rust: dma: add helpers for architectures without CONFIG_HAS_DMA

Merge tag 'efi-fixes-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:
"A couple of fixes for EFI regressions introduced this cycle:

   - Make EDID handling in the EFI stub mixed mode safe

   - Ensure that efi_mm.user_ns has a sane value - this is needed now
     that EFI runtime calls are preemptible on arm64"

* tag 'efi-fixes-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  kthread: Warn if mm_struct lacks user_ns in kthread_use_mm()
  arm64: efi: Fix NULL pointer dereference by initializing user_ns
  efi/libstub: gop: Fix EDID support in mixed-mode

Merge tag 'block-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for a signedness issue introduced in this kernel release for rnbd

- Fix up user copy references for ublk when the server exits

* tag 'block-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: rnbd-clt: Fix signedness bug in init_dev()
ublk: clean up user copy references on ublk server exit

Merge tag 'io_uring-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fix from Jens Axboe:
"Just a single fix for a bug that can cause a leak of the filename with
  IORING_OP_OPENAT, if direct descriptors are asked for and O_CLOEXEC
  has been set in the request flags"

* tag 'io_uring-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: fix filename leak in __io_openat_prep()

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Just a bunch of fixes, mostly trivial ones in tools/virtio"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/vsock: improve RCU read sections around vhost_vsock_get()
  tools/virtio: add device, device_driver stubs
  tools/virtio: fix up oot build
  virtio_features: make it self-contained
  tools/virtio: switch to kernel's virtio_config.h
  tools/virtio: stub might_sleep and synchronize_rcu
  tools/virtio: add struct cpumask to cpumask.h
  tools/virtio: pass KCFLAGS to module build
  tools/virtio: add ucopysize.h stub
  tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs
  tools/virtio: stub DMA mapping functions
  tools/virtio: add struct module forward declaration
  tools/virtio: use kernel's virtio.h
  virtio: make it self-contained
  tools/virtio: fix up compiler.h stub

Merge tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix parsing of SMB1 negotiate request by adjusting offsets affected
   by the removal of the RFC1002 length field from the SMB header

- Update minimum PDU size macros for both SMB1 and SMB2

- Rename smb2_get_msg function to smb_get_msg to better reflect its
   role in handling both SMB1 and SMB2 requests

* tag 'v6.19-rc2-smb3-server-fixes' of git://git.samba.org/ksmbd:
  smb/server: fix minimum SMB2 PDU size
  smb/server: fix minimum SMB1 PDU size
  ksmbd: rename smb2_get_msg to smb_get_msg
  ksmbd: Fix to handle removal of rfc1002 header from smb_hdr

firewire: nosy: Fix dma_free_coherent() size

It looks like the buffer allocated and mapped in add_card() is done
with size RCV_BUFFER_SIZE which is 16 KB and 4KB.

Fixes: 286468210d83 ("firewire: new driver: nosy - IEEE 1394 traffic sniffer")
Co-developed-by: Thomas Fourier <fourier.thomas@gmail.com>
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Co-developed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20251216165420.38355-2-fourier.thomas@gmail.com
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

io_uring: fix filename leak in __io_openat_prep()

__io_openat_prep() allocates a struct filename using getname(). However,
for the condition of the file being installed in the fixed file table as
well as having O_CLOEXEC flag set, the function returns early. At that
point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.

Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.

Reported-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00e61c43eb5e4740438f
Tested-by: syzbot+00e61c43eb5e4740438f@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Prithvi Tambewagh <activprithvi@gmail.com>
Fixes: b9445598d8c6 ("io_uring: openat directly into fixed fd table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

kthread: Warn if mm_struct lacks user_ns in kthread_use_mm()

Add a WARN_ON_ONCE() check to detect mm_struct instances that are
missing user_ns initialization when passed to kthread_use_mm().

When a kthread adopts an mm via kthread_use_mm(), LSM hooks and
capability checks may access current->mm->user_ns for credential
validation. If user_ns is NULL, this leads to a NULL pointer
dereference crash.

This was observed with efi_mm on arm64, where commit a5baf582f4c0
("arm64/efi: Call EFI runtime services without disabling preemption")
introduced kthread_use_mm(&efi_mm), but efi_mm lacked user_ns
initialization, causing crashes during /proc access.

Adding this warning helps catch similar bugs early during development
rather than waiting for hard-to-debug NULL pointer crashes in
production.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

arm64: efi: Fix NULL pointer dereference by initializing user_ns

Linux 6.19-rc2 (9448598b22c5 ("Linux 6.19-rc2")) is crashing with a NULL
pointer dereference on arm64 hosts:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c8
   pc : cap_capable (security/commoncap.c:82 security/commoncap.c:128)
   Call trace:
    cap_capable (security/commoncap.c:82 security/commoncap.c:128) (P)
    security_capable (security/security.c:?)
    ns_capable_noaudit (kernel/capability.c:342 kernel/capability.c:381)
    __ptrace_may_access (./include/linux/rcupdate.h:895 kernel/ptrace.c:326)
    ptrace_may_access (kernel/ptrace.c:353)
    do_task_stat (fs/proc/array.c:467)
    proc_tgid_stat (fs/proc/array.c:673)
    proc_single_show (fs/proc/base.c:803)

I've bissected the problem to commit a5baf582f4c0 ("arm64/efi: Call EFI
runtime services without disabling preemption").

>From my analyzes, the crash occurs because efi_mm lacks a user_ns field
initialization. This was previously harmless, but commit a5baf582f4c0
("arm64/efi: Call EFI runtime services without disabling preemption")
changed the EFI runtime call path to use kthread_use_mm(&efi_mm), which
temporarily adopts efi_mm as the current mm for the calling kthread.

When a thread has an active mm, LSM hooks like cap_capable() expect
mm->user_ns to be valid for credential checks. With efi_mm.user_ns being
NULL, capability checks during possible /proc access dereference the
NULL pointer and crash.

Fix by initializing efi_mm.user_ns to &init_user_ns.

Fixes: a5baf582f4c0 ("arm64/efi: Call EFI runtime services without disabling preemption")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

efi/libstub: gop: Fix EDID support in mixed-mode

The efi_edid_discovered_protocol and efi_edid_active_protocol have mixed
mode fields. So all their attributes should be accessed through
the efi_table_attr() helper.

Doing so fixes the upper 32 bits of the 64 bit gop_edid pointer getting
set to random values (followed by a crash at boot) when booting a x86_64
kernel on a machine with 32 bit UEFI like the Asus T100TA.

Fixes: 17029cdd8f9d ("efi/libstub: gop: Add support for reading EDID")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
  window.

  Regression fixes:
   - Mark variable __maybe_unused to avoid W=1 build break

  Stable fixes:
   - NFSv4 file creation neglects setting ACL
   - Clear TIME_DELEG in the suppattr_exclcreat bitmap
   - Clear SECLABEL in the suppattr_exclcreat bitmap
   - Fix memory leak in nfsd_create_serv error paths
   - Bound check rq_pages index in inline path
   - Return 0 on success from svc_rdma_copy_inline_range
   - Use rc_pageoff for memcpy byte offset
   - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"

* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: NFSv4 file creation neglects setting ACL
  NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
  NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
  nfsd: fix memory leak in nfsd_create_serv error paths
  nfsd: Mark variable __maybe_unused to avoid W=1 build break
  svcrdma: bound check rq_pages index in inline path
  svcrdma: return 0 on success from svc_rdma_copy_inline_range
  svcrdma: use rc_pageoff for memcpy byte offset
  SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf

Merge tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fix from Gao Xiang:
"Junbeom reported that synchronous reads could hit unintended EIOs
  under memory pressure due to incorrect error propagation in
  z_erofs_decompress_queue(), where earlier physical clusters in the
  same decompression queue may be served for another readahead.

  This addresses the issue by decompressing each physical cluster
  independently as long as disk I/Os succeed, rather than being impacted
  by the error status of previous physical clusters in the same queue.

  Summary:

   - Fix unexpected EIOs under memory pressure caused by recent
     incorrect error propagation logic"

* tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix unexpected EIO under memory pressure

cifs: Fix memory and information leak in smb3_reconfigure()

In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the
function returns immediately without freeing and erasing the newly
allocated new_password and new_password2. This causes both a memory leak
and a potential information leak.

Fix this by calling kfree_sensitive() on both password buffers before
returning in this error case.

Fixes: 0f0e357902957 ("cifs: during remount, make sure passwords are in sync")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

drm/msm: Replace unsafe snprintf usage with scnprintf

The refill_buf function uses snprintf to append to a fixed-size buffer.
snprintf returns the length that would have been written, which can
exceed the remaining buffer size. If this happens, ptr advances beyond
the buffer and rem becomes negative. In the 2nd iteration, rem is
treated as a large unsigned integer, causing snprintf to write oob.

While this behavior is technically mitigated by num_perfcntrs being
locked at 5, it's still unsafe if num_perfcntrs were ever to change/a
second source was added.

Signed-off-by: Evan Lambert <veyga@veygax.dev>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/696358/
Link: https://lore.kernel.org/r/20251224124254.17920-3-veyga@veygax.dev
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

vhost/vsock: improve RCU read sections around vhost_vsock_get()

vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:

/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/

Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.

About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :

  WARNING: suspicious RCU usage
  6.18.0-rc7 #62 Not tainted
  -----------------------------
  drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by rpc-libvirtd/3443:
   #0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]

  stack backtrace:
  CPU: 2 UID: 0 PID: 3443 Comm: rpc-libvirtd Not tainted 6.18.0-rc7 #62 PREEMPT(none)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-7.fc42 06/10/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x75/0xb0
   dump_stack+0x14/0x1a
   lockdep_rcu_suspicious.cold+0x4e/0x97
   vhost_vsock_get+0x8f/0xa0 [vhost_vsock]
   vhost_vsock_dev_ioctl+0x307/0x530 [vhost_vsock]
   __x64_sys_ioctl+0x4f2/0xa00
   x64_sys_call+0xed0/0x1da0
   do_syscall_64+0x73/0xfa0
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   ...
   </TASK>

This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.

About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.

Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers")
Cc: stefanha@redhat.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: add device, device_driver stubs

Add stubs needed by virtio.h

Message-ID: <0fabf13f6ea812ebc73b1c919fb17d4dec1545db.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: fix up oot build

oot build tends to help uncover bugs so it's worth keeping around,
as long as it's low effort.
add stubs for a couple of macros virtio gained recently,
and disable vdpa in the test build.

Message-ID: <33968faa7994b86d1f78057358a50b8f460c7a23.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_features: make it self-contained

virtio_features.h uses WARN_ON_ONCE and memset so it must
include linux/bug.h and linux/string.h

Message-ID: <579986aa9b8d023844990d2a0e267382f8ad85d5.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: switch to kernel's virtio_config.h

Drops stubs in virtio_config.h, use the kernel's version instead - we
are now activly developing it, so the stub became too hard to maintain.

Message-ID: <8e5c85dc8aad001f161f7e2d8799ffbccfc31381.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: stub might_sleep and synchronize_rcu

Add might_sleep() and synchronize_rcu() stubs needed by virtio_config.h.

might_sleep() is a no-op, synchronize_rcu doesn't work but we don't
need it to.

Created using Cursor CLI.

Message-ID: <5557e026335d808acd7b890693ee1382e73dd33a.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: add struct cpumask to cpumask.h

Add struct cpumask stub used by virtio_config.h.

Created using Cursor CLI.

Message-ID: <eacf56399ba220513ebcd610f4a5115dc768db80.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: pass KCFLAGS to module build

Update the mod target to pass KCFLAGS with the in-tree vhost driver
include path. This way vhost_test can find vhost headers.

Created using Cursor CLI.

Message-ID: <5473e5a5dfd2fcd261a778f2017cac669c031f23.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: add ucopysize.h stub

Add ucopysize.h with stub implementations of check_object_size,
copy_overflow, and check_copy_size.

Created using Cursor CLI.

Message-ID: <5046df90002bb744609248404b81d33b559fe813.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs

Add dev_WARN_ONCE and is_vmalloc_addr stubs needed by virtio_ring.c.
is_vmalloc_addr stub always returns false - that's fine since it's
merely a sanity check.

Created using Cursor CLI.

Message-ID: <749e7a03b7cd56baf50a27efc3b05e50cf8f36b6.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: stub DMA mapping functions

Add dma_map_page_attrs and dma_unmap_page_attrs stubs.
Follow the same pattern as existing DMA mapping stubs.

Created using Cursor CLI.

Message-ID: <3512df1fe0e2129ea493434a21c940c50381cc93.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: add struct module forward declaration

Declarate struct module in our linux/module.h stub.

Created using Cursor CLI.

Message-ID: <c01b8d24159664cc8c49354088efa342ae9e7321.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: use kernel's virtio.h

Replace virtio stubs with an include of the kernel header.

Message-ID: <33daf1033fc447eb8e3e54d21013ccfd99550e37.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: make it self-contained

virtio.h uses struct module, add a forward declaration to
make the header self-contained.

Message-ID: <9171b5cac60793eb59ab044c96ee038bf1363bee.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tools/virtio: fix up compiler.h stub

Add #undef __user before and after including compiler_types.h to avoid
redefinition warnings when compiling with system headers that also
define __user. This allows tools/virtio to build without warnings.

Additionally, stub out __must_check

Created using Cursor CLI.

Message-ID: <56424ce95c72cb4957070a7cd3c3c40ad5addaee.1764873799.git.mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

regulator: fp9931: fix regulator node pointer

Sync the driver with the binding. During review process a regulators
subnode was requested but neither driver nor test setup was updated.

Fixes: 12d821bd13d4 ("regulator: Add FP9931/JD9930 driver")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Link: https://patch.msgid.link/20251223-fp9931-fix-v1-1-b19b4c1e7056@kemnade.info
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/msm/dpu: Add missing NULL pointer check for pingpong interface

It is checked almost always in dpu_encoder_phys_wb_setup_ctl(), but in a
single place the check is missing.
Also use convenient locals instead of phys_enc->* where available.

Cc: stable@vger.kernel.org
Fixes: d7d0e73f7de33 ("drm/msm/dpu: introduce the dpu_encoder_phys_* for writeback")
Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/693860/
Link: https://lore.kernel.org/r/20251211093630.171014-1-kniv@yandex-team.ru
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

Revert "drm/msm/dpu: Enable quad-pipe for DSC and dual-DSI case"

This reverts commit d7ec9366b15cd04508fa015cb94d546b1c01edfb.

The dual-DSI dual-DSC scenario seems to be broken by this commit.

Reported-by: Marijn Suijten <marijn.suijten@somainline.org>
Closes: https://lore.kernel.org/r/aUR2b3FOSisTfDFj@SoMainline.org
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Fixes: d7ec9366b15c ("drm/msm/dpu: Enable quad-pipe for DSC and dual-DSI case")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695550/
Link: https://lore.kernel.org/r/20251219-drm-msm-dpu-revert-quad-pipe-broken-v1-2-654b46505f84@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

Revert "drm/msm/dpu: support plane splitting in quad-pipe case"

This reverts commit 5978864e34b66bdae4d7613834c03dd5d0a0c891.

At least on Hamoa based devices, there are IOMMU faults:

arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x3d0023, cbfrsynra=0x1c00, cb=13
arm-smmu 15000000.iommu: FSR = 00000402 [Format=2 TF], SID=0x1c00
arm-smmu 15000000.iommu: FSYNR0 = 003d0023 [S1CBNDX=61 PNU PLVL=3]

While on some of these devices, there are also all sorts of artifacts on eDP.

Reverting this fixes these issues.

Closes: https://lore.kernel.org/r/z75wnahrp7lrl5yhfdysr3np3qrs6xti2i4otkng4ex3blfgrx@xyiucge3xykb/
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org>
Fixes: 5978864e34b6 ("drm/msm/dpu: support plane splitting in quad-pipe case")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695549/
Link: https://lore.kernel.org/r/20251219-drm-msm-dpu-revert-quad-pipe-broken-v1-1-654b46505f84@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/msm: msm_iommu.c: fix all kernel-doc warnings

Correct or add kernel-doc comments to eliminate all warnings:

Warning: ../drivers/gpu/drm/msm/msm_iommu.c:381 expecting prototype for
alloc_pt(). Prototype was for msm_iommu_pagetable_alloc_pt() instead
Warning: ../drivers/gpu/drm/msm/msm_iommu.c:426 expecting prototype for
free_pt(). Prototype was for msm_iommu_pagetable_free_pt() instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695675/
Link: https://lore.kernel.org/r/20251219184638.1813181-20-rdunlap@infradead.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/msm: msm_gpu.h: fix all kernel-doc warnings

Correct or add kernel-doc comments to eliminate all warnings:

Warning: drivers/gpu/drm/msm/msm_gpu.h:119 Incorrect use of kernel-doc
format: * devfreq: devfreq instance
Warning: drivers/gpu/drm/msm/msm_gpu.h:125 Incorrect use of kernel-doc
format: * idle_freq:
Warning: drivers/gpu/drm/msm/msm_gpu.h:136 Incorrect use of kernel-doc
format: * boost_constraint:
Warning: drivers/gpu/drm/msm/msm_gpu.h:144 Incorrect use of kernel-doc
format: * busy_cycles: Last busy counter value, for calculating elapsed
busy
Warning: drivers/gpu/drm/msm/msm_gpu.h:156 Incorrect use of kernel-doc
format: * idle_work:
Warning: drivers/gpu/drm/msm/msm_gpu.h:163 Incorrect use of kernel-doc
format: * boost_work:
Warning: drivers/gpu/drm/msm/msm_gpu.h:170 struct member 'devfreq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:170 struct member 'boost_freq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'devfreq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'lock' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'governor' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'we are
continuing to sample busyness and * adjust frequency while the GPU is
idle' not described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'boost_freq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'busy_cycles'
not described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'time' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'idle_time' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'idle_work' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'boost_work' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'suspended' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:472 No description found for
return value of 'msm_context_is_vmbind'
Warning: drivers/gpu/drm/msm/msm_gpu.h:476 struct member 'ref' not
described in 'msm_context'
Warning: drivers/gpu/drm/msm/msm_gpu.h:476 struct member 'elapsed_ns' not
described in 'msm_context'
Warning: drivers/gpu/drm/msm/msm_gpu.h:492 expecting prototype for
msm_context_is_vm_bind(). Prototype was for msm_context_is_vmbind()
instead
Warning: drivers/gpu/drm/msm/msm_gpu.h:523 No description found for
return value of 'msm_gpu_convert_priority'
Warning: drivers/gpu/drm/msm/msm_gpu.h:583 expecting prototype for
struct msm_gpu_submitqueues. Prototype was for struct msm_gpu_submitqueue
instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695671/
Link: https://lore.kernel.org/r/20251219184638.1813181-19-rdunlap@infradead.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/msm: msm_gem_vma.c: fix all kernel-doc warnings

Correct or add kernel-doc comments to eliminate all warnings:

Warning: ../drivers/gpu/drm/msm/msm_gem_vma.c:96 expecting prototype for
struct msm_vma_op. Prototype was for struct msm_vm_op instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695679/
Link: https://lore.kernel.org/r/20251219184638.1813181-18-rdunlap@infradead.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/msm: msm_fence.h: fix all kernel-doc warnings

Correct or add kernel-doc comments to eliminate all warnings:

Warning: drivers/gpu/drm/msm/msm_fence.h:27 Incorrect use of kernel-doc
format: * last_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:36 Incorrect use of kernel-doc
format: * completed_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:44 Incorrect use of kernel-doc
format: * fenceptr:
Warning: drivers/gpu/drm/msm/msm_fence.h:65 Incorrect use of kernel-doc
format: * next_deadline_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'dev' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'name' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'context' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'index' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'fence' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'there is no
remaining pending work */ uint32_t last_fence' not described in
'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'updated from the CPU after interrupt * from GPU */ uint32_t completed_fence' not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'fenceptr' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'spinlock' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'next_deadline'
not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member
'next_deadline_fence' not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'deadline_timer'
not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'deadline_work'
not described in 'msm_fence_context'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695667/
Link: https://lore.kernel.org/r/20251219184638.1813181-17-rdunlap@infradead.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/msm/dpu: dpu_hw_wb.h: fix all kernel-doc warnings

Correct or add kernel-doc comments to eliminate all warnings:

Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:24 Cannot find
identifier on line: *
Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:57 struct member
'setup_roi' not described in 'dpu_hw_wb_ops'
Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:75 struct member
'caps' not described in 'dpu_hw_wb'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/695672/
Link: https://lore.kernel.org/r/20251219184638.1813181-16-rdunlap@infradead.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>