Li Zhijian [Tue, 20 Jan 2026 07:44:37 +0000 (15:44 +0800)]
RDMA/rxe: Fix race condition in QP timer handlers
I encontered the following warning:
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
libsha1 [last unloaded: ip6_udp_tunnel]
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.19.0-rc5-64k-v8+ #37 PREEMPT
Tainted: [C]=CRAP
Hardware name: Raspberry Pi 4 Model B Rev 1.2
Call trace:
rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P)
retransmit_timer+0x130/0x188 [rdma_rxe]
call_timer_fn+0x68/0x4d0
__run_timers+0x630/0x888
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400
...
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400
The issue is caused by a race condition between retransmit_timer() and
rxe_destroy_qp, leading to the Queue Pair's (QP) reference count dropping
to zero during timer handler execution.
It seems this warning is harmless because rxe_qp_do_cleanup() will flush
all pending timers and requests.
Ensure the QP's reference count is maintained and its validity is checked
within the timer callbacks by adding calls to rxe_get(qp) and corresponding
rxe_put(qp) after use.
Introduce a basic DM implementation that enables creating and
registering device memory, and using the associated memory keys
for networking operations.
Zilin Guan [Mon, 26 Jan 2026 07:48:01 +0000 (07:48 +0000)]
RDMA/mlx5: Fix memory leak in GET_DATA_DIRECT_SYSFS_PATH handler
The UVERBS_HANDLER(MLX5_IB_METHOD_GET_DATA_DIRECT_SYSFS_PATH) function
allocates memory for the device path using kobject_get_path(). If the
length of the device path exceeds the output buffer length, the function
returns -ENOSPC but does not free the allocated memory, resulting in a
memory leak.
Add a kfree() call to the error path to ensure the allocated memory is
properly freed.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Yi Liu [Thu, 22 Jan 2026 14:29:00 +0000 (22:29 +0800)]
RDMA/uverbs: Validate wqe_size before using it in ib_uverbs_post_send
ib_uverbs_post_send() uses cmd.wqe_size from userspace without any
validation before passing it to kmalloc() and using the allocated
buffer as struct ib_uverbs_send_wr.
If a user provides a small wqe_size value (e.g., 1), kmalloc() will
succeed, but subsequent accesses to user_wr->opcode, user_wr->num_sge,
and other fields will read beyond the allocated buffer, resulting in
an out-of-bounds read from kernel heap memory. This could potentially
leak sensitive kernel information to userspace.
Additionally, providing an excessively large wqe_size can trigger a
WARNING in the memory allocation path, as reported by syzkaller.
This is inconsistent with ib_uverbs_unmarshall_recv() which properly
validates that wqe_size >= sizeof(struct ib_uverbs_recv_wr) before
proceeding.
Add the same validation for ib_uverbs_post_send() to ensure wqe_size
is at least sizeof(struct ib_uverbs_send_wr).
Jacob Moroni [Tue, 20 Jan 2026 21:25:46 +0000 (21:25 +0000)]
RDMA/irdma: Use CQ ID for CEQE context
The hardware allows for an opaque CQ context field to be carried
over into CEQEs for the CQ. Previously, a pointer to the CQ was used
for this context. In the normal CQ destroy flow, the CEQ ring is
scrubbed to remove any preexisting CEQEs for the CQ that may not have
been processed yet so that the CQ structure is not dereferenced in the
CEQ ISR after the CQ has been freed.
However, in some cases, it is possible for a CEQE to be in flight in
HW even after the CQ destroy command completion is received, so it
could be missed during the scrub.
To protect against this, we can take advantage of the CQ table that
already exists and use the CQ ID for this context rather than a CQ
pointer.
Li Zhijian [Fri, 16 Jan 2026 03:27:53 +0000 (11:27 +0800)]
RDMA/rxe: Fix iova-to-va conversion for MR page sizes != PAGE_SIZE
The current implementation incorrectly handles memory regions (MRs) with
page sizes different from the system PAGE_SIZE. The core issue is that
rxe_set_page() is called with mr->page_size step increments, but the
page_list stores individual struct page pointers, each representing
PAGE_SIZE of memory.
ib_sg_to_page() has ensured that when i>=1 either
a) SG[i-1].dma_end and SG[i].dma_addr are contiguous
or
b) SG[i-1].dma_end and SG[i].dma_addr are mr->page_size aligned.
This leads to incorrect iova-to-va conversion in scenarios:
Access iova = 0x18f800 + 0x810 = 0x190010
Expected VA: 0x170010 (second SG, offset 0x10)
Before fix:
- index = (0x190010 >> 16) - (0x18f800 >> 16) = 1
- page_offset = 0x190010 & 0xFFFF = 0x10
- xarray[1] stores system page for dma_addr 0x170000
- Resulting VA: system page of 0x170000 + 0x10 = 0x170010 (wrong)
Yi Zhang reported a kernel panic[1] years ago related to this defect.
Solution:
1. Replace xarray with pre-allocated rxe_mr_page array for sequential
indexing (all MR page indices are contiguous)
2. Each rxe_mr_page stores both struct page* and offset within the
system page
3. Handle MR page_size != PAGE_SIZE relationships:
- page_size > PAGE_SIZE: Split MR pages into multiple system pages
- page_size <= PAGE_SIZE: Store offset within system page
4. Add boundary checks and compatibility validation
This ensures correct iova-to-va conversion regardless of MR page size
and system PAGE_SIZE relationship, while improving performance through
array-based sequential access.
Li Zhijian [Fri, 16 Jan 2026 03:28:33 +0000 (11:28 +0800)]
RDMA/rxe: Remove unused page_offset member
In rxe_map_mr_sg(), the `page_offset` member of the `rxe_mr` struct
was initialized based on `ibmr.iova`, which will be updated inside
ib_sg_to_pages() later.
Consequently, the value assigned to `page_offset` was incorrect. However,
since `page_offset` was never utilized throughout the code, it can be safely
removed to clean up the codebase and avoid future confusion.
Or Har-Toov [Thu, 15 Jan 2026 12:26:45 +0000 (14:26 +0200)]
IB/mlx5: Fix port speed query for representors
When querying speed information for a representor in switchdev mode,
the code previously used the first device in the eswitch, which may not
match the device that actually owns the representor. In setups such as
multi-port eswitch or LAG, this led to incorrect port attributes being
reported.
Fix this by retrieving the correct core device from the representor's
eswitch before querying its port attributes.
Fixes: 27f9e0ccb6da ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260115-port-speed-query-fix-v2-1-3bde6a3c78e7@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chiara Meiohas [Tue, 13 Jan 2026 13:37:10 +0000 (15:37 +0200)]
RDMA/mlx5: Fix UMR hang in LAG error state unload
During firmware reset in LAG mode, a race condition causes the driver
to hang indefinitely while waiting for UMR completion during device
unload. See [1].
In LAG mode the bond device is only registered on the master, so it
never sees sys_error events from the slave.
During firmware reset this causes UMR waits to hang forever on unload
as the slave is dead but the master hasn't entered error state yet, so
UMR posts succeed but completions never arrive.
Fix this by adding a sys_error notifier that gets registered before
MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device().
This ensures error events reach the bond device throughout teardown.
Get CQ type from the used gdma device. The MANA_IB_CREATE_RNIC_CQ
flag is ignored. It was used in older kernel versions where
the mana_ib was shared between ethernet and rnic.
Jacob Moroni [Mon, 12 Jan 2026 02:00:06 +0000 (02:00 +0000)]
RDMA/iwcm: Fix workqueue list corruption by removing work_list
The commit e1168f0 ("RDMA/iwcm: Simplify cm_event_handler()")
changed the work submission logic to unconditionally call
queue_work() with the expectation that queue_work() would
have no effect if work was already pending. The problem is
that a free list of struct iwcm_work is used (for which
struct work_struct is embedded), so each call to queue_work()
is basically unique and therefore does indeed queue the work.
This causes a problem in the work handler which walks the work_list
until it's empty to process entries. This means that a single
run of the work handler could process item N+1 and release it
back to the free list while the actual workqueue entry is still
queued. It could then get reused (INIT_WORK...) and lead to
list corruption in the workqueue logic.
Fix this by just removing the work_list. The workqueue already
does this for us.
This fixes the following error that was observed when stress
testing with ucmatose on an Intel E830 in iWARP mode:
Jiasheng Jiang [Mon, 12 Jan 2026 01:54:12 +0000 (01:54 +0000)]
RDMA/rxe: Fix double free in rxe_srq_from_init
In rxe_srq_from_init(), the queue pointer 'q' is assigned to
'srq->rq.queue' before copying the SRQ number to user space.
If copy_to_user() fails, the function calls rxe_queue_cleanup()
to free the queue, but leaves the now-invalid pointer in
'srq->rq.queue'.
The caller of rxe_srq_from_init() (rxe_create_srq) eventually
calls rxe_srq_cleanup() upon receiving the error, which triggers
a second rxe_queue_cleanup() on the same memory, leading to a
double free.
The call trace looks like this:
kmem_cache_free+0x.../0x...
rxe_queue_cleanup+0x1a/0x30 [rdma_rxe]
rxe_srq_cleanup+0x42/0x60 [rdma_rxe]
rxe_elem_release+0x31/0x70 [rdma_rxe]
rxe_create_srq+0x12b/0x1a0 [rdma_rxe]
ib_create_srq_user+0x9a/0x150 [ib_core]
Fix this by moving 'srq->rq.queue = q' after copy_to_user.
Chengchang Tang [Thu, 8 Jan 2026 11:30:32 +0000 (19:30 +0800)]
RDMA/hns: Support drain SQ and RQ
Some ULPs, e.g. rpcrdma, rely on drain_qp() to ensure all outstanding
requests are completed before releasing related memory. If drain_qp()
fails, ULPs may release memory directly, and in-flight WRs may later be
flushed after the memory is freed, potentially leading to UAF.
drain_qp() failures can happen when HW enters an error state or is
reset. Add support to drain SQ and RQ in such cases by posting a
fake WR during reset, so the driver can process all remaining WRs in
sequence and generate corresponding completions.
Always invoke comp_handler() in drain process to ensure completions
are not lost under concurrency (e.g. concurrent post_send() and
reset, or QPs created during reset). If the CQ is already processed,
cancel any already scheduled comp_handler() to avoid concurrency
issues.
Jacob Moroni [Mon, 5 Jan 2026 18:05:50 +0000 (18:05 +0000)]
RDMA/irdma: Remove fixed 1 ms delay during AH wait loop
The AH CQP command wait loop executes in an atomic context and was
using a fixed 1 ms delay. Since many AH create commands can complete
much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay.
Also, use the timeout value indicated during the capability exchange
rather than a hard-coded value.
Jacob Moroni [Sat, 3 Jan 2026 17:25:17 +0000 (17:25 +0000)]
RDMA/irdma: Remove redundant dma_wmb() before writel()
A dma_wmb() is not necessary before a writel() because writel()
already has an even stronger store barrier. A dma_wmb() is only
required to order writes to consistent/DMA memory whereas the
barrier in writel() is specified to order writes to DMA memory as
well as MMIO.
Md Haris Iqbal [Wed, 7 Jan 2026 16:15:16 +0000 (17:15 +0100)]
RDMA/rtrs-clt: For conn rejection use actual err number
When the connection establishment request is rejected from the server
side, then the actual error number sent back should be used.
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-10-haris.iqbal@ionos.com Reviewed-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kim Zhu [Wed, 7 Jan 2026 16:15:15 +0000 (17:15 +0100)]
RDMA/rtrs: Extend log message when a port fails
Add HCA name and port of this HCA.
This would help with analysing and debugging the logs.
The logs would looks something like this,
rtrs_server L2516: Handling event: port error (10).
HCA name: mlx4_0, port num: 2
rtrs_client L3326: Handling event: port error (10).
HCA name: mlx4_0, port num: 1
Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-9-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kim Zhu [Wed, 7 Jan 2026 16:15:14 +0000 (17:15 +0100)]
RDMA/rtrs-srv: Rate-limit I/O path error logging
Excessive error logging is making it difficult to identify the root
cause of issues. Implement rate limiting to improve log clarity.
Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-8-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Md Haris Iqbal [Wed, 7 Jan 2026 16:15:13 +0000 (17:15 +0100)]
RDMA/rtrs-srv: Add check and closure for possible zombie paths
During several network incidents, a number of RTRS paths for a session
went through disconnect and reconnect phase. However, some of those did
not auto-reconnect successfully. Instead they failed with the following
logs,
On server, (log a)
kernel: ibtrs_server L1868: <>: Connection already exists: 0
When the misbehaving path was removed, and add_path was called to re-add
the path, the log on client side changed to, (log b)
kernel: rtrs_client L1991: <sess-name>: Connect rejected: status 28
(consumer defined), rtrs errno -17
There was no log on the server side for this, which is expected since
there is no logging in that path,
if (unlikely(__is_path_w_addr_exists(srv, &cm_id->route.addr))) {
err = -EEXIST;
goto err;
Because of the following check on server side,
if (unlikely(sess->state != IBTRS_SRV_CONNECTING)) {
ibtrs_err(s, "Session in wrong state: %s\n",
.. we know that the path in (log a) was in CONNECTING state.
The above state of the path persists for as long as we leave the session
be. This means that the path is in some zombie state, probably waiting
for the info_req packet to arrive, which never does.
The changes in this commits does 2 things.
1) Add logs at places where we see the errors happening. The logs would
shed more light at the state and lifetime of such zombie paths.
2) Close such zombie sessions, only if they are in CONNECTING state, and
after an inactivity period of 30 seconds.
i) The state check prevents closure of paths which are CONNECTED.
Also, from the above logs and code, we already know that the path could
only be on CONNECTING state, so we play safe and narrow our impact surface
area by closing only CONNECTING paths.
ii) The inactivity period is to allow requests for other cid to finish
processing, or for any stray packets to arrive/fail.
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-7-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kim Zhu [Wed, 7 Jan 2026 16:15:11 +0000 (17:15 +0100)]
RDMA/rtrs: Improve error logging for RDMA cm events
The member variable status in the struct rdma_cm_event is used for both
linux errors and the errors definded in rdma stack.
Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-5-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Md Haris Iqbal [Wed, 7 Jan 2026 16:15:10 +0000 (17:15 +0100)]
RDMA/rtrs: Add optional support for IB_MR_TYPE_SG_GAPS
Support IB_MR_TYPE_SG_GAPS, which has less limitations
than standard IB_MR_TYPE_MEM_REG, a few ULP support this.
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-4-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Kim Zhu [Wed, 7 Jan 2026 16:15:09 +0000 (17:15 +0100)]
RDMA/rtrs: Add error description to the logs
Print error description instead of the error number.
Signed-off-by: Kim Zhu <zhu.yanjun@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-3-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Roman Penyaev [Wed, 7 Jan 2026 16:15:08 +0000 (17:15 +0100)]
RDMA/rtrs-srv: fix SG mapping
This fixes the following error on the server side:
RTRS server session allocation failed: -EINVAL
caused by the caller of the `ib_dma_map_sg()`, which does not expect
less mapped entries, than requested, which is in the order of things
and can be easily reproduced on the machine with enabled IOMMU.
The fix is to treat any positive number of mapped sg entries as a
successful mapping and cache DMA addresses by traversing modified
SG table.
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Signed-off-by: Roman Penyaev <r.peniaev@gmail.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Link: https://patch.msgid.link/20260107161517.56357-2-haris.iqbal@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Maher Sanalla [Sun, 4 Jan 2026 13:51:35 +0000 (15:51 +0200)]
RDMA/mlx5: Fix ucaps init error flow
In mlx5_ib_stage_caps_init(), if mlx5_ib_init_ucaps() fails after
mlx5_ib_init_var_table() succeeds, the VAR bitmap is leaked since
the function returns without cleanup.
Thus, cleanup the var table bitmap in case of error of initializing
ucaps before exiting, preventing the leak above.
Fixes: cf7174e8982f ("RDMA/mlx5: Create UCAP char devices for supported device capabilities") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/20260104-ib-core-misc-v1-3-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Parav Pandit [Sun, 4 Jan 2026 13:51:34 +0000 (15:51 +0200)]
RDMA/core: Avoid exporting module local functions and remove not-used ones
Some of the functions are local to the module and some are not used
starting from commit 36783dec8d79 ("RDMA/rxe: Delete deprecated module
parameters interface"). Delete and avoid exporting them.
Leon Romanovsky [Sun, 4 Jan 2026 13:51:33 +0000 (15:51 +0200)]
RDMA/umem: Remove redundant DMABUF ops check
ib_umem_dmabuf_get_with_dma_device() is an in-kernel function and does
not require a defensive check for the .move_notify callback. All current
callers guarantee that this callback is always present.
Or Har-Toov [Thu, 18 Dec 2025 15:59:00 +0000 (17:59 +0200)]
RDMA/mlx5: Implement query_port_speed callback
Implement the query_port_speed callback for mlx5 driver to support
querying effective port bandwidth.
For LAG configurations, query the aggregated speed from the LAG layer
or from the modified vport max_tx_speed.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:53 +0000 (17:58 +0200)]
RDMA/mlx5: Raise async event on device speed change
Raise IB_EVENT_DEVICE_SPEED_CHANGE whenever the speed of one of the
device's ports changes. Usually all ports of the device changes
together.
This ensures user applications and upper-layer software are immediately
notified when bandwidth changes, improving traffic management in dynamic
environments. This is especially useful for vports which are part of a
LAG configuration, to know if the effective speed of the LAG was
changed.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:46 +0000 (17:58 +0200)]
IB/core: Add query_port_speed verb
Add new ibv_query_port_speed() verb to enable applications to query
the effective bandwidth of a port.
This verb is particularly useful when the speed is not a multiplication
of IB speed and width where width is 2^n.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:40 +0000 (17:58 +0200)]
IB/core: Refactor rate_show to use ib_port_attr_to_rate()
Update sysfs rate_show() to rely on ib_port_attr_to_speed_info() for
converting IB port speed and width attributes to data rate and speed
string.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:32 +0000 (17:58 +0200)]
IB/core: Add helper to convert port attributes to data rate
Introduce ib_port_attr_to_rate() to compute the data rate in 100 Mbps
units (deci-Gb/sec) from a port's active_speed and active_width
attributes. This generic helper removes duplicated speed-to-rate
calculations, which are used by sysfs and the upcoming new verb.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:26 +0000 (17:58 +0200)]
IB/core: Add async event on device speed change
Add IB_EVENT_DEVICE_SPEED_CHANGE for notifying user applications on
device's ports speed changes.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Leon Romanovsky [Mon, 5 Jan 2026 07:39:29 +0000 (02:39 -0500)]
Support effective VF bandwidth query in LAG mode
Currently, mlx5 driver exposes only the parent function's speed to VFs,
providing no way to query the actual effective bandwidth in LAG and
MPESW configurations. This limitation prevents userspace and
upper-layer software from obtaining accurate bandwidth information,
which impacts traffic scheduling decisions.
This series addresses this by:
1. Adding mlx5 internal logic to calculate and propagate the effective
aggregated LAG speed to all attached vports. The vport speeds are
dynamically updated when LAG member link states change.
2. Extending RDMA core with a new ib_query_port_speed() verb and an
IB_EVENT_DEVICE_SPEED_CHANGE async event. These interfaces expose
the effective port speed to userspace, supporting speeds that are
not expressible as IB speed * width (where width is 2^n).
This series enables userspace applications to query the effective
port speed and receive notifications on speed changes in real-time.
In LAG configurations, each mlx5 port reports the aggregated bandwidth
of all active LAG members.
Or Har-Toov [Thu, 18 Dec 2025 15:58:20 +0000 (17:58 +0200)]
net/mlx5: Add support for querying bond speed
Add mlx5_lag_query_bond_speed() to query the aggregated speed of
lag configurations with a bond device.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:13 +0000 (17:58 +0200)]
net/mlx5: Handle port and vport speed change events in MPESW
Add port change event handling logic for MPESW LAG mode, ensuring
VFs are updated when the speed of LAG physical ports changes.
This triggers a speed update workflow when relevant port state changes
occur, enabling consistent and accurate reporting of VF bandwidth.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:58:05 +0000 (17:58 +0200)]
net/mlx5: Propagate LAG effective max_tx_speed to vports
Currently, vports report only their parent's uplink speed, which in LAG
setups does not reflect the true aggregated bandwidth. This makes it
hard for upper-layer software to optimize load balancing decisions
based on accurate bandwidth information.
Fix the issue by calculating the possible maximum speed of a LAG as
the sum of speeds of all active uplinks that are part of the LAG.
Propagate this effective max speed to vports associated with the LAG
whenever a relevant event occurs, such as physical port link state
changes or LAG creation/modification.
With this change, upper-layer components receive accurate bandwidth
information corresponding to the active members of the LAG and can
make better load balancing decisions.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Or Har-Toov [Thu, 18 Dec 2025 15:57:58 +0000 (17:57 +0200)]
net/mlx5: Add max_tx_speed and its CAP bit to IFC
Introduce the max_tx_speed field to the query and modify_vport_state
structures.
Add the esw_vport_state_max_tx_speed capability bit, indicating
the firmware support modifying the max_tx_speed field via the
MODIFY_VPORT_STATE command.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Chengchang Tang [Sun, 4 Jan 2026 06:40:57 +0000 (14:40 +0800)]
RDMA/hns: Notify ULP of remaining soft-WCs during reset
During a reset, software-generated WCs cannot be reported via
interrupts. This may cause the ULP to miss some WCs.
To avoid this, add check in the CQ arm process: if a hardware reset
has occurred and there are still unreported soft-WCs, notify the ULP
to handle the remaining WCs, thereby preventing any loss of completions.
Fixes: 626903e9355b ("RDMA/hns: Add support for reporting wc as software mode") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Junxian Huang [Sun, 4 Jan 2026 06:40:56 +0000 (14:40 +0800)]
RDMA/hns: Fix RoCEv1 failure due to DSCP
DSCP is not supported in RoCEv1, but get_dscp() is still called. If
get_dscp() returns an error, it'll eventually cause create_ah to fail
even when using RoCEv1.
Correct the return value and avoid calling get_dscp() when using
RoCEv1.
Junxian Huang [Sun, 4 Jan 2026 06:40:55 +0000 (14:40 +0800)]
RDMA/hns: Return actual error code instead of fixed EINVAL
query_cqc() and query_mpt() may return various error codes in
different cases. Return actual error code instead of fixed EINVAL.
Fixes: f2b070f36d1b ("RDMA/hns: Support CQ's restrack raw ops for hns driver") Fixes: 3d67e7e236ad ("RDMA/hns: Support MR's restrack raw ops for hns driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Etienne AUJAMES [Wed, 31 Dec 2025 13:07:45 +0000 (14:07 +0100)]
IB/cache: update gid cache on client reregister event
Some HCAs (e.g: ConnectX4) do not trigger a IB_EVENT_GID_CHANGE on
subnet prefix update from SM (PortInfo).
Since the commit d58c23c92548 ("IB/core: Only update PKEY and GID caches
on respective events"), the GID cache is updated exclusively on
IB_EVENT_GID_CHANGE. If this event is not emitted, the subnet prefix in the
IPoIB interface’s hardware address remains set to its default value
(0xfe80000000000000).
Then rdma_bind_addr() failed because it relies on hardware address to
find the port GID (subnet_prefix + port GUID).
This patch fixes this issue by updating the GID cache on
IB_EVENT_CLIENT_REREGISTER event (emitted on PortInfo::ClientReregister=1).
Fixes: d58c23c92548 ("IB/core: Only update PKEY and GID caches on respective events") Signed-off-by: Etienne AUJAMES <eaujames@ddn.com> Link: https://patch.msgid.link/aVUfsO58QIDn5bGX@eaujamesFR0130 Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Lianfa Weng [Tue, 30 Dec 2025 15:49:11 +0000 (23:49 +0800)]
RDMA/hns: Introduce limit_bank mode with better performance
In limit_bank mode, QPs/CQs are restricted to using half of the banks.
HW concentrates resources on these banks, thereby improving performance
compared to the default mode.
Switch between limit_bank mode and default mode by setting the cap
flag in FW. Since the number of QPs and CQs will be halved, this mode
is suitable for scenarios where fewer QPs and CQs are required.
Linus Torvalds [Sun, 28 Dec 2025 18:21:47 +0000 (10:21 -0800)]
Merge tag 'usb-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes, and bunch of reverts for 6.19-rc3.
Included in here are:
- reverts of some typec ucsi driver changes that had a lot of
regression reports after -rc1. Let's just revert it all for now and
it will come back in a way that is better tested.
- other typec bugfixes
- usb-storage quirk fixups
- dwc3 driver fix
- other minor USB fixes for reported problems.
All of these have passed 0-day testing and individual testing"
* tag 'usb-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
Revert "usb: typec: ucsi: Update UCSI structure to have message in and message out fields"
Revert "usb: typec: ucsi: Add support for message out data structure"
Revert "usb: typec: ucsi: Enable debugfs for message_out data structure"
Revert "usb: typec: ucsi: Add support for SET_PDOS command"
Revert "usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common"
Revert "usb: typec: ucsi: Get connector status after enable notifications"
usb: ohci-nxp: clean up probe error labels
usb: gadget: lpc32xx_udc: clean up probe error labels
usb: ohci-nxp: fix device leak on probe failure
usb: phy: isp1301: fix non-OF device reference imbalance
usb: gadget: lpc32xx_udc: fix clock imbalance in error path
usb: typec: ucsi: Get connector status after enable notifications
usb: usb-storage: Maintain minimal modifications to the bcdDevice range.
usb: dwc3: of-simple: fix clock resource leak in dwc3_of_simple_probe
usb: typec: ucsi: Fix null pointer dereference in ucsi_sync_control_common
USB: lpc32xx_udc: Fix error handling in probe
usb: typec: altmodes/displayport: Drop the device reference in dp_altmode_probe()
usb: phy: fsl-usb: Fix use-after-free in delayed work during device removal
usb: renesas_usbhs: Fix a resource leak in usbhs_pipe_malloc()
usb: typec: ucsi: huawei-gaokin: add DRM dependency
...
Linus Torvalds [Sun, 28 Dec 2025 18:14:49 +0000 (10:14 -0800)]
Merge tag 'tty-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for some reported issues.
Included in here are:
- serial sysfs fwnode fix that was much reported
- sh-sci driver fix
- serial device init bugfix
- 8250 bugfix
- xilinx_uartps bugfix
All of these have passed 0-day testing and individual testing"
* tag 'tty-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: xilinx_uartps: fix rs485 delay_rts_after_send
serial: sh-sci: Check that the DMA cookie is valid
serial: core: Fix serial device initialization
serial: 8250: longson: Fix NULL vs IS_ERR() bug in probe
serial: core: Restore sysfs fwnode information
Linus Torvalds [Sun, 28 Dec 2025 18:11:18 +0000 (10:11 -0800)]
Merge tag 'firewire-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A fix for PCI driver for Texas Instruments PCILyx series.
The driver had a bug where it allocated a DMA-coherent buffer of 16 KB
but released it using PAGE_SIZE. This disproportion was reported in
2020, but the fix was never merged. It is finally resolved"
* tag 'firewire-fixes-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: nosy: Fix dma_free_coherent() size
Linus Torvalds [Sun, 28 Dec 2025 17:44:26 +0000 (09:44 -0800)]
Merge tag 'riscv-for-linus-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"Nothing exotic here; these are the cleanup and new ISA extension
probing patches (not including CFI):
- Add probing and userspace reporting support for the standard RISC-V
ISA extensions Zilsd and Zclsd, which implement load/store dual
instructions on RV32
- Abstract the register saving code in setup_sigcontext() so it can
be used for stateful RISC-V ISA extensions beyond the vector
extension
- Add the SBI extension ID and some initial data structure
definitions for the RISC-V standard SBI debug trigger extension
- Clean up some code slightly: change some page table functions to
avoid atomic operations oinn !SMP and to avoid unnecessary casts to
atomic_long_t; and use the existing RISCV_FULL_BARRIER macro in
place of some open-coded 'fence rw,rw' instructions"
* tag 'riscv-for-linus-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Add SBI debug trigger extension and function ids
riscv/atomic.h: use RISCV_FULL_BARRIER in _arch_atomic* function.
riscv: hwprobe: export Zilsd and Zclsd ISA extensions
riscv: add ISA extension parsing for Zilsd and Zclsd
dt-bindings: riscv: add Zilsd and Zclsd extension descriptions
riscv: mm: use xchg() on non-atomic_long_t variables, not atomic_long_xchg()
riscv: mm: ptep_get_and_clear(): avoid atomic ops when !CONFIG_SMP
riscv: mm: pmdp_huge_get_and_clear(): avoid atomic ops when !CONFIG_SMP
riscv: signal: abstract header saving for setup_sigcontext
Linus Torvalds [Sun, 28 Dec 2025 17:40:09 +0000 (09:40 -0800)]
Merge tag 'powerpc-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix for kexec warning due to SMT disable or partial SMT enabled
- Handle font bitmap pointer with reloc_offset to fix boot crash
- Fix to enable cpuidle state for Power11
- Couple of misc fixes
Thanks to Aboorva Devarajan, Aditya Bodkhe, Cedar Maxwell, Christian
Zigotzky, Christophe Leroy, Christophe Leroy (CS GROUP), Finn Thain,
Gopi Krishna Menon, Guenter Roeck, Jan Stancek, Joe Lawrence, Josh
Poimboeuf, Justin M. Forbes, Madadi Vineeth Reddy, Naveen N Rao (AMD),
Nysal Jan K.A., Sachin P Bappalige, Samir M, Sourabh Jain, Srikar
Dronamraju, and Stan Johnson
* tag 'powerpc-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32: Restore disabling of interrupts at interrupt/syscall exit
powerpc/powernv: Enable cpuidle state detection for POWER11
powerpc: Add reloc_offset() to font bitmap pointer used for bootx_printf()
powerpc/tools: drop `-o pipefail` in gcc check scripts
selftests/powerpc/pmu/: Add check_extended_reg_test to .gitignore
powerpc/kexec: Enable SMT before waking offline CPUs
Linus Torvalds [Sat, 27 Dec 2025 16:37:26 +0000 (08:37 -0800)]
Merge tag 'spi-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"We've got more fixes here for the Cadence QSPI controller, this time
fixing some issues that come up when working with slower flashes on
some platforms plus a general race condition.
We also add support for the Allwinner A523, this is just some new
compatibles"
* tag 'spi-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cadence-quadspi: Improve CQSPI_SLOW_SRAM quirk if flash is slow
spi: cadence-quadspi: Prevent lost complete() call during indirect read
spi: sun6i: Support A523's SPI controllers
spi: dt-bindings: sun6i: Add compatibles for A523's SPI controllers
Linus Torvalds [Sat, 27 Dec 2025 16:04:39 +0000 (08:04 -0800)]
Merge tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes from Thomas, making the UAPI headers more robustly
correct and ensuring they are covered by checkpatch, and one from
Andreas fixing an update for a change to the DT bindings that I missed
was requested during bindings review in the newly added fp9931 driver"
* tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fp9931: fix regulator node pointer
regulator: Add UAPI headers to MAINTAINERS
regulator: uapi: Use UAPI integer type
Linus Torvalds [Fri, 26 Dec 2025 21:41:02 +0000 (13:41 -0800)]
Merge tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Introduce DMA Rust helpers to avoid build errors when !CONFIG_HAS_DMA
- Remove unnecessary (and hence incorrect) endian conversion in the
Rust PCI driver sample code
- Fix memory leak in the unwind path of debugfs_change_name()
- Support non-const struct software_node pointers in
SOFTWARE_NODE_REFERENCE(), after introducing _Generic()
- Avoid NULL pointer dereference in the unwind path of
simple_xattrs_free()
* tag 'driver-core-6.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
fs/kernfs: null-ptr deref in simple_xattrs_free()
software node: Also support referencing non-constant software nodes
debugfs: Fix memleak in debugfs_change_name().
samples: rust: fix endianness issue in rust_driver_pci
rust: dma: add helpers for architectures without CONFIG_HAS_DMA
Linus Torvalds [Fri, 26 Dec 2025 21:37:11 +0000 (13:37 -0800)]
Merge tag 'efi-fixes-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"A couple of fixes for EFI regressions introduced this cycle:
- Make EDID handling in the EFI stub mixed mode safe
- Ensure that efi_mm.user_ns has a sane value - this is needed now
that EFI runtime calls are preemptible on arm64"
* tag 'efi-fixes-for-v6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
kthread: Warn if mm_struct lacks user_ns in kthread_use_mm()
arm64: efi: Fix NULL pointer dereference by initializing user_ns
efi/libstub: gop: Fix EDID support in mixed-mode
Linus Torvalds [Fri, 26 Dec 2025 19:44:35 +0000 (11:44 -0800)]
Merge tag 'block-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix for a signedness issue introduced in this kernel release for rnbd
- Fix up user copy references for ublk when the server exits
* tag 'block-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: rnbd-clt: Fix signedness bug in init_dev()
ublk: clean up user copy references on ublk server exit
Linus Torvalds [Fri, 26 Dec 2025 19:34:38 +0000 (11:34 -0800)]
Merge tag 'io_uring-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix for a bug that can cause a leak of the filename with
IORING_OP_OPENAT, if direct descriptors are asked for and O_CLOEXEC
has been set in the request flags"
* tag 'io_uring-6.19-20251226' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring: fix filename leak in __io_openat_prep()
__io_openat_prep() allocates a struct filename using getname(). However,
for the condition of the file being installed in the fixed file table as
well as having O_CLOEXEC flag set, the function returns early. At that
point, the request doesn't have REQ_F_NEED_CLEANUP flag set. Due to this,
the memory for the newly allocated struct filename is not cleaned up,
causing a memory leak.
Fix this by setting the REQ_F_NEED_CLEANUP for the request just after the
successful getname() call, so that when the request is torn down, the
filename will be cleaned up, along with other resources needing cleanup.
Breno Leitao [Tue, 23 Dec 2025 10:55:44 +0000 (02:55 -0800)]
kthread: Warn if mm_struct lacks user_ns in kthread_use_mm()
Add a WARN_ON_ONCE() check to detect mm_struct instances that are
missing user_ns initialization when passed to kthread_use_mm().
When a kthread adopts an mm via kthread_use_mm(), LSM hooks and
capability checks may access current->mm->user_ns for credential
validation. If user_ns is NULL, this leads to a NULL pointer
dereference crash.
This was observed with efi_mm on arm64, where commit a5baf582f4c0
("arm64/efi: Call EFI runtime services without disabling preemption")
introduced kthread_use_mm(&efi_mm), but efi_mm lacked user_ns
initialization, causing crashes during /proc access.
Adding this warning helps catch similar bugs early during development
rather than waiting for hard-to-debug NULL pointer crashes in
production.
I've bissected the problem to commit a5baf582f4c0 ("arm64/efi: Call EFI
runtime services without disabling preemption").
>From my analyzes, the crash occurs because efi_mm lacks a user_ns field
initialization. This was previously harmless, but commit a5baf582f4c0
("arm64/efi: Call EFI runtime services without disabling preemption")
changed the EFI runtime call path to use kthread_use_mm(&efi_mm), which
temporarily adopts efi_mm as the current mm for the calling kthread.
When a thread has an active mm, LSM hooks like cap_capable() expect
mm->user_ns to be valid for credential checks. With efi_mm.user_ns being
NULL, capability checks during possible /proc access dereference the
NULL pointer and crash.
Fix by initializing efi_mm.user_ns to &init_user_ns.
Hans de Goede [Tue, 23 Dec 2025 10:10:46 +0000 (11:10 +0100)]
efi/libstub: gop: Fix EDID support in mixed-mode
The efi_edid_discovered_protocol and efi_edid_active_protocol have mixed
mode fields. So all their attributes should be accessed through
the efi_table_attr() helper.
Doing so fixes the upper 32 bits of the 64 bit gop_edid pointer getting
set to random values (followed by a crash at boot) when booting a x86_64
kernel on a machine with 32 bit UEFI like the Asus T100TA.
Fixes: 17029cdd8f9d ("efi/libstub: gop: Add support for reading EDID") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Linus Torvalds [Wed, 24 Dec 2025 17:23:04 +0000 (09:23 -0800)]
Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"A set of NFSD fixes that arrived just a bit late for the 6.19 merge
window.
Regression fixes:
- Mark variable __maybe_unused to avoid W=1 build break
Stable fixes:
- NFSv4 file creation neglects setting ACL
- Clear TIME_DELEG in the suppattr_exclcreat bitmap
- Clear SECLABEL in the suppattr_exclcreat bitmap
- Fix memory leak in nfsd_create_serv error paths
- Bound check rq_pages index in inline path
- Return 0 on success from svc_rdma_copy_inline_range
- Use rc_pageoff for memcpy byte offset
- Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"
* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: NFSv4 file creation neglects setting ACL
NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
nfsd: fix memory leak in nfsd_create_serv error paths
nfsd: Mark variable __maybe_unused to avoid W=1 build break
svcrdma: bound check rq_pages index in inline path
svcrdma: return 0 on success from svc_rdma_copy_inline_range
svcrdma: use rc_pageoff for memcpy byte offset
SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
Linus Torvalds [Wed, 24 Dec 2025 17:15:30 +0000 (09:15 -0800)]
Merge tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Junbeom reported that synchronous reads could hit unintended EIOs
under memory pressure due to incorrect error propagation in
z_erofs_decompress_queue(), where earlier physical clusters in the
same decompression queue may be served for another readahead.
This addresses the issue by decompressing each physical cluster
independently as long as disk I/Os succeed, rather than being impacted
by the error status of previous physical clusters in the same queue.
Summary:
- Fix unexpected EIOs under memory pressure caused by recent
incorrect error propagation logic"
* tag 'erofs-for-6.19-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix unexpected EIO under memory pressure
Zilin Guan [Wed, 24 Dec 2025 15:21:42 +0000 (15:21 +0000)]
cifs: Fix memory and information leak in smb3_reconfigure()
In smb3_reconfigure(), if smb3_sync_session_ctx_passwords() fails, the
function returns immediately without freeing and erasing the newly
allocated new_password and new_password2. This causes both a memory leak
and a potential information leak.
Fix this by calling kfree_sensitive() on both password buffers before
returning in this error case.
Fixes: 0f0e357902957 ("cifs: during remount, make sure passwords are in sync") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
Evan Lambert [Wed, 24 Dec 2025 12:44:22 +0000 (12:44 +0000)]
drm/msm: Replace unsafe snprintf usage with scnprintf
The refill_buf function uses snprintf to append to a fixed-size buffer.
snprintf returns the length that would have been written, which can
exceed the remaining buffer size. If this happens, ptr advances beyond
the buffer and rem becomes negative. In the 2nd iteration, rem is
treated as a large unsigned integer, causing snprintf to write oob.
While this behavior is technically mitigated by num_perfcntrs being
locked at 5, it's still unsafe if num_perfcntrs were ever to change/a
second source was added.
vhost/vsock: improve RCU read sections around vhost_vsock_get()
vhost_vsock_get() uses hash_for_each_possible_rcu() to find the
`vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu()
should only be called within an RCU read section, as mentioned in the
following comment in include/linux/rculist.h:
/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
Currently, all calls to vhost_vsock_get() are between rcu_read_lock()
and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and
vhost_vsock_reset_orphans(). In both cases, the current code is safe,
but we can make improvements to make it more robust.
About vhost_vsock_set_cid(), when building the kernel with
CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the
user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` :
WARNING: suspicious RCU usage
6.18.0-rc7 #62 Not tainted
-----------------------------
drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by rpc-libvirtd/3443:
#0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock]
This is not a real problem, because the vhost_vsock_get() caller, i.e.
vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash
table writers. Anyway, to prevent that warning, add lockdep_is_held()
condition to hash_for_each_possible_rcu() to verify that either the
caller is in an RCU read section or `vhost_vsock_mutex` is held when
CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for
vhost_vsock_get() to better describe the locking requirements and the
scope of the returned pointer validity.
About vhost_vsock_reset_orphans(), currently this function is only
called via vsock_for_each_connected_socket(), which holds the
`vsock_table_lock` spinlock (which is also an RCU read-side critical
section). However, add an explicit RCU read lock there to make the code
more robust and explicit about the RCU requirements, and to prevent
issues if the calling context changes in the future or if
vhost_vsock_reset_orphans() is called from other contexts.
Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Cc: stefanha@redhat.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20251126133826.142496-1-sgarzare@redhat.com>
Message-ID: <20251126210313.GA499503@fedora> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
oot build tends to help uncover bugs so it's worth keeping around,
as long as it's low effort.
add stubs for a couple of macros virtio gained recently,
and disable vdpa in the test build.
tools/virtio: add dev_WARN_ONCE and is_vmalloc_addr stubs
Add dev_WARN_ONCE and is_vmalloc_addr stubs needed by virtio_ring.c.
is_vmalloc_addr stub always returns false - that's fine since it's
merely a sanity check.
Add #undef __user before and after including compiler_types.h to avoid
redefinition warnings when compiling with system headers that also
define __user. This allows tools/virtio to build without warnings.
Nikolay Kuratov [Thu, 11 Dec 2025 09:36:30 +0000 (12:36 +0300)]
drm/msm/dpu: Add missing NULL pointer check for pingpong interface
It is checked almost always in dpu_encoder_phys_wb_setup_ctl(), but in a
single place the check is missing.
Also use convenient locals instead of phys_enc->* where available.
Cc: stable@vger.kernel.org Fixes: d7d0e73f7de33 ("drm/msm/dpu: introduce the dpu_encoder_phys_* for writeback") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/693860/ Link: https://lore.kernel.org/r/20251211093630.171014-1-kniv@yandex-team.ru Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Randy Dunlap [Fri, 19 Dec 2025 18:46:38 +0000 (10:46 -0800)]
drm/msm: msm_iommu.c: fix all kernel-doc warnings
Correct or add kernel-doc comments to eliminate all warnings:
Warning: ../drivers/gpu/drm/msm/msm_iommu.c:381 expecting prototype for
alloc_pt(). Prototype was for msm_iommu_pagetable_alloc_pt() instead
Warning: ../drivers/gpu/drm/msm/msm_iommu.c:426 expecting prototype for
free_pt(). Prototype was for msm_iommu_pagetable_free_pt() instead
Randy Dunlap [Fri, 19 Dec 2025 18:46:37 +0000 (10:46 -0800)]
drm/msm: msm_gpu.h: fix all kernel-doc warnings
Correct or add kernel-doc comments to eliminate all warnings:
Warning: drivers/gpu/drm/msm/msm_gpu.h:119 Incorrect use of kernel-doc
format: * devfreq: devfreq instance
Warning: drivers/gpu/drm/msm/msm_gpu.h:125 Incorrect use of kernel-doc
format: * idle_freq:
Warning: drivers/gpu/drm/msm/msm_gpu.h:136 Incorrect use of kernel-doc
format: * boost_constraint:
Warning: drivers/gpu/drm/msm/msm_gpu.h:144 Incorrect use of kernel-doc
format: * busy_cycles: Last busy counter value, for calculating elapsed
busy
Warning: drivers/gpu/drm/msm/msm_gpu.h:156 Incorrect use of kernel-doc
format: * idle_work:
Warning: drivers/gpu/drm/msm/msm_gpu.h:163 Incorrect use of kernel-doc
format: * boost_work:
Warning: drivers/gpu/drm/msm/msm_gpu.h:170 struct member 'devfreq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:170 struct member 'boost_freq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'devfreq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'lock' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'governor' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'we are
continuing to sample busyness and * adjust frequency while the GPU is
idle' not described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'boost_freq' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'busy_cycles'
not described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'time' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'idle_time' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'idle_work' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'boost_work' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:172 struct member 'suspended' not
described in 'msm_gpu_devfreq'
Warning: drivers/gpu/drm/msm/msm_gpu.h:472 No description found for
return value of 'msm_context_is_vmbind'
Warning: drivers/gpu/drm/msm/msm_gpu.h:476 struct member 'ref' not
described in 'msm_context'
Warning: drivers/gpu/drm/msm/msm_gpu.h:476 struct member 'elapsed_ns' not
described in 'msm_context'
Warning: drivers/gpu/drm/msm/msm_gpu.h:492 expecting prototype for
msm_context_is_vm_bind(). Prototype was for msm_context_is_vmbind()
instead
Warning: drivers/gpu/drm/msm/msm_gpu.h:523 No description found for
return value of 'msm_gpu_convert_priority'
Warning: drivers/gpu/drm/msm/msm_gpu.h:583 expecting prototype for
struct msm_gpu_submitqueues. Prototype was for struct msm_gpu_submitqueue
instead
Randy Dunlap [Fri, 19 Dec 2025 18:46:35 +0000 (10:46 -0800)]
drm/msm: msm_fence.h: fix all kernel-doc warnings
Correct or add kernel-doc comments to eliminate all warnings:
Warning: drivers/gpu/drm/msm/msm_fence.h:27 Incorrect use of kernel-doc
format: * last_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:36 Incorrect use of kernel-doc
format: * completed_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:44 Incorrect use of kernel-doc
format: * fenceptr:
Warning: drivers/gpu/drm/msm/msm_fence.h:65 Incorrect use of kernel-doc
format: * next_deadline_fence:
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'dev' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'name' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'context' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'index' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'fence' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'there is no
remaining pending work */ uint32_t last_fence' not described in
'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'updated from the CPU after interrupt * from GPU */ uint32_t completed_fence' not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'fenceptr' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'spinlock' not
described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'next_deadline'
not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member
'next_deadline_fence' not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'deadline_timer'
not described in 'msm_fence_context'
Warning: drivers/gpu/drm/msm/msm_fence.h:74 struct member 'deadline_work'
not described in 'msm_fence_context'
Randy Dunlap [Fri, 19 Dec 2025 18:46:34 +0000 (10:46 -0800)]
drm/msm/dpu: dpu_hw_wb.h: fix all kernel-doc warnings
Correct or add kernel-doc comments to eliminate all warnings:
Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:24 Cannot find
identifier on line: *
Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:57 struct member
'setup_roi' not described in 'dpu_hw_wb_ops'
Warning: drivers/gpu/drm/msm/disp/dpu1/dpu_hw_wb.h:75 struct member
'caps' not described in 'dpu_hw_wb'