git.ipfire.org Git - thirdparty/kernel/linux.git/log

Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing

vmbus_alloc_synic_and_connect() declares a local 'int
hyperv_cpuhp_online' that shadows the file-scope global of the same
name. The cpuhp state returned by cpuhp_setup_state() is stored in
the local, leaving the global at 0 (CPUHP_OFFLINE). When
hv_kexec_handler() or hv_machine_shutdown() later call
cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the
BUG_ON in __cpuhp_remove_state_cpuslocked().

Remove the local declaration so the cpuhp state is stored in the
file-scope global where hv_kexec_handler() and hv_machine_shutdown()
expect it.

Fixes: 2647c96649ba ("Drivers: hv: Support establishing the confidential VMBus connection")
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

mshv: Add tracepoint for GPA intercept handling

Provide visibility into GPA intercept operations for debugging and
performance analysis of Microsoft Hypervisor guest memory management.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

pwm: atmel-tcb: Cache clock rates and mark chip as atomic

atmel_tcb_pwm_apply() holds tcbpwmc->lock as a spinlock via
guard(spinlock)() and then calls atmel_tcb_pwm_config(), which calls
clk_get_rate() twice. clk_get_rate() acquires clk_prepare_lock (a
mutex), so this is a sleep-in-atomic-context violation.

On CONFIG_DEBUG_ATOMIC_SLEEP kernels every pwm_apply_state() that
enables or reconfigures the PWM triggers a "BUG: sleeping function
called from invalid context" warning.

Acquire exclusive control over the clock rates with
clk_rate_exclusive_get() at probe time and cache the rates in struct
atmel_tcb_pwm_chip, then read the cached rates from
atmel_tcb_pwm_config(). This keeps the spinlock-based mutual exclusion
introduced in commit 37f7707077f5 ("pwm: atmel-tcb: Fix race condition
and convert to guards") and removes the sleeping calls from the atomic
section.

With no sleeping calls left in .apply() and the regmap-mmio bus already
running with fast_io=true, also mark the chip as atomic so consumers
can use pwm_apply_atomic() from atomic context.

Fixes: 37f7707077f5 ("pwm: atmel-tcb: Fix race condition and convert to guards")
Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr>
Link: https://patch.msgid.link/20260419080838.3192357-1-sangyun.kim@snu.ac.kr
[ukleinek: Ensure .clk is enabled before calling clk_get_rate on it.]
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>

io_uring: take page references for NOMMU pbuf_ring mmaps

Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.

The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page.  Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.

Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them.  The user can then write into whatever
the allocator hands out next.

Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close.  NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap.

This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed.  The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.

Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
[axboe: get rid of region lookup, just iterate pages in vma]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:
"fprobe bug fixes:

   - Prevent re-registration

     Add an earlier check to reject re-registering an already active
     fprobe before its state is modified during the initialization phase

   - Robustness in failure paths:
      - Ensure fprobes are correctly removed from all internal tables
        and properly RCU-freed during registration failure
      - Make unregister_fprobe() proceed with unregistration even if
        temporary memory allocation fails

   - RCU safety in module unloading

     Avoid a potential "sleep in RCU" warning by removing a kcalloc()
     call in the module notifier path. This also tries to remove
     fprobe_hash_node even if memory allocation fails.

   - Type-aware unregistration

     Fix a bug where unregistering an fprobe did not account for
     different types (entry-only vs entry-exit) at the same address,
     which previously left "junk" entries in the underlying
     ftrace/fgraph ops

   - Unregistration of empty ftrace_ops

     Avoid unneeded performance overhead due to making registered
     ftrace_ops empty - which means 'trace all functions'. This counts
     remaining entries and unregister ftrace_ops when it becomes empty.

  Two new selftests to check above fixes:

   - Module Unloading Test:

     Specifically verifies that fprobe events on a module are correctly
     cleaned up and do not trigger 'trace-all' behavior when the module
     is removed.

   - Multiple Fprobe Events Test:

     Ensure that having multiple fprobes on the same function correctly
     manages the ftrace hash map during removal"

* tag 'probes-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  selftests/ftrace: Add a testcase for multiple fprobe events
  selftests/ftrace: Add a testcase for fprobe events on module
  tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading
  tracing/fprobe: Check the same type fprobe on table as the unregistered one
  tracing/fprobe: Avoid kcalloc() in rcu_read_lock section
  tracing/fprobe: Remove fprobe from hash in failure path
  tracing/fprobe: Unregister fprobe even if memory allocation fails
  tracing/fprobe: Reject registration of a registered fprobe before init

io_uring/poll: ensure EPOLL_ONESHOT is propagated for EPOLL_URING_WAKE

Commit:

aacf2f9f382c ("io_uring: fix req->apoll_events")

fixed an issue where poll->events and req->apoll_events weren't
synchronized, but then when the commit referenced in Fixes got added,
it didn't ensure the same thing.

If we mask in EPOLLONESHOT in the regular EPOLL_URING_WAKE path, then
ensure it's done for both. Including a link to the original report
below, even though it's mostly nonsense. But it includes a reproducer
that does show that IORING_CQE_F_MORE is set in the previous CQE,
while no more CQEs will be generated for this request. Just ignore
anything that pretends this is security related in any way, it's just
the typical AI nonsense.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/io-uring/CAM0zi7yQzF3eKncgHo4iVM5yFLAjsiob_ucqyWKs=hyd_GqiMg@mail.gmail.com/
Reported-by: Azizcan Daştan <azizcan.d@mileniumsec.com>
Fixes: 4464853277d0 ("io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'amd-drm-next-7.1-2026-04-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.1-2026-04-17:

amdgpu:
- SMU 14 fixes
- Partition fixes
- SMUIO 15.x fix
- SR-IOV fixes
- JPEG fix
- PSP 15.x fix
- NBIF fix
- Devcoredump fixes
- DPC fix
- RAS fixes
- Aldebaran smu fix
- IP discovery fix
- SDMA 7.1 fix
- Runtime pm fix
- MES 12.1 fix
- DML2 fixes
- DCN 4.2 fixes
- YCbCr fixes
- Freesync fixes
- ISM fixes
- Overlay cursor fix
- DC FP fixes
- UserQ locking fixes

amdkfd:
- Fix memory clear handling

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260417225351.8714-1-alexander.deucher@amd.com

Merge tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel

Pull more drm updates from Dave Airlie:
"This is a followup which is mostly next material with some fixes.

  Alex pointed out I missed one of his AMD MRs from last week, so I
  added that, then Jani sent the pipe reordering stuff, otherwise it's
  just some minor i915 fixes and a dma-buf fix.

  drm:
   - Add support for AMD VSDB parsing to drm_edid

  dma-buf:
   - fix documentation formatting

  i915:
   - add support for reordered pipes to support joined pipes better
   - Fix VESA backlight possible check condition
   - Verify the correct plane DDB entry

  amdgpu:
   - Audio regression fix
   - Use drm edid parser for AMD VSDB
   - Misc cleanups
   - VCE cs parse fixes
   - VCN cs parse fixes
   - RAS fixes
   - Clean up and unify vram reservation handling
   - GPU Partition updates
   - system_wq cleanups
   - Add CONFIG_GCOV_PROFILE_AMDGPU kconfig option
   - SMU vram copy updates
   - SMU 13/14/15 fixes
   - UserQ fixes
   - Replace pasid idr with an xarray
   - Dither handling fix
   - Enable amdgpu by default for CIK APUs
   - Add IBs to devcoredump

  amdkfd:
   - system_wq cleanups

  radeon:
   - system_wq cleanups"

* tag 'drm-next-2026-04-22' of https://gitlab.freedesktop.org/drm/kernel: (62 commits)
  drm/i915/display: change pipe allocation order for discrete platforms
  drm/i915/wm: Verify the correct plane DDB entry
  drm/i915/backlight: Fix VESA backlight possible check condition
  drm/i915: Walk crtcs in pipe order
  drm/i915/joiner: Make joiner "nomodeset" state copy independent of pipe order
  dma-buf: fix htmldocs error for dma_buf_attach_revocable
  drm/amdgpu: dump job ibs in the devcoredump
  drm/amdgpu: store ib info for devcoredump
  drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault
  drm/amdgpu: Use amdgpu by default for CIK APUs too
  drm/amd/display: Remove unused NUM_ELEMENTS macros
  drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE
  drm/amdgpu: save ring content before resetting the device
  drm/amdgpu: make userq fence_drv drop explicit in queue destroy
  drm/amdgpu: rework userq fence driver alloc/destroy
  drm/amdgpu/userq: use dma_fence_wait_timeout without test for signalled
  drm/amdgpu/userq: call dma_resv_wait_timeout without test for signalled
  drm/amdgpu/userq: add the return code too in error condition
  drm/amdgpu/userq: fence wait for max time in amdgpu_userq_wait_for_signal
  drm/amd/display: Change dither policy for 10 bpc output back to dithering
  ...

selftests/ftrace: Add a testcase for multiple fprobe events

Add a testcase for multiple fprobe events on the same function
so that it clears ftrace hash map correctly when removing the
events.

Link: https://lore.kernel.org/all/177669370353.132053.16801520791509406141.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

selftests/ftrace: Add a testcase for fprobe events on module

Add a testcase for fprobe events on module, which unloads a kernel
module on which fprobe events are probing and ensure the ftrace
hash map is cleared correctly.

Link: https://lore.kernel.org/all/177669369564.132053.623527664540176496.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/fprobe: Fix to unregister ftrace_ops if it is empty on module unloading

Fix fprobe to unregister ftrace_ops if corresponding type of fprobe
does not exist on the fprobe_ip_table and it is expected to be empty
when unloading modules.

Since ftrace thinks that the empty hash means everything to be traced,
if we set fprobes only on the unloaded module, all functions are traced
unexpectedly after unloading module.
e.g.

# modprobe xt_LOG.ko
# echo 'f:test log_tg*' > dynamic_events
# echo 1 > events/fprobes/test/enable
# cat enabled_functions
log_tg [xt_LOG] (1)             tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_check [xt_LOG] (1)               tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
log_tg_destroy [xt_LOG] (1)             tramp: 0xffffffffa0004000 (fprobe_ftrace_entry+0x0/0x490) ->fprobe_ftrace_entry+0x0/0x490
# rmmod xt_LOG
# wc -l enabled_functions
34085 enabled_functions

Link: https://lore.kernel.org/all/177669368776.132053.10042301916765771279.stgit@mhiramat.tok.corp.google.com/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

ceph: add subvolume metrics collection and reporting

Add complete infrastructure for per-subvolume I/O metrics collection
and reporting to the MDS. This enables administrators to monitor I/O
patterns at the subvolume granularity, which is useful for multi-tenant
CephFS deployments.

This patch adds:
- CEPHFS_FEATURE_SUBVOLUME_METRICS feature flag for MDS negotiation
- CEPH_SUBVOLUME_ID_NONE constant (0) for unknown/unset state
- Red-black tree based metrics tracker for efficient per-subvolume
aggregation with kmem_cache for entry allocations
- Wire format encoding matching the MDS C++ AggregatedIOMetrics struct
- Integration with the existing CLIENT_METRICS message
- Recording of I/O operations from file read/write and writeback paths
- Debugfs interfaces for monitoring (metrics/subvolumes, metrics/metric_features)

Metrics tracked per subvolume include:
- Read/write operation counts
- Read/write byte counts
- Read/write latency sums (for average calculation)

The metrics are periodically sent to the MDS as part of the existing
metrics reporting infrastructure when the MDS advertises support for
the SUBVOLUME_METRICS feature.

CEPH_SUBVOLUME_ID_NONE enforces subvolume_id immutability. Following
the FUSE client convention, 0 means unknown/unset. Once an inode has
a valid (non-zero) subvolume_id, it should not change during the
inode's lifetime.

Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: parse subvolume_id from InodeStat v9 and store in inode

Add support for parsing the subvolume_id field from InodeStat v9 and
storing it in the inode for later use by subvolume metrics tracking.

The subvolume_id identifies which CephFS subvolume an inode belongs to,
enabling per-subvolume I/O metrics collection and reporting.

This patch:
- Adds subvolume_id field to struct ceph_mds_reply_info_in
- Adds i_subvolume_id field to struct ceph_inode_info
- Parses subvolume_id from v9 InodeStat in parse_reply_info_in()
- Adds ceph_inode_set_subvolume() helper to propagate the ID to inodes
- Initializes i_subvolume_id in inode allocation and clears on destroy

Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: handle InodeStat v8 versioned field in reply parsing

Add forward-compatible handling for the new versioned field introduced
in InodeStat v8. This patch only skips the field without using it,
preparing for future protocol extensions.

The v8 encoding adds a versioned sub-structure that needs to be properly
decoded and skipped to maintain compatibility with newer MDS versions.

Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: Fix slab-out-of-bounds access in auth message processing

If a (potentially corrupted) message of type CEPH_MSG_AUTH_REPLY
contains a positive value in its result field, it is treated as an
error code by ceph_handle_auth_reply() and returned to
handle_auth_reply(). Thereafter, an attempt is made to send the
preallocated message of type CEPH_MSG_AUTH, where the returned value is
interpreted as the size of the front segment to send. If the result
value in the message is greater than the size of the memory buffer
allocated for the front segment, an out-of-bounds access occurs, and
the content of the memory region beyond this buffer is sent out.

This patch fixes the issue by treating only negative values in the
result field as errors. Positive values are therefore treated as success
in the same way as a zero value. Additionally, a BUG_ON is added to
__send_prepared_auth_request() comparing the len parameter to
front_alloc_len to prevent sending the message if it exceeds the bounds
of the allocation and to make it easier to catch any logic flaws leading
to this.

Cc: stable@vger.kernel.org
Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: fix null-ptr-deref when device_add_disk() fails

do_rbd_add() publishes the device with device_add() before calling
device_add_disk(). If device_add_disk() fails after device_add()
succeeds, the error path calls rbd_free_disk() directly and then later
falls through to rbd_dev_device_release(), which calls rbd_free_disk()
again. This double teardown can leave blk-mq cleanup operating on
invalid state and trigger a null-ptr-deref in
__blk_mq_free_map_and_rqs(), reached from blk_mq_free_tag_set().

Fix this by following the normal remove ordering: call device_del()
before rbd_dev_device_release() when device_add_disk() fails after
device_add(). That keeps the teardown sequence consistent and avoids
re-entering disk cleanup through the wrong path.

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available.

We reproduced the bug on v7.0 with a real Ceph backend and a QEMU x86_64
guest booted with KASAN and CONFIG_FAILSLAB enabled. The reproducer
confines failslab injections to the __add_disk() range and injects
fail-nth while mapping an RBD image through
/sys/bus/rbd/add_single_major.

On the unpatched kernel, fail-nth=4 reliably triggered the fault:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 UID: 0 PID: 273 Comm: bash Not tainted 7.0.0-01247-gd60bc1401583 #6 PREEMPT(lazy)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:__blk_mq_free_map_and_rqs+0x8c/0x240
Code: 00 00 48 8b 6b 60 41 89 f4 49 c1 e4 03 4c 01 e5 45 85 ed 0f 85 0a 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 e9 48 c1 e9 03 <80> 3c 01 00 0f 85 31 01 00 00 4c 8b 6d 00 4d 85 ed 0f 84 e2 00 00
RSP: 0018:ff1100000ab0fac8 EFLAGS: 00000246
RAX: dffffc0000000000 RBX: ff1100000c4806a0 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000000 RDI: ff1100000c4806f4
RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000189001b
R10: ff1100000c4800df R11: ff1100006cf37be0 R12: 0000000000000000
R13: 0000000000000000 R14: ff1100000c480700 R15: ff1100000c480004
FS: 00007f0fbe8fe740(0000) GS:ff110000e5851000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe53473b2e0 CR3: 0000000012eef000 CR4: 00000000007516f0
PKRU: 55555554
Call Trace:
<TASK>
blk_mq_free_tag_set+0x77/0x460
do_rbd_add+0x1446/0x2b80
? __pfx_do_rbd_add+0x10/0x10
? lock_acquire+0x18c/0x300
? find_held_lock+0x2b/0x80
? sysfs_file_kobj+0xb6/0x1b0
? __pfx_sysfs_kf_write+0x10/0x10
kernfs_fop_write_iter+0x2f4/0x4a0
vfs_write+0x98e/0x1000
? expand_files+0x51f/0x850
? __pfx_vfs_write+0x10/0x10
ksys_write+0xf2/0x1d0
? __pfx_ksys_write+0x10/0x10
do_syscall_64+0x115/0x690
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f0fbea15907
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffe22346ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000058 RCX: 00007f0fbea15907
RDX: 0000000000000058 RSI: 0000563ace6c0ef0 RDI: 0000000000000001
RBP: 0000563ace6c0ef0 R08: 0000563ace6c0ef0 R09: 6b6435726d694141
R10: 5250337279762f78 R11: 0000000000000246 R12: 0000000000000058
R13: 00007f0fbeb1c780 R14: ff1100000c480700 R15: ff1100000c480004
</TASK>

With this fix applied, rerunning the reproducer over fail-nth=1..256
yields no KASAN reports.

[ idryomov: rename err_out_device_del -> err_out_device ]

Cc: stable@vger.kernel.org
Fixes: 27c97abc30e2 ("rbd: add add_disk() error handling")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: cleanup in crush_do_rule() method

Commit 41ebcc0907c5 ("crush: remove forcefeed functionality") from
May 7, 2012 (linux-next), leads to the following Smatch static
checker warning:

net/ceph/crush/mapper.c:1015 crush_do_rule()
warn: iterator 'j' not incremented

Before commit 41ebcc0907c5 ("crush: remove forcefeed functionality"),
we had this logic:

  j = 0;
  if (osize == 0 && force_pos >= 0) {
      o[osize] = force_context[force_pos];
      if (recurse_to_leaf)
          c[osize] = force_context[0];
      j++;           /* <-- this was the only increment, now gone */
      force_pos--;
  }
  /* then crush_choose_*(..., o+osize, j, ...) */

Now, the variable j is dead code — a variable that is set
and never meaningfully varied. This patch simply removes
the dead code.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Reviewed-by: Alex Markuze <amarkuze@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails

This MDS reconnect error path leaves s_cap_reconnect set.
send_mds_reconnect() sets the bit at the beginning of the reconnect,
but the first failing operation after that, ceph_pagelist_encode_32(),
can jump to `fail:` without clearing it.

__ceph_remove_cap() consults that flag to decide whether cap releases
should be queued. A reconnect-preparation failure therefore leaves the
session in reconnect mode from the cap-release path's point of view
and can strand release work until some later state transition repairs
it.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: only d_add() negative dentries when they are unhashed

Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.

In the current VFS that is not safe.  d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket.  If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:

rcu: INFO: rcu_sched self-detected stall on CPU
rcu:         87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
rcu:         (t=2101 jiffies g=79058445 q=698988 ncpus=192)
CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
FS:  00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
  <TASK>
  lookup_fast+0x9f/0x100
  walk_component+0x1f/0x150
  link_path_walk+0x20e/0x3d0
  path_lookupat+0x68/0x180
  filename_lookup+0xdc/0x1e0
  vfs_statx+0x6c/0x140
  vfs_fstatat+0x67/0xa0
  __do_sys_newfstatat+0x24/0x60
  do_syscall_64+0x6a/0x230
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

This is reachable with reused cached negative dentries.  A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":

  - ceph_finish_lookup():
      MDS reply is -ENOENT with no trace
      -> d_add(dentry, NULL)

  - ceph_lookup():
      local ENOENT fast path for a complete directory with shared caps
      -> d_add(dentry, NULL)

Both paths can therefore re-add an already-hashed negative dentry.

Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.

Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed.  If the negative dentry
is already hashed, leave it in place and reuse it as-is.

This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.

Cc: stable@vger.kernel.org
Fixes: 2817b000b02c ("ceph: directory operations")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: update outdated comment in ceph_sock_write_space()

The function try_write() was renamed to ceph_con_v1_try_write()
in commit 566050e17e53 ("libceph: separate msgr1 protocol
implementation") and subsequently moved to net/ceph/messenger_v1.c
in commit 2f713615ddd9 ("libceph: move msgr1 protocol implementation
to its own file"). Update the comment in ceph_sock_write_space()
accordingly.

[ idryomov: account for msgr2 in the updated comment as well ]

Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: Remove obsolete session key alignment logic

Since the call to crypto_shash_setkey() was replaced with
hmac_sha256_preparekey() which doesn't allocate memory regardless of the
alignment of the input key, remove the session key alignment logic from
process_auth_done(). Also remove the inclusion of crypto/hash.h, which
is no longer needed since crypto_shash is no longer used.

[ idryomov: rewrap comment ]

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: fix num_ops off-by-one when crypto allocation fails

move_dirty_folio_in_page_array() may fail if the file is encrypted, the
dirty folio is not the first in the batch, and it fails to allocate a
bounce buffer to hold the ciphertext. When that happens,
ceph_process_folio_batch() simply redirties the folio and flushes the
current batch -- it can retry that folio in a future batch.

However, if this failed folio is not contiguous with the last folio that
did make it into the batch, then ceph_process_folio_batch() has already
incremented `ceph_wbc->num_ops`; because it doesn't follow through and
add the discontiguous folio to the array, ceph_submit_write() -- which
expects that `ceph_wbc->num_ops` accurately reflects the number of
contiguous ranges (and therefore the required number of "write extent"
ops) in the writeback -- will panic the kernel:

BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);

This issue can be reproduced on affected kernels by writing to
fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat
pattern (total filesize should not matter) and gradually increasing the
system's memory pressure until a bounce buffer allocation fails.

Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct
value when move_dirty_folio_in_page_array() fails, but the folio already
started counting a new (i.e. still-empty) extent.

The defect corrected by this patch has existed since 2022 (see first
`Fixes:`), but another bug blocked multi-folio encrypted writeback until
recently (see second `Fixes:`). The second commit made it into 6.18.16,
6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch
therefore fixes a regression (panic) introduced by cac190c7674f.

Cc: stable@vger.kernel.org
Fixes: d55207717ded ("ceph: add encryption support to writepage and writepages")
Fixes: cac190c7674f ("ceph: fix write storm on fscrypted files")
Signed-off-by: Sam Edwards <CFSworks@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()

If a message of type CEPH_MSG_AUTH_REPLY contains a zero value for both
protocol and result, this is currently not treated as an error. In case
of ac->negotiating == true and ac->protocol > 0, this leads to setting
ac->protocol = 0 and ac->ops = NULL. Thereafter, the check for
ac->protocol != protocol returns false, and init_protocol() is not
called. Subsequently, ac->ops->handle_reply() is called, which leads to
a null pointer dereference, because ac->ops is still NULL.

This patch changes the check for ac->protocol != protocol to
!ac->protocol, as this also includes the case when the protocol was set
to zero in the message. This causes the message to be treated as
containing a bad auth protocol.

Cc: stable@vger.kernel.org
Signed-off-by: Raphael Zimmer <raphael.zimmer@tu-ilmenau.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

x86/cpu: Disable FRED when PTI is forced on

FRED and PTI were never intended to work together. No FRED hardware is
vulnerable to Meltdown and all of it should have LASS anyway.
Nevertheless, if you boot a system with pti=on and fred=on, the kernel
tries to do what is asked of it and dies a horrible death on the first
attempt to run userspace (since it never switches to the user page
tables).

Disable FRED when PTI is forced on, and print a warning about it.

A quick brain dump about what a FRED+PTI implementation would look like
is below. I'm not sure it would make any sense to do it, but never say
never. All I know is that it's way too complicated to be worth it today.

<brain dump>
The SWITCH_TO_USER/KERNEL_CR3 bits are simple to fix (or at least we
have the assembly tools to do it already), as is sticking the FRED entry
text in .entry.text (it's not in there today).

The nasty part is the stacks. Today, the CPU pops into the kernel on
MSR_IA32_FRED_RSP0 which is normal old kernel memory and not mapped to
userspace. The hardware pushes gunk on to MSR_IA32_FRED_RSP0, which is
currently the task stacks. MSR_IA32_FRED_RSP0 would need to point
elsewhere, probably cpu_entry_stack(). Then, start playing games with
stacks on entry/exit, including copying gunk to and from the task stack.

While I'd *like* to have PTI everywhere, I'm not sure it's worth mucking
up the FRED code with PTI kludges. If a user wants fast entry/exit, they
use FRED. If you want PTI (and sekuritay), you certainly don't care
about fast entry and FRED isn't going to help you *all* that much, so
you can just stay with the IDT.

Plus, FRED hardware should have LASS which gives you a similar security
profile to PTI without the CR3 munging.
</brain dump>

Reported-by: Gayatri Kammela <Gayatri.Kammela@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>
Cc:stable@vger.kernel.org
Link: https://patch.msgid.link/20260421163136.E7C6788A@davehans-spike.ostc.intel.com

Merge tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, the changes primarily focus on resolving race
  conditions, memory safety issues (UAF), and improving the robustness
  of garbage collection (GC), and folio management.

  Enhancements:
   - add page-order information for large folio reads in iostat
   - add defrag_blocks sysfs node

  Bug fixes:
   - fix uninitialized kobject put in f2fs_init_sysfs()
   - disallow setting an extension to both cold and hot
   - fix node_cnt race between extent node destroy and writeback
   - preserve previous reserve_{blocks,node} value when remount
   - freeze GC and discard threads quickly
   - fix false alarm of lockdep on cp_global_sem lock
   - fix data loss caused by incorrect use of nat_entry flag
   - skip empty sections in f2fs_get_victim
   - fix inline data not being written to disk in writeback path
   - fix fsck inconsistency caused by FGGC of node block
   - fix fsck inconsistency caused by incorrect nat_entry flag usage
   - call f2fs_handle_critical_error() to set cp_error flag
   - fix fiemap boundary handling when read extent cache is incomplete
   - fix use-after-free of sbi in f2fs_compress_write_end_io()
   - fix UAF caused by decrementing sbi->nr_pages[] in f2fs_write_end_io()
   - fix incorrect file address mapping when inline inode is unwritten
   - fix incomplete search range in f2fs_get_victim when f2fs_need_rand_seg is enabled
   - avoid memory leak in f2fs_rename()"

* tag 'f2fs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (35 commits)
  f2fs: add page-order information for large folio reads in iostat
  f2fs: do not support mmap write for large folio
  f2fs: fix uninitialized kobject put in f2fs_init_sysfs()
  f2fs: protect extension_list reading with sb_lock in f2fs_sbi_show()
  f2fs: disallow setting an extension to both cold and hot
  f2fs: fix node_cnt race between extent node destroy and writeback
  f2fs: allow empty mount string for Opt_usr|grp|projjquota
  f2fs: fix to preserve previous reserve_{blocks,node} value when remount
  f2fs: invalidate block device page cache on umount
  f2fs: fix to freeze GC and discard threads quickly
  f2fs: fix to avoid uninit-value access in f2fs_sanity_check_node_footer
  f2fs: fix false alarm of lockdep on cp_global_sem lock
  f2fs: fix data loss caused by incorrect use of nat_entry flag
  f2fs: fix to skip empty sections in f2fs_get_victim
  f2fs: fix inline data not being written to disk in writeback path
  f2fs: fix fsck inconsistency caused by FGGC of node block
  f2fs: fix fsck inconsistency caused by incorrect nat_entry flag usage
  f2fs: fix to do sanity check on dcc->discard_cmd_cnt conditionally
  f2fs: refactor node footer flag setting related code
  f2fs: refactor f2fs_move_node_folio function
  ...

tools/power turbostat: Fix AMD RAPL regression on big systems

turbostat.c:8688: rapl_perf_init: Assertion `next_domain < num_domains' failed.

The initial fix for this regression was incomplete, as it did not
handle multi-package systems with sparse core ids.

Fixes: ef0e60083f76 ("tools/power turbostat: Fix AMD RAPL regression")
Signed-off-by: Len Brown <len.brown@intel.com>

Merge tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax updates from Ira Weiny:
"The series adds DAX support required for the upcoming fuse/famfs file
  system.[1] The support here is required because famfs is backed by
  devdax rather than pmem. This all lays the groundwork for using shared
  memory as a file system"

Link: https://lore.kernel.org/all/0100019d43e5f632-f5862a3e-361c-4b54-a9a6-96c242a8f17a-000000@email.amazonses.com/
* tag 'libnvdimm-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax/fsdev: fix uninitialized kaddr in fsdev_dax_zero_page_range()
  dax: export dax_dev_get()
  dax: Add fs_dax_get() func to prepare dax for fs-dax usage
  dax: Add dax_set_ops() for setting dax_operations at bind time
  dax: Add dax_operations for use by fs-dax on fsdev dax
  dax: Save the kva from memremap
  dax: add fsdev.c driver for fs-dax on character dax
  dax: Factor out dax_folio_reset_order() helper
  dax: move dax_pgoff_to_phys from [drivers/dax/] device.c to bus.c

drm/amd/display: Disable 10-bit truncation and dithering on DCE 6.x

DCE 6.x doesn't support 10-bit truncation and 10-bit dithering
because the following fields are 1-bit only:
FMT_TEMPORAL_DITHER_DEPTH
FMT_SPATIAL_DITHER_DEPTH
FMT_TRUNCATE_DEPTH
Programming these fields to "2" will program them as if the
dithering option was 6-bit, resulting in sub-par picture
quality and an ugly "color banding" effect.

Note that a recent commit changed the default 10-bit dithering
option to DITHER_OPTION_SPATIAL10 which improves the picture
quality because it happens to look better, but is still not
actually supported by DCE 6.x versions.

When the color depth is 10-bit or more, just disable
any kind of dithering options on DCE 6.x.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5151
Fixes: 529cad0f945c ("drm/amd/display: Add function to set dither option")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6be8ced880dfe29ce38c2d5e74489822da5c250e)

drm/amdgpu: OR init_pte_flags into invalid leaf PTE updates

Invalid leaf clears that only set AMDGPU_PTE_EXECUTABLE match the old
GMC9 fault-priority workaround but omit adev->gmc.init_pte_flags.
On GFX12 that includes AMDGPU_PTE_IS_PTE; without it, some cleared
PTEs can fault as no-retry and bypass the SVM/XNACK handler when a
VA is reused after a BO unmap.

Apply init_pte_flags in amdgpu_vm_pte_update_flags() alongside
EXECUTABLE so range-driven clears (e.g. amdgpu_vm_clear_freed) match
amdgpu_vm_pt_clear() for leaf templates.

Signed-off-by: Siwei He <siwei.he@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d47b2c36b9a6c6b844c33cab407a5d7ad102234)

Merge tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull coda dcache updates from Al Viro:
"Coda dcache-related cleanups and fixes"

* tag 'pull-coda' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_flag_children(): fix a UAF
  sanitize coda_dentry_delete()
  coda: is_bad_inode() is always false there

drm/amd: Adjust ASPM support quirk to cover more Intel hosts

Some of the same issues identified in commit c770ef19673fb
("drm/amd/amdgpu: disable ASPM in some situations") also affect
Tiger Lake systems with GFX11 connected over USB4. Widen the net
to also match these hosts.

Fixes: d9b3a066dfcd ("drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5145
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a214d888485b9f35fe03882a92962e6d5697849)

drm/amd/display: Undo accidental fix revert in amdgpu_dm_ism.c

[Why]

Pausing DPM power profiles during static screen caused a bunch of
audio/performance/clock issues that were addressed in this fix:
'commit 1412482b7143 ("Revert "drm/amd/display: pause the workload setting in dm"")'

This logic in function amdgpu_dm_crtc_vblank_control_worker() was moved
to amdgpu_dm_ism.c, but the fix was lost in the process.

[How]

Reapply the fix to amdgpu_dm_ism.c

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc621e91d6fc004cfae9148c5a91acad19ada3e4)

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull more crypto library updates from Eric Biggers:
"Crypto library fix and documentation update:

   - Fix an integer underflow in the mpi library

   - Improve the crypto library documentation"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: docs: Add rst documentation to Documentation/crypto/
  docs: kdoc: Expand 'at_least' when creating parameter list
  lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()

io_uring/zcrx: warn on freelist violations

The freelist is appropriately sized to always be able to take a free
niov, but let's be more defensive and check the invariant with a
warning. That should help to catch any double-free issues.

Suggested-by: Kai Aizen <kai@snailsploit.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/2f3cea363b04649755e3b6bb9ab66485a95936d5.1776760901.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: clear RQ headers on init

It might be unexpected to users if the RQ head/tail after a ring
creation are not zeroed, fix that.

Cc: stable@vger.kernel.org
Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/331f94663c3e8f021ffa3cb770ca2844a07d4855.1776760911.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/zcrx: fix user_struct uaf

io_free_rbuf_ring() usees a struct user_struct, which
io_zcrx_ifq_free() puts it down before destroying the ring.

Cc: stable@vger.kernel.org
Fixes: 5c686456a4e83 ("io_uring/zcrx: add user_struct and mm_struct to io_zcrx_ifq")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/e560ae00960d27a810522a7efc0e201c82dff351.1776760917.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/register: fix ring resizing with mixed/large SQEs/CQEs

The ring resizing only properly handles "normal" sized SQEs or CQEs, if
there are pending entries around a resize. This normally should not be
the case, but the code is supposed to handle this regardless.

For the mixed SQE/CQE cases, the current copying works fine as they
are indexed in the same way. Each half is just copied separately. But
for fixed large SQEs and CQEs, the iteration and copy need to take that
into account.

Cc: stable@kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/futex: ensure partial wakes are appropriately dequeued

If a FUTEX_WAITV vectored operation is only partially woken, we
should call __futex_wake_mark() on the queue to account for that.
If not, then a later wakeup will wake the same entry, rather than
the next one in line.

Fixes: 8f350194d5cfd ("io_uring: add support for vectored futex waits")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rw: add defensive hardening for negative kbuf lengths

No real bug here, just being a bit defensive in ensuring that whatever
gets passed into io_put_kbuf() is always >= 0 and not some random error
value.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rsrc: use kvfree() for the imu cache

Currently anything that requires kvmalloc_flex() for allocations will
not get re-cached, and hence the cache freeing path is correct in that
it always uses kfree() to free the allocated memory. But this seems a
bit fragile as it's something that could get mix should that situation
change, so switch io_free_imu() and io_alloc_cache_free() to use kvfree
as the desctructor.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring/rsrc: unify nospec indexing for direct descriptors

For file updates, the node reset isn't capping the value via
array_index_nospec() like the other paths do. Ensure it's all sane and
have the update path do the proper capping as well.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix spurious fput in registered ring path

Fix an issue with io_uring_ctx_get_file() not gating fput() on whether
or not the file descriptor is a registered/direct one or not.

Fixes: c5e9f6a96bf7 ("io_uring: unify getting ctx from passed in file descriptor")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

- Fix dirent nameoff handling to avoid out-of-bound reads
   out of crafted images

- Fix two type truncation issues on 32-bit platforms

* tag 'erofs-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: unify lcn as u64 for 32-bit platforms
  erofs: fix offset truncation when shifting pgoff on 32-bit platforms
  erofs: fix the out-of-bounds nameoff handling for trailing dirents

vfio/cdx: Consolidate MSI configured state onto cdx_irqs

struct vfio_cdx_device carries three fields that track whether MSI has
been configured: vdev->cdx_irqs (the allocated vector array), vdev->
msi_count (the array length), and vdev->config_msi (a boolean flag).
The three are set together when vfio_cdx_msi_enable() succeeds and
cleared together by vfio_cdx_msi_disable().  However, the error paths
in vfio_cdx_msi_enable() free the cdx_irqs allocation on failure
without resetting the pointer, leaving it stale and skewed from the
other two fields until the next enable call overwrites it.

Clear vdev->cdx_irqs to NULL alongside the kfree() in both error paths
so the pointer consistently reflects the configured state.  With that
invariant restored and access to the MSI state serialized by
cdx_irqs_lock, vdev->config_msi is fully redundant with
(vdev->cdx_irqs != NULL).  Drop the config_msi field and switch all
readers to test cdx_irqs directly.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/cdx: Serialize VFIO_DEVICE_SET_IRQS with a per-device mutex

vfio_cdx_set_msi_trigger() reads vdev->config_msi and operates on the
vdev->cdx_irqs array based on its value, but provides no serialization
against concurrent VFIO_DEVICE_SET_IRQS ioctls. Two callers can race
such that one observes config_msi as set while another clears it and
frees cdx_irqs via vfio_cdx_msi_disable(), resulting in a use-after-free
of the cdx_irqs array.

Add a cdx_irqs_lock mutex to struct vfio_cdx_device and acquire it in
vfio_cdx_set_msi_trigger(), which is the single chokepoint through
which all updates to config_msi, cdx_irqs, and msi_count flow, covering
both the ioctl path and the close-device cleanup path. This keeps the
test of config_msi atomic with the subsequent enable, disable, or
trigger operations.

Drop the pre-call !cdx_irqs test from vfio_cdx_irqs_cleanup() as part
of this change: the optimization it provided is redundant with the
!config_msi early-return inside vfio_cdx_msi_disable(), and leaving the
test in place would be an unsynchronized read of state the new lock is
meant to protect.

Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/cdx: Fix NULL pointer dereference in interrupt trigger path

Add validation to ensure MSI is configured before accessing cdx_irqs
array in vfio_cdx_set_msi_trigger(). Without this check, userspace
can trigger a NULL pointer dereference by calling VFIO_DEVICE_SET_IRQS
with VFIO_IRQ_SET_DATA_BOOL or VFIO_IRQ_SET_DATA_NONE flags before
ever setting up interrupts via VFIO_IRQ_SET_DATA_EVENTFD.

The vfio_cdx_msi_enable() function allocates the cdx_irqs array and
sets config_msi to 1 only when called through the EVENTFD path. The
trigger loop (for DATA_BOOL/DATA_NONE) assumed this had already been
done, but there was no enforcement of this call ordering.

This matches the protection used in the PCI VFIO driver where
vfio_pci_set_msi_trigger() checks irq_is() before the trigger loop.

Fixes: 848e447e000c ("vfio/cdx: add interrupt support")
Cc: stable@vger.kernel.org
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Acked-by: Nipun Gupta <nipun.gupta@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Acked-by: Nikhil Agarwal <nikhil.agarwal@amd.com>
Link: https://lore.kernel.org/r/20260417202800.88287-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio: replace vfio->device_class with a const struct class

The class_create() call has been deprecated in favor of class_register()
as the driver core now allows for a struct class to be in read-only
memory. Replace vfio->device_class with a const struct class and drop
the class_create() call.

Compile tested with both CONFIG_VFIO_DEVICE_CDEV on and off (and
CONFIG_VFIO on); found no errors/warns in dmesg.

Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
[Remove unused vfio_cdev_init() args]
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Link: https://lore.kernel.org/r/20260417152814.18026-1-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/virtio: Use guard() for bar_mutex in legacy I/O

Convert the bar_mutex acquisition in virtiovf_issue_legacy_rw_cmd()
to use guard(), eliminating the out label and goto-based error paths
in favor of direct returns.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-5-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/virtio: Use guard() for migf->lock where applicable

Convert migf->lock acquisitions in virtiovf_disable_fd() and
virtiovf_save_read() to use guard(). In virtiovf_save_read() this
eliminates the out_unlock label and multiple goto paths by allowing
direct returns, and removes the need for the done variable to double
as an error carrier.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-4-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/virtio: Use guard() for list_lock where applicable

Convert list_lock mutex acquisitions to use guard() and scoped_guard()
where the lock scope aligns with the function or block scope. This
simplifies virtiovf_get_data_buff_from_pos() by replacing goto-based
unwinding with direct returns.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/virtio: Convert list_lock from spinlock to mutex

The list_lock spinlock with IRQ disabling was copied from the mlx5
vfio-pci variant driver, where it is justified by a hardirq async
command completion callback that accesses the protected lists. The
virtio driver has no such interrupt context usage; all list_lock
acquisitions occur in process context via file read/write operations
or state transitions under state_mutex.

Convert list_lock to a mutex to be consistent with peer vfio-pci
variant drivers (hisilicon, pds, qat, xe) which all use mutexes for
equivalent migration data protection. This also fixes a mismatched
spin_lock()/spin_unlock_irq() pair in virtiovf_read_device_context_chunk()
that could incorrectly enable interrupts.

Reported-by: Jinhui Guo <guojinhui.liam@bytedance.com>
Closes: https://lore.kernel.org/all/20260413073603.30538-1-guojinhui.liam@bytedance.com
Fixes: 0bbc82e4ec79 ("vfio/virtio: Add support for the basic live migration functionality")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260414200625.3601509-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

vfio/pci: Clean up DMABUFs before disabling function

On device shutdown, make vfio_pci_core_close_device() call
vfio_pci_dma_buf_cleanup() before the function is disabled via
vfio_pci_core_disable(). This ensures that all access via DMABUFs is
revoked before the function's BARs become inaccessible.

This fixes an issue where, if the function is disabled first, a tiny
window exists in which the function's MSE is cleared and yet BARs
could still be accessed via the DMABUF. The resources would also be
freed and up for grabs by a different driver.

Fixes: 5d74781ebc86c ("vfio/pci: Add dma-buf export support for MMIO regions")
Signed-off-by: Matt Evans <mattev@meta.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20260415181752.1027604-1-mattev@meta.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

block: only restrict bio allocation gfp mask asked to block

If the caller is asking for a non-blocking allocation, we should not
further restrict the gfp mask, which just increases the likelihood
of failures.

Fixes: b520c4eef83d ("block: split bio_alloc_bioset more clearly into a fast and slowpath")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20260415060813.807659-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa2xxx

The mute LED on this laptop uses ALC245 but requires a quirk to work.
This patch enables the existing ALC245_FIXUP_HP_MUTE_LED_COEFBIT
quirk for the device.

Tested my Victus 15-fa2xxx (PCI SSID 103c:8dcd).
The LED behaviour works as intended.

Cc: stable@vger.kernel.org
Signed-off-by: Spencer Payton <spayton681@gmail.com>
Link: https://patch.msgid.link/20260421084918.14685-1-spayton681@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: pcmtest: Fix resource leaks in module init error paths

pcmtest allocates its pattern buffers and creates its debugfs tree
before registering the platform device and driver, but mod_init()
does not release those resources when a later init step fails.

As a result, a debugfs directory creation failure leaks the pattern
buffers, while platform_device_register() and
platform_driver_register() failures leave both the pattern buffers
and the debugfs tree behind. The recent fix for failed device
registration only dropped the embedded device reference.

Add the missing cleanup for the debugfs tree and pattern buffers in
the remaining module init error paths.

Fixes: 315a3d57c64c ("ALSA: Implement the new Virtual PCM Test Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260421-alsa-pcmtest-init-unwind-v1-1-03fe0c423dbb@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

tpm: tpm_tis: stop transmit if retries are exhausted

tpm_tis_send_main() will attempt to retry sending data TPM_RETRY times.
Currently, if those retries are exhausted, the driver will attempt to
call execute. The TPM will be in the wrong state, leading to the
operation simply timing out.

Instead, if there is still an error after retries are exhausted, return
that error immediately.

Cc: stable@vger.kernel.org # v6.6+
Fixes: 280db21e153d8 ("tpm_tis: Resend command to recover from data transfer errors")
Signed-off-by: Jacqueline Wong <jacqwong@google.com>
Signed-off-by: Jordan Hand <jhand@google.com>
Link: https://lore.kernel.org/r/20260415160006.2275325-3-jacqwong@google.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: tpm_tis: add error logging for data transfer

Add logging to more easily determine reason for transmit failure

Cc: stable@vger.kernel.org # v6.6+
Fixes: 280db21e153d8 ("tpm_tis: Resend command to recover from data transfer errors")
Signed-off-by: Jacqueline Wong <jacqwong@google.com>
Signed-off-by: Jordan Hand <jhand@google.com>
Link: https://lore.kernel.org/r/20260415160006.2275325-2-jacqwong@google.com
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: avoid -Wunused-but-set-variable

Outside of the EFI tpm code, the TPM_MEMREMAP()/TPM_MEMUNMAP functions are
defined as trivial macros, leading to the mapping_size variable ending
up unused:

In file included from drivers/char/tpm/tpm-sysfs.c:16:
In file included from drivers/char/tpm/tpm.h:28:
include/linux/tpm_eventlog.h:167:6: error: variable 'mapping_size' set but not used [-Werror,-Wunused-but-set-variable]
167 | int mapping_size;

Turn the stubs into inline functions to avoid this warning.

Cc: stable@vger.kernel.org # v5.3+
Fixes: c46f3405692d ("tpm: Reserve the TPM final events table")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: Use kfree_sensitive() to free auth session in tpm_dev_release()

tpm_dev_release() uses plain kfree() to free chip->auth, which contains
sensitive cryptographic material including HMAC session keys, nonces,
and passphrase data (struct tpm2_auth).

Every other code path that frees this structure uses kfree_sensitive()
to zero the memory before releasing it: both tpm2_end_auth_session()
and tpm_buf_check_hmac_response() do so. The tpm_dev_release() path
is the only one that does not, leaving key material in freed slab
memory until it is eventually overwritten.

Use kfree_sensitive() for consistency with the rest of the driver and
to ensure session keys are scrubbed during device teardown.

Cc: stable@vger.kernel.org # v6.10+
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm2-sessions: Fix missing tpm_buf_destroy() in tpm2_read_public()

tpm2_read_public() calls tpm_buf_init() but fails to call
tpm_buf_destroy() on two exit paths, leaking a page allocation:

1. When name_size() returns an error (unrecognized hash algorithm),
the function returns directly without destroying the buffer.

2. On the success path, the buffer is never destroyed before
returning.

All other error paths in the function correctly call
tpm_buf_destroy() before returning.

Fix both by adding the missing tpm_buf_destroy() calls.

Cc: stable@vger.kernel.org # v6.19+
Fixes: bda1cbf73c6e ("tpm2-sessions: Fix tpm2_read_public range checks")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: Fix auth session leak in tpm2_get_random() error path

When tpm_buf_fill_hmac_session() fails inside the do-while loop in
tpm2_get_random(), the function returns directly after destroying the
buffer, without ending the auth session via tpm2_end_auth_session().

This leaks the TPM auth session resource. All other error paths within
the loop correctly reach the 'out' label which calls both
tpm_buf_destroy() and tpm2_end_auth_session().

Fix this by replacing the early return with a goto to the existing 'out'
label, which already handles both cleanup operations. The redundant
tpm_buf_destroy() call is removed since 'out' takes care of it.

Cc: stable@vger.kernel.org # v6.19+
Fixes: 6e9722e9a7bf ("tpm2-sessions: Fix out of range indexing in name_size")
Signed-off-by: Gunnar Kudrjavets <gunnarku@amazon.com>
Reviewed-by: Justinien Bouron <jbouron@amazon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: i2c: atmel: fix block comment formatting

Multiple block comments in tpm_i2c_atmel.c placed the closing '*/' on the
same line as the comment text. This violates the kernel's preferred
comment style, which requires the closing delimiter to appear on its
line.

Fix the formatting to improve readability and resolve checkpatch
warnings.

Signed-off-by: Ethan Luna <trunixcodes@zohomail.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm_crb: Convert ACPI driver to a platform one

In all cases in which a struct acpi_driver is used for binding a driver
to an ACPI device object, a corresponding platform device is created by
the ACPI core and that device is regarded as a proper representation of
underlying hardware. Accordingly, a struct platform_driver should be
used by driver code to bind to that device. There are multiple reasons
why drivers should not bind directly to ACPI device objects [1].

Overall, it is better to bind drivers to platform devices than to their
ACPI companions, so convert the tpm_crb ACPI driver to a platform one.

While this is not expected to alter functionality, it changes sysfs
layout and so it will be visible to user space.

Link: https://lore.kernel.org/all/2396510.ElGaqSPkdT@rafael.j.wysocki/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

tpm: Make tcpci_pm_ops variable static const

File-scope 'tcpci_pm_ops' is not used outside of this unit and is not
modified anywhere, so make it static const to silence sparse warning:

tcpci.c:1002:1: warning: symbol 'tcpci_pm_ops' was not declared. Should it be static?

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

kgdb: update outdated references to kgdb_wait()

The function kgdb_wait() was folded into the static function
kgdb_cpu_enter() by commit 62fae312197a ("kgdb: eliminate
kgdb_wait(), all cpus enter the same way").  Update the four stale
references accordingly:

- include/linux/kgdb.h and arch/x86/kernel/kgdb.c: the
   kgdb_roundup_cpus() kdoc describes what other CPUs are rounded up
   to call.  Because kgdb_cpu_enter() is static, the correct public
   entry point is kgdb_handle_exception(); also fix a pre-existing
   grammar error ("get them be" -> "get them into") and reflow the
   text.

- kernel/debug/debug_core.c: replace with the generic description
   "the debug trap handler", since the actual entry path is
   architecture-specific.

- kernel/debug/gdbstub.c: kgdb_cpu_enter() is correct here (it
   describes internal state, not a call target); add the missing
   parentheses.

Suggested-by: Daniel Thompson <daniel@riscstar.com>
Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn>

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
"We've finally gotten rid of the struct clk_ops::round_rate() code
  after months of effort from Brian Masney. Now the only option is to
  use determine_rate(), which is good because that takes a struct
  argument instead of just a couple unsigned longs, allowing us to
  easily modify the way we determine and set rates in the clk tree.

  Beyond that core framework change we've got the typical pile of new
  SoC clk driver additions, fixes for clk data and/or adding missing
  clks because the consumer driver using those clks wasn't ready, etc.
  The usual suspects are all here: Qualcomm, Samsung, Mediatek, and
  Rockchip along with some newcomers making RISC-V SoCs like ESWIN's
  eic700 and Tenstorrent's Atlantis. The clk driver side of this looks
  pretty normal.

  Core:
   - Remove the round_rate() clk op (yay!)

  New Drivers:
   - ESWIN eic700 SoC clk support
   - Econet EN751221 SoC clock/reset support
   - Global TCSR, RPMh, and display clock controller support for the
     Qualcomm Eliza platform
   - TCSR, the multiple global, and the RPMh clock controller support
     for the Qualcomm Nord platform
   - GPU clock controller support for Qualcomm SM8750
   - Video and GPU clock controller support for Qualcomm Glymur
   - Global clock controller support for Qualcomm IPQ5210
   - Axis ARTPEC-9: Add new PLL clocks and new drivers for eight clock
     controllers on the SoC
   - ExynosAutov920: Add G3D (GPU) clock controller
   - Clock driver for the Rockchip RV1103B SoC
   - Initial support for the Renesas RZ/G3L (R9A08G046) SoC
   - Clock and reset controllers (e.g. PRCM) in the Tenstorrent Atlantis SoC"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (132 commits)
  clk: visconti: pll: initialize clk_init_data to zero
  clk: fsl-sai: Add MCLK generation support
  clk: fsl-sai: Extract clock setup into fsl_sai_clk_register()
  dt-bindings: clock: fsl-sai: Document clock-cells = <1> support
  clk: fsl-sai: Add i.MX8M support with 8 byte register offset
  clk: fsl-sai: Sort the headers
  dt-bindings: clock: fsl-sai: Document i.MX8M support
  clk: qcom: gcc: Add multiple global clock controller driver for Nord SoC
  clk: qcom: rpmh: Add support for Nord rpmh clocks
  clk: qcom: Add TCSR clock driver for Nord SoC
  dt-bindings: clock: qcom: Add Nord Global Clock Controller
  dt-bindings: clock: qcom-rpmhcc: Add support for Nord SoCs
  dt-bindings: clock: qcom: Document the Nord SoC TCSR Clock Controller
  clk: qcom: gcc-x1e80100: Keep GCC USB QTB clock always ON
  clk: qcom: Constify list of critical CBCR registers
  clk: qcom: Constify qcom_cc_driver_data
  clk: qcom: videocc-glymur: Constify qcom_cc_desc
  clk: qcom: Add a driver for SM8750 GPU clocks
  dt-bindings: clock: qcom: Add SM8750 GPU clocks
  clk: qcom: ipq-cmn-pll: Add IPQ8074 SoC support
  ...

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
"Usual driver updates (ufs, lpfc, fnic, target, mpi3mr).

  The substantive core changes are adding a 'serial' sysfs attribute and
  getting sd to support > PAGE_SIZE sectors"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (98 commits)
  scsi: target: Don't validate ignored fields in PROUT PREEMPT
  scsi: qla2xxx: Use nr_cpu_ids instead of NR_CPUS for qp_cpu_map allocation
  scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP
  scsi: mpi3mr: Fix typo
  scsi: sd: fix missing put_disk() when device_add(&disk_dev) fails
  scsi: libsas: Delete unused to_dom_device() and to_dev_attr()
  scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
  scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and CRYPTO_MD5
  scsi: lpfc: Update lpfc version to 15.0.0.0
  scsi: lpfc: Add PCI ID support for LPe42100 series adapters
  scsi: lpfc: Introduce 128G link speed selection and support
  scsi: lpfc: Check ASIC_ID register to aid diagnostics during failed fw updates
  scsi: lpfc: Update construction of SGL when XPSGL is enabled
  scsi: lpfc: Remove deprecated PBDE feature
  scsi: lpfc: Add REG_VFI mailbox cmd error handling
  scsi: lpfc: Log MCQE contents for mbox commands with no context
  scsi: lpfc: Select mailbox rq_create cmd version based on SLI4 if_type
  scsi: lpfc: Break out of IRQ affinity assignment when mask reaches nr_cpu_ids
  scsi: ufs: core: Make the header files self-contained
  scsi: ufs: core: Remove an include directive from ufshcd-crypto.h
  ...

Merge tag 'v7.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Fix IPsec ESN regression in authencesn

- Fix hmac setkey failure in eip93

- Guard against IV changing in algif_aead

- Fix async completion handling in krb5enc

- Fix fallback async completion in acomp

- Fix handling of MAY_BACKLOG requests in pcrypt

- Fix issues with firmware-returned values in ccp

* tag 'v7.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: krb5enc - fix async decrypt skipping hash verification
  crypto: algif_aead - snapshot IV for async AEAD requests
  crypto: acomp - fix wrong pointer stored by acomp_save_req()
  crypto: ccp - copy IV using skcipher ivsize
  crypto: ccp: Don't attempt to copy ID to userspace if PSP command failed
  crypto: ccp: Don't attempt to copy PDH cert to userspace if PSP command failed
  crypto: ccp: Don't attempt to copy CSR to userspace if PSP command failed
  crypto: pcrypt - Fix handling of MAY_BACKLOG requests
  crypto: sa2ul - Fix AEAD fallback algorithm names
  crypto: authencesn - Fix src offset when decrypting in-place
  crypto: eip93 - fix hmac setkey algo selection

tracing/fprobe: Check the same type fprobe on table as the unregistered one

Commit 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
introduced a different ftrace_ops for entry-only fprobes.

However, when unregistering an fprobe, the kernel only checks if another
fprobe exists at the same address, without checking which type of fprobe
it is.
If different fprobes are registered at the same address, the same address
will be registered in both fgraph_ops and ftrace_ops, but only one of
them will be deleted when unregistering. (the one removed first will not
be deleted from the ops).

This results in junk entries remaining in either fgraph_ops or ftrace_ops.
For example:
=======
cd /sys/kernel/tracing

# 'Add entry and exit events on the same place'
echo 'f:event1 vfs_read' >> dynamic_events
echo 'f:event2 vfs_read%return' >> dynamic_events

# 'Enable both of them'
echo 1 > events/fprobes/enable
cat enabled_functions
vfs_read (2)            ->arch_ftrace_ops_list_func+0x0/0x210

# 'Disable and remove exit event'
echo 0 > events/fprobes/event2/enable
echo -:event2 >> dynamic_events

# 'Disable and remove all events'
echo 0 > events/fprobes/enable
echo > dynamic_events

# 'Add another event'
echo 'f:event3 vfs_open%return' > dynamic_events
cat dynamic_events
f:fprobes/event3 vfs_open%return

echo 1 > events/fprobes/enable
cat enabled_functions
vfs_open (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
vfs_read (1)            tramp: 0xffffffffa0001000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60    subops: {ent:fprobe_fgraph_entry+0x0/0x620 ret:fprobe_return+0x0/0x150}
=======

As you can see, an entry for the vfs_read remains.

To fix this issue, when unregistering, the kernel should also check if
there is the same type of fprobes still exist at the same address, and
if not, delete its entry from either fgraph_ops or ftrace_ops.

Link: https://lore.kernel.org/all/177669367993.132053.10553046138528674802.stgit@mhiramat.tok.corp.google.com/
Fixes: 2c67dc457bc6 ("tracing: fprobe: optimization for entry only case")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/fprobe: Avoid kcalloc() in rcu_read_lock section

fprobe_remove_node_in_module() is called under RCU read locked, but
this invokes kcalloc() if there are more than 8 fprobes installed
on the module. Sashiko warns it because kcalloc() can sleep [1].

[1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com

To fix this issue, expand the batch size to 128 and do not expand
the fprobe_addr_list, but just cancel walking on fprobe_ip_table,
update fgraph/ftrace_ops and retry the loop again.

Link: https://lore.kernel.org/all/177669367206.132053.1493637946869032744.stgit@mhiramat.tok.corp.google.com/
Fixes: 0de4c70d04a4 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

spi: Fix the error description in the `ptp_sts_word_post` comment

Based on the comment information, the description within the
`ptp_sts_word_post` section should be changed to "See @ptp_sts_word_pre".

Signed-off-by: Dewei Meng <mengdewei@cqsoftware.com.cn>
Link: https://patch.msgid.link/20260421025808.6572-1-mengdewei@cqsoftware.com.cn
Signed-off-by: Mark Brown <broonie@kernel.org>

tracing/fprobe: Remove fprobe from hash in failure path

When register_fprobe_ips() fails, it tries to remove a list of
fprobe_hash_node from fprobe_ip_table, but it missed to remove
fprobe itself from fprobe_table. Moreover, when removing
the fprobe_hash_node which is added to rhltable once, it must
use kfree_rcu() after removing from rhltable.

To fix these issues, this reuses unregister_fprobe() internal
code to rollback the half-way registered fprobe.

Link: https://lore.kernel.org/all/177669366417.132053.17874946321744910456.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/fprobe: Unregister fprobe even if memory allocation fails

unregister_fprobe() can fail under memory pressure because of memory
allocation failure, but this maybe called from module unloading, and
usually there is no way to retry it. Moreover. trace_fprobe does not
check the return value.

To fix this problem, unregister fprobe and fprobe_hash_node even if
working memory allocation fails.
Anyway, if the last fprobe is removed, the filter will be freed.

Link: https://lore.kernel.org/all/177669365629.132053.8433032896213721288.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

tracing/fprobe: Reject registration of a registered fprobe before init

Reject registration of a registered fprobe which is on the fprobe
hash table before initializing fprobe.
The add_fprobe_hash() checks this re-register fprobe, but since
fprobe_init() clears hlist_array field, it is too late to check it.
It has to check the re-registration before touncing fprobe.

Link: https://lore.kernel.org/all/177669364845.132053.18375367916162315835.stgit@mhiramat.tok.corp.google.com/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Merge tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull dcache busy loop updates from Al Viro:
"Fix livelocks in shrink_dcache_tree()

  If shrink_dcache_tree() finds a dentry in the middle of being killed
  by another thread, it has to wait until the victim finishes dying,
  gets detached from the tree and ceases to pin its parent.

  The way we used to deal with that amounted to busy-wait;
  unfortunately, it's not just inefficient but can lead to reliably
  reproducible hard livelocks.

  Solved by having shrink_dentry_tree() attach a completion to such
  dentry, with dentry_unlist() calling complete() on all objects
  attached to it. With a bit of care it can be done without growing
  struct dentry or adding overhead in normal case"

* tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of busy-waiting in shrink_dcache_tree()
  dcache.c: more idiomatic "positives are not allowed" sanity checks
  struct dentry: make ->d_u anonymous
  for_each_alias(): helper macro for iterating through dentries of given inode

arm64: dts: meson-gxl-p230: fix ethernet PHY interrupt number

Correct the interrupt number assigned to the Realtek PHY in the p230

following the same logic as commit 3106507e1004 ("ARM64: dts: meson-gxm:
fix q200 interrupt number"),as reported in [PATCH 0/2] Ethernet PHY
interrupt improvements [1].

[1] https://lore.kernel.org/all/20171202214037.17017-1-martin.blumenstingl@googlemail.com/

Fixes: b94d22d94ad2 ("ARM64: dts: meson-gx: add external PHY interrupt on some platforms")
Signed-off-by: Jun Yan <jerrysteve1101@gmail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20260330145111.115318-1-jerrysteve1101@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: meson-axg: Add missing cache information to cpu0

Add missing L1 data and instruction cache parameters to the CPU node 0
for the Cortex-A53 caches on the Meson AXG SoC.

Fixes: 3b6ad2a43367 ("arm64: dts: amlogic: Add cache information to the Amlogic AXG SoCS")
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Link: https://patch.msgid.link/20260219103548.18392-1-linux.amoon@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: fix board model name

Update the model property to "Khadas VIM4" to match the official
product branding and maintain consistency with other Khadas boards
(e.g., VIM1, VIM2, VIM3) in the kernel tree.

Signed-off-by: Nick Xie <nick@khadas.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260306030756.2421841-1-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: Fix GIC register ranges for Amlogic T7

This patch aims to fix the GIC register ranges for Amlogic T7 SoC family.

- Context
Kernel log shows a warning about GIC
[ 0.000000] GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set

Using cat /proc/interrupts command shows GIC as GIC-0

Adding some peripherals sometimes causes hangs on interrupts.

- According to the GIC-400 ARM doc, the memory map is like:
0x1000-0x1FFF Distributor
0x2000-0x3FFF CPU interfaces
0x4000-0x5FFF Virtual interface control block
0x6000-0x7FFF Virtual CPU interfaces

- Identify GIC model from distributor register

Offset | Name | Type | Reset
0x008 | GICD_IIDR | RO | 0x0200143B

kvim4# md.l 0xFFF01008 1
fff01008: 0200143b

- Identify CPU interface from CPU interface register

Offset | Name | Type | Reset
0x00FC | GICC_IIDR | RO | 0x0202143B

kvim4# md.l 0xFFF020FC 1
fff020fc: 0202143b

- Virtual interface control register check

Offset | Name | Type | Reset
0x004 | GICH_VTR | RO | 0x90000003

kvim4# md.l 0xFFF04004 1
fff04004: 90000003

- Virtual CPU interfaces check

Offset | Name | Type | Reset
0x00FC | GICV_IIDR | RO | 0x0202143B

kvim4# md.l 0xFFF060FC 1
fff060fc: 0202143b

- After this patch there is no warning anymore.
GICv2 is correctly identified.

[ 0.000000] GIC: Using split EOI/Deactivate mode

Using cat /proc/interrupts command shows GIC as GICv2

Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260305-fix-amlt7-gic-dts-v1-1-5944415c74bf@aliel.fr
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: t7: khadas-vim4: fix memory layout for 8GB RAM

The Khadas VIM4 features 8GB of LPDDR4X RAM. The previous memory node
mapped a single incorrect region. This caused the kernel to map MMIO
and secure firmware (ATF/TrustZone) memory holes as standard RAM,
leading to an Asynchronous SError Interrupt during early boot
(paging_init) when the kernel attempted to clear those pages.

Fix this by splitting the 8GB memory layout into three separate
regions to properly avoid the memory holes (e.g., 0xe0000000 -
0xffffffff):
- 3.5GB @ 0x000000000
- 3.5GB @ 0x100000000
- 1.0GB @ 0x200000000

Signed-off-by: Nick Xie <nick@khadas.com>
Suggested-by: Ronald Claveau <linux-kernel-dev@aliel.fr>
Link: https://patch.msgid.link/20260319023446.3422695-1-nick@khadas.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

arm64: dts: amlogic: s6: Drop CPU masks from GICv3 PPI interrupts

Unlike older GIC variants, the GICv3 DT bindings do not support
specifying a CPU mask in PPI interrupt specifiers. Drop the masks.
While at it, replace the magic number for IRQ_TYPE_LEVEL_HIGH by its
symbolic definition.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/f9c6eddebebcd2e128edd2dbc51706e23589f9e8.1772643434.git.geert+renesas@glider.be
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>

net/sched: sch_dualpi2: drain both C-queue and L-queue in dualpi2_change()

Fix dualpi2_change() to correctly enforce updated limit and memlimit
values after a configuration change of the dualpi2 qdisc.

Before this patch, dualpi2_change() always attempted to dequeue packets
via the root qdisc (C-queue) when reducing backlog or memory usage, and
unconditionally assumed that a valid skb will be returned. When traffic
classification results in packets being queued in the L-queue while the
C-queue is empty, this leads to a NULL skb dereference during limit or
memlimit enforcement.

This is fixed by first dequeuing from the C-queue path if it is
non-empty. Once the C-queue is empty, packets are dequeued directly from
the L-queue. Return values from qdisc_dequeue_internal() are checked for
both queues. When dequeuing from the L-queue, the parent qdisc qlen and
backlog counters are updated explicitly to keep overall qdisc statistics
consistent.

Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reported-by: "Kito Xu (veritas501)" <hxzene@gmail.com>
Closes: https://lore.kernel.org/netdev/20260413075740.2234828-1-hxzene@gmail.com/
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Link: https://patch.msgid.link/20260417152551.71648-1-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: airoha: Fix PPE cpu port configuration for GDM2 loopback path

When QoS loopback is enabled for GDM3 or GDM4, incoming packets are
forwarded to GDM2. However, the PPE cpu port for GDM2 is not configured
in this path, causing traffic originating from GDM3/GDM4, which may
be set up as WAN ports backed by QDMA1, to be incorrectly directed
to QDMA0 instead.
Configure the PPE cpu port for GDM2 when QoS loopback is active on
GDM3 or GDM4 to ensure traffic is routed to the correct QDMA instance.

Fixes: 9cd451d414f6 ("net: airoha: Add loopback support for GDM2")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260417-airoha-ppe-cpu-port-for-gdm2-loopback-v1-1-c7a9de0f6f57@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-sleepable-ndo_set_rx_mode'

Stanislav Fomichev says:

====================
net: sleepable ndo_set_rx_mode

This series adds a new ndo_set_rx_mode_async callback that enables
drivers to handle address list updates in a sleepable context. The
current ndo_set_rx_mode is called under the netif_addr_lock spinlock
with BHs disabled, which prevents drivers from sleeping. This is
problematic for ops-locked drivers that need to sleep.

The approach:
1. Add snapshot/reconcile infrastructure for address lists
2. Introduce dev_rx_mode_work that takes snapshots under the lock,
drops the lock, calls the driver, then reconciles changes back
3. Move promiscuity handling into the scheduled work as well
4. Convert existing ops-locked drivers to ndo_set_rx_mode_async
5. Add a warning for ops-locked drivers still using ndo_set_rx_mode
6. Add a selftest exercising the team+bridge+macvlan topology that
triggers the addr_lock -> ops_lock ordering issue
====================

Link: https://patch.msgid.link/20260416185712.2155425-1-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: use ip commands instead of teamd in team rx_mode test

Replace teamd daemon usage with ip link commands for team device
setup. teamd -d daemonizes and returns to the shell before port
addition completes, creating a race: the test may create the macvlan
(and check for its address on a slave) before teamd has finished
adding ports. This makes the test inherently dependent on scheduling
timing.

Using ip commands makes port addition synchronous, removing the race
and making the test deterministic.

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jay Vosburgh <jv@jvosburgh.net>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-16-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: net: add team_bridge_macvlan rx_mode test

Add a test that exercises the ndo_change_rx_flags path through a
macvlan -> bridge -> team -> dummy stack. This triggers dev_uc_add
under addr_list_lock which flips promiscuity on the lower device.
With the new work queue approach, this must not deadlock.

Link: https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-15-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: warn ops-locked drivers still using ndo_set_rx_mode

Now that all in-tree ops-locked drivers have been converted to
ndo_set_rx_mode_async, add a warning in register_netdevice to catch
any remaining or newly added drivers that use ndo_set_rx_mode with
ops locking. This ensures future driver authors are guided toward
the async path.

Also route ops-locked devices through netdev_rx_mode_work even if they
lack rx_mode NDOs, to ensure netdev_ops_assert_locked() does not fire
on the legacy path where only RTNL is held.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-14-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netkit: convert to ndo_set_rx_mode_async

Convert netkit driver from ndo_set_rx_mode to ndo_set_rx_mode_async.
The netkit driver's set_multicast_list is a no-op, presumably
for the same reason as the one in dummy? (fake multicast ability)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-13-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dummy: convert to ndo_set_rx_mode_async

Convert dummy driver from ndo_set_rx_mode to ndo_set_rx_mode_async.
The dummy driver's set_multicast_list is a no-op, so the conversion
is straightforward: update the signature and the ops assignment.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-12-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

netdevsim: convert to ndo_set_rx_mode_async

Convert netdevsim from ndo_set_rx_mode to ndo_set_rx_mode_async.
The callback is a no-op stub so just update the signature and
ops struct wiring.

Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-11-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

iavf: convert to ndo_set_rx_mode_async

Convert iavf from ndo_set_rx_mode to ndo_set_rx_mode_async.
iavf_set_rx_mode now takes explicit uc/mc list parameters and
uses __hw_addr_sync_dev on the snapshots instead of __dev_uc_sync
and __dev_mc_sync.

The iavf_configure internal caller passes the real lists directly.

Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-10-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt: use snapshot in bnxt_cfg_rx_mode

With the introduction of ndo_set_rx_mode_async (as discussed in [1])
we can call bnxt_cfg_rx_mode directly. Convert bnxt_cfg_rx_mode to
use uc/mc snapshots and move its call in bnxt_sp_task to the
section that resets BNXT_STATE_IN_SP_TASK. Switch to direct call in
bnxt_set_rx_mode.

Link: https://lore.kernel.org/netdev/CACKFLi=5vj8hPqEUKDd8RTw3au5G+zRgQEqjF+6NZnyoNm90KA@mail.gmail.com/
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-9-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt: convert to ndo_set_rx_mode_async

Convert bnxt from ndo_set_rx_mode to ndo_set_rx_mode_async.
bnxt_set_rx_mode, bnxt_mc_list_updated and bnxt_uc_list_updated
now take explicit uc/mc list parameters and iterate with
netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr.

The bnxt_cfg_rx_mode internal caller passes the real lists under
netif_addr_lock_bh.

BNXT_RX_MASK_SP_EVENT is still used here, next patch converts to
the direct call.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-8-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mlx5: convert to ndo_set_rx_mode_async

Convert mlx5 from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's mlx5e_set_rx_mode now receives uc/mc snapshots and calls
mlx5e_fs_set_rx_mode_work directly instead of queueing work.

mlx5e_sync_netdev_addr and mlx5e_handle_netdev_addr now take
explicit uc/mc list parameters and iterate with
netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr.

Fallback to netdev's uc/mc in a few places and grab addr lock.

Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-7-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fbnic: convert to ndo_set_rx_mode_async

Convert fbnic from ndo_set_rx_mode to ndo_set_rx_mode_async. The
driver's __fbnic_set_rx_mode() now takes explicit uc/mc list
parameters and uses __hw_addr_sync_dev() on the snapshots instead
of __dev_uc_sync/__dev_mc_sync on the netdev directly.

Update callers in fbnic_up, fbnic_fw_config_after_crash,
fbnic_bmc_rpc_check and fbnic_set_mac to pass the real address
lists calling __fbnic_set_rx_mode outside the async work path.

Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: kernel-team@meta.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-6-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: move promiscuity handling into netdev_rx_mode_work

Move unicast promiscuity tracking into netdev_rx_mode_work so it runs
under netdev_ops_lock instead of under the addr_lock spinlock. This
is required because __dev_set_promiscuity calls dev_change_rx_flags
and __dev_notify_flags, both of which may need to sleep.

Change ASSERT_RTNL() to netdev_ops_assert_locked() in
__dev_set_promiscuity, netif_set_allmulti and __dev_change_flags
since these are now called from the work queue under the ops lock.

Link: https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-5-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: cache snapshot entries for ndo_set_rx_mode_async

Add a per-device netdev_hw_addr_list cache (rx_mode_addr_cache) that
allows __hw_addr_list_snapshot() and __hw_addr_list_reconcile() to
reuse previously allocated entries instead of hitting GFP_ATOMIC on
every snapshot cycle.

snapshot pops entries from the cache when available, falling back to
__hw_addr_create(). reconcile splices both snapshot lists back into
the cache via __hw_addr_splice(). The cache is flushed in
free_netdev().

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-4-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: introduce ndo_set_rx_mode_async and netdev_rx_mode_work

Add ndo_set_rx_mode_async callback that drivers can implement instead
of the legacy ndo_set_rx_mode. The legacy callback runs under the
netif_addr_lock spinlock with BHs disabled, preventing drivers from
sleeping. The async variant runs from a work queue with rtnl_lock and
netdev_lock_ops held, in fully sleepable context.

When __dev_set_rx_mode() sees ndo_set_rx_mode_async, it schedules
netdev_rx_mode_work instead of calling the driver inline. The work
function takes two snapshots of each address list (uc/mc) under
the addr_lock, then drops the lock and calls the driver with the
work copies. After the driver returns, it reconciles the snapshots
back to the real lists under the lock.

Add netif_rx_mode_sync() to opportunistically execute the pending
workqueue update inline, so that rx mode changes are committed
before returning to userspace:
  - dev_change_flags (SIOCSIFFLAGS / RTM_NEWLINK)
  - dev_set_promiscuity
  - dev_set_allmulti
  - dev_ifsioc SIOCADDMULTI / SIOCDELMULTI
  - do_setlink (RTM_SETLINK)

Note that some deep hierarchies still do skip the lower updates via:
  - dev_uc_sync
  - dev_mc_sync

If we do end up hitting user-visible issues, we can add more calls to
netif_rx_mode_sync in specific places. But hopefully we should not,
the actual user-visible lists are still synced, it's that just HW state
that might be lagging.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-3-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add address list snapshot and reconciliation infrastructure

Introduce __hw_addr_list_snapshot() and __hw_addr_list_reconcile()
for use by the upcoming ndo_set_rx_mode_async callback.

The async rx_mode path needs to snapshot the device's unicast and
multicast address lists under the addr_lock, hand those snapshots
to the driver (which may sleep), and then propagate any sync_cnt
changes back to the real lists. Two identical snapshots are taken:
a work copy for the driver to pass to __hw_addr_sync_dev() and a
reference copy to compute deltas against.

__hw_addr_list_reconcile() walks the reference snapshot comparing
each entry against the work snapshot to determine what the driver
synced or unsynced. It then applies those deltas to the real list,
handling concurrent modifications:

  - If the real entry was concurrently removed but the driver synced
    it to hardware (delta > 0), re-insert a stale entry so the next
    work run properly unsyncs it from hardware.
  - If the entry still exists, apply the delta normally. An entry
    whose refcount drops to zero is removed.

  # dev_addr_test_snapshot_benchmark: 1024 addrs x 1000 snapshots: 89872802 ns total, 89872 ns/iter
  # dev_addr_test_snapshot_benchmark.speed: slow

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260416185712.2155425-2-sdf@fomichev.me
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

erofs: unify lcn as u64 for 32-bit platforms

As sashiko reported [1], `lcn` was typed as `unsigned long` (or
`unsigned int` sometimes), which is only 32 bits wide on 32-bit
platforms, which causes `(lcn << lclusterbits)` to be truncated
at 4 GiB.

In order to consolidate the logic, just use `u64` consistently
around the codebase.

[1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com

Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>