git.ipfire.org Git - thirdparty/linux.git/log

PCI: Initialize temporary device in new_id_store()

When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override().  That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").

The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.

Initialize the temporary pci_dev to fix this.

Repro:

  Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:

  # echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   register_lock_class+0x77e/0x790
   lock_acquire+0xbf/0x2e0
   pci_match_device+0x24/0x180
   new_id_store+0x189/0x1d0
   kernfs_fop_write_iter+0x14f/0x210
   vfs_write+0x263/0x5e0
   ksys_write+0x79/0xf0
   do_syscall_64+0x117/0xf80

Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com

PCI: Update saved_config_space upon resource assignment

Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):

  ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
  ddbridge 0000:05:00.0: cannot read registers
  ddbridge 0000:05:00.0: fail

BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window.  The kernel corrects the BAR assignment:

  pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
  pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned

Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:

  https://git.kernel.org/tglx/history/c/a06a30144bbc

No other architecture performs such a late BAR correction.

Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.

Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction.  Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:

  vfio_pci_core_register_device()
    vfio_pci_set_power_state()
      pci_restore_state()

But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7.  Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.

Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well.  The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.

Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.

To robustly fix the issue, always update saved_config_space upon resource
assignment.

Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de

drm/xe/multi_queue: Use QUEUE_TIMESTAMP as job timestamp for multi-queue

Each queue in a multi queue group has a dedicated timestamp counter. Use
this QUEUE TIMESTAMP register to capture the start timestamp for the
job.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-23-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Add trace event for the multi queue timestamp

Add a trace event for multi queue timestamp capture.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-22-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Capture queue run times for active queues

If a queue is currently active on the CS, query the QUEUE TIMESTAMP
register to get an up to date value of the runtime. To do so, ensure
that the primary queue is active and then check if the secondary queue
is executing on the CS.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-21-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Refactor out engine id to hwe conversion

We need to define more helpers that read engine ID specific register, so
move that logic outside of get_ctx_timestamp().

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-20-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Add helpers to access CS QUEUE TIMESTAMP from lrc

In secondary queue LRCs, the QUEUE TIMESTAMP register is saved and
restored allowing us to view the individual queue run times. Add helpers
to read this value from the LRC.

BSpec: 73988

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-19-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Store primary LRC and position info in LRC

For an LRC belonging to the secondary queue, in order to check if its
context group is active, we need to check the LRC of the primary queue.
In addition to that we want to compare the secondary queue position to
CSMQDEBUG register to check if the queue itself is active.

To do so, store primary LRC and position information in the LRC.

A note on references involved:

- In general the Queue takes a ref on its LRC.
- In addition, for multi-queue,
a. Primary Queue takes a ref for each Secondary LRC.
b. Each Secondary Queue takes a ref to the Primary Queue

In the current patch, each LRC in the queue group is storing a pointer
to primary LRC. Both primary and secondary LRCs are freed only when
primary queue is destroyed. At this time, all secondary queues are
already destroyed, so there is no one using secondary LRCs. We should be
good without taking any additional references.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-18-umesh.nerlige.ramappa@intel.com

drm/xe/multi_queue: Refactor check for multi queue support for engine class

xe exec queue code is using a check to see if a class of engines support
multi queue. This check is also needed by other code, so move it to
xe_gt and export it for others.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-17-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Refactor xe_lrc_timestamp to simplify logic

Use a context_active() helper and simplify the timestamp logic.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-16-umesh.nerlige.ramappa@intel.com

drm/xe: Add timestamp_ms to LRC snapshot

Add a timestamp in milliseconds to the LRC snapshot to make it easier to
reason about how long the LRC has been running and the average duration
of each job.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-15-umesh.nerlige.ramappa@intel.com

drm/xe/lrc: Use 64 bit ctx timestamp in the LRC snapshot

Use the 64 bit value when available for the context timestamp in the LRC
snapshot.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20260507162016.3888309-14-umesh.nerlige.ramappa@intel.com

bpf: Free reuseport cBPF prog after RCU grace period.

Eulgyu Kim reported the splat below with a repro. [0]

The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.

The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().

If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.

Let's defer freeing the reuseport cBPF prog after one RCU
grace period.

Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().

[0]:
BUG: KASAN: vmalloc-out-of-bounds in reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
Read of size 4 at addr ffffc9000051e004 by task slowme/10208
CPU: 6 UID: 1000 PID: 10208 Comm: slowme Not tainted 7.0.0-geb7ac95ff75e #32 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
udp4_lib_lookup2+0x3bc/0x950 net/ipv4/udp.c:495
__udp4_lib_lookup+0x768/0xe20 net/ipv4/udp.c:723
__udp4_lib_lookup_skb+0x297/0x390 net/ipv4/udp.c:752
__udp4_lib_rcv+0x1312/0x2620 net/ipv4/udp.c:2752
ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
__netif_receive_skb_one_core net/core/dev.c:6181 [inline]
__netif_receive_skb net/core/dev.c:6294 [inline]
process_backlog+0xaa4/0x1960 net/core/dev.c:6645
__napi_poll+0xae/0x340 net/core/dev.c:7709
napi_poll net/core/dev.c:7772 [inline]
net_rx_action+0x5d7/0xf50 net/core/dev.c:7929
handle_softirqs+0x22b/0x870 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
</IRQ>
<TASK>
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline]
__dev_queue_xmit+0x1dd7/0x3710 net/core/dev.c:4890
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xca9/0x1070 net/ipv4/ip_output.c:237
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip_output+0x29f/0x450 net/ipv4/ip_output.c:438
ip_send_skb+0x45/0xc0 net/ipv4/ip_output.c:1508
udp_send_skb+0xb04/0x1510 net/ipv4/udp.c:1195
udp_sendmsg+0x1a71/0x2350 net/ipv4/udp.c:1485
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
__sys_sendto+0x554/0x680 net/socket.c:2206
__do_sys_sendto net/socket.c:2213 [inline]
__se_sys_sendto net/socket.c:2209 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2209
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x160/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x415a2d
Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6bc31e41e8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f6bc31e4cdc RCX: 0000000000415a2d
RDX: 0000000000000001 RSI: 00007f6bc31e421f RDI: 0000000000000003
RBP: 00007f6bc31e4240 R08: 00007f6bc31e4220 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 00007f6bc31e46c0
R13: ffffffffffffffb8 R14: 0000000000000000 R15: 00007ffc9b0d70b0
</TASK>

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260426012647.3233119-1-kuniyu@google.com

drm/xe/eustall: Return ENODEV from read if EU stall registers get reset

If a reset (GT or engine) happens during EU stall data sampling, all the
EU stall registers can get reset to 0. This will result in EU stall data
buffers' read and write pointer register values to be out of sync with
the cached values. This will result in read() returning invalid data. To
prevent this, check the value of a EU stall base register. If it is zero,
it indicates a reset may have happened that wiped the register to zero.
If this happens, return ENODEV from read() upon which the user space
should disable and enable EU stall data sampling or close the fd and
open a new fd for a new EU stall data collection session.

This patch has been tested by running two IGT tests simultaneously
xe_eu_stall and xe_exec_reset.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Felix Degrood <felix.j.degrood@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/6935698d70b6c7c347ff8b138209d601030c2c9f.1776200094.git.harish.chegondi@intel.com

dma-debug: Ensure mappings are created and released with matching attributes

The DMA API expects that callers use the same attributes when mapping
and unmapping. Add tracking to verify this and catch mismatches.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-6-8dbac75cd501@nvidia.com

dma-debug: Feed DMA attribute for unmapping flows too

There are multiple unmapping flows which didn't provide DMA attributes,
which limited DMA debug code to compare the mapping and unmapping
attributes. Let's fix it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-5-8dbac75cd501@nvidia.com

dma-debug: Record DMA attributes in debug entry

To enable reliable comparison of DMA attributes between map and
unmap operations, store the attribute value in dma_debug_entry.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-4-8dbac75cd501@nvidia.com

dma-debug: Remove unused DMA attribute parameter

debug_dma_alloc_pages() always receives a DMA attribute value of 0,
because dma_alloc_pages() never receives any attributes from its callers.
As preparation for upcoming patches, remove this unused attribute from
the debug routine.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-3-8dbac75cd501@nvidia.com

ntb: Use consistent DMA attributes when freeing DMA mappings

The counterpart of dma_alloc_attrs() is dma_free_attrs(), which must
receive the same DMA attributes used during allocation. The code
previously used dma_free_coherent(), which does not accept or apply any
DMA attributes.

Fixes: 061a785a114f ("ntb: Force physically contiguous allocation of rx ring buffers")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-2-8dbac75cd501@nvidia.com

ntb: Store original DMA address for future release

The DMA API requires that dma_free_attrs receive the exact dma_handle
originally returned by the allocation function. Do not modify it.

Fixes: fc5d1829f9bf ("NTB: transport: Try harder to alloc an aligned MW buffer")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-1-8dbac75cd501@nvidia.com

Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for ublk not doing an actual issue from the task_work fallback
   path. Any request hitting that should be canceled automatically

- Fix for uring_cmd prep side handling, for the block side uring_cmd
   discard handling

- Fix for missing validation of the io and physical block size shifts

- Fix for a use-after-free in ublk's cancel command handling

* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_cancel_cmd()
  ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
  block: only read from sqe on initial invocation of blkdev_uring_cmd()
  ublk: don't issue uring_cmd from fallback task work

Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Ensure that the absolute timeouts for both the command side and the
   waiting side honor the callers time namespace

- Ensure tracked NAPI entries are cleared at unregistration time, as
   the NAPI polling loop checks the list state rather than the general
   NAPI state. This can lead to NAPI polling even after unregistration
   has been done. If unregistered, all NAPI polling should be disabled

- Fix for eventfd recursive invocation handling

* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
  io_uring/eventfd: reset deferred signal state
  io_uring/napi: clear tracked NAPI entries on unregister

drm/xe/multi_queue: Refactor CGP_SYNC send path

Factor the repeated CGP_SYNC action build-and-send sequence into a new
helper guc_exec_queue_send_cgp_sync(). Drop the redundant guc parameter
from __register_exec_queue_group() since it can be derived via
exec_queue_to_guc(q). Remove xe_guc_exec_queue_group_add() which is now
identical to the helper and replace its call site directly.

No functional change.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-6-niranjana.vishwanathapura@intel.com

drm/xe/multi_queue: Remove redundant assignment in guc_exec_queue_run_job

The 'killed_or_banned_or_wedged = true' assignment is redundant
since the variable is never read after that point.

Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260508194428.61819-5-niranjana.vishwanathapura@intel.com

Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"

Some older systems don't support CPPC in the firmware and this just makes
noise for them when booting. Drop back to debug.

This reverts commit 21fb59ab4b9767085f4fe1edbdbe3177fbb9ec97.

Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn")
Suggested-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: All applicable <stable@vger.kernel.org>
Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()

Replace mutex_lock() and unlock() calls with the newer guard() and
scoped_guard() macros to make the code more straightforward.

Signed-off-by: Maxwell Doose <m32285159@gmail.com>
[ rjw: Subject tweak, changelog edits ]
Link: https://patch.msgid.link/20260430013507.46259-1-m32285159@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

drm/nouveau/gsp: Use kzalloc_flex() for r535 display funcs

struct nvkm_disp_func ends with the user flexible array member. Allocate
the r535 display function table with kzalloc_flex() instead of open-coding
the size calculation with sizeof().

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[dropped nothing-burger sentence from commit message]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260508052056.1744665-1-rosenp@gmail.com

nouveau/vmm: use kzalloc_flex

Use the proper macro do to these sizeof calculations.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
[fixed style warning from checkpatch]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260312195529.13002-1-rosenp@gmail.com

Merge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'

Kuniyuki Iwashima says:

====================
bpf: tcp: Fix type confusion in bpf helper functions.

bpf_tcp_sock() only check if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

The same issues exist in other bpf functions:

  * bpf_mptcp_sock_from_subflow()
  * bpf_skc_to_tcp_sock()
  * bpf_skc_to_tcp6_sock()
  * sol_tcp_sockopt()

Patch 1 fixes bpf_tcp_sock() and Patch 2 adds a test for it.
Patch 3 ~ 6 fix the rest of the functions above.

Changes:
  v2:
    * Inverse if (err) to if (!err) in the selftest
    * Add patch 3 ~ 6

  v1: https://lore.kernel.org/bpf/20260430184405.1227386-1-kuniyu@google.com/
      https://lore.kernel.org/mptcp/20260430-mptcp-bpf-mptcp-sock-type-v1-1-d2ed5cda7da9@kernel.org/
====================

Link: https://patch.msgid.link/20260504210610.180150-1-kuniyu@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf: tcp: Fix type confusion in sol_tcp_sockopt().

sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.

Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().

bpf_skc_to_tcp6_sock() only checks if sk->sk_protocol is IPPROTO_TCP
and sk->sk_family is AF_INET6, but RAW socket can bypass it:

socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)

Let's check sk->sk_type too.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-6-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().

bpf_skc_to_tcp_sock() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-5-kuniyu@google.com

mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()

bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.

Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com

selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.

Let's extend sockopt_sk.c to cover bpf_tcp_sock() for the
wrong socket type.

Before:
  # ./test_progs -t sockopt_sk
  [  151.948613] ==================================================================
  [  151.951376] BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt+0xc7/0x8e0
  [  151.954159] Read of size 8 at addr ffff88801083d760 by task test_progs/1259
  ...
  run_test:FAIL:getsetsockopt unexpected error: -1 (errno 0)
  #427     sockopt_sk:FAIL

After:
  #427     sockopt_sk:OK

While at it, missing free() is fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-3-kuniyu@google.com

crypto/ccp: Skip SNP_INIT if preparation fails

If snp_prepare() fails, SNP_INIT will fail, so skip it and return early.

Note that this is not a change in initialization behavior: if SNP_INIT fails
even before this change, it will still return an error and
__sev_snp_init_locked() will fail initialization of other SEV modes.

Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://lore.kernel.org/20260429155636.540040-1-tycho@kernel.org

x86/sev: Do not initialize SNP if missing CPUs

The SEV firmware checks that the SNP enable bit is set on each CPU during SNP
initialization, and will fail if not. If there are some CPUs offline, they
will not run the setup functions, so SNP initialization will always fail.

Skip the IPIs in this case and return an error so that the CCP driver can
skip the SNP_INIT that will fail. Also print the CPU masks in order to leave
breadcrumbs so people can figure out what happened.

[ bp: Massage commit message. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://20260429155636.540040-1-tycho@kernel.org

drm/nouveau/kms/nvd9-: Remove unused header in crc.c

No functional changes.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20260501215703.820656-2-lyude@redhat.com
Reviewed-by: Dave Airlie <airlied@redhat.com>

nilfs2: fix backing_dev_info reference leak

setup_bdev_super() already initializes sb->s_bdev and takes a
reference on the block device backing_dev_info when assigning sb->s_bdi.

nilfs_fill_super() takes another reference to the same
backing_dev_info and stores it in sb->s_bdi again. The extra
reference is not paired with a matching bdi_put(), since
generic_shutdown_super() releases sb->s_bdi only once.

Drop the redundant bdi_get() in nilfs_fill_super(). The single
reference taken by setup_bdev_super() is enough and is released
during superblock shutdown.

Fixes: c1e012ea9e83 ("nilfs2: use setup_bdev_super to de-duplicate the mount code")
Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>

workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path

For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.

The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.

If kmemleak is enabled, we can see:

  unreferenced object (percpu) 0xc0fffa5b121048 (size 8):
    comm "insmod", pid 776, jiffies 4294682844
    backtrace (crc 0):
      pcpu_alloc_noprof+0x665/0xac0
      __alloc_workqueue+0x33f/0xa20
      alloc_workqueue_noprof+0x60/0x100

Route the error through the existing enomem: cleanup and any error
before this one.

Cc: stable@kernel.org
Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: Release PENDING in __queue_work() drain/destroy reject path

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arm a delayed_work whose timer fires while
drain_workqueue() is blocked, then call cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix for two ACL issues (security fix to validate dacloffset better
   and chmod fix)

- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
   symlinks)

- Two Kerberos fixes including an important one when AES-256 encryption
   chosen

- Fix open_cached_dir problem when directory leases disabled

* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate dacloffset before building DACL pointers
  smb/client: fix out-of-bounds read in smb2_compound_op()
  smb/client: fix out-of-bounds read in symlink_data()
  smb: client: Zero-pad short GSS session keys per MS-SMB2
  smb: client: Use FullSessionKey for AES-256 encryption key derivation
  smb: client: use kzalloc to zero-initialize security descriptor buffer
  cifs: abort open_cached_dir if we don't request leases

Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
  Microchip QSPI and Freescale i.MX drivers. Both of those could result
  in some quite noticable issues if they were encountered in production.
  We also have one minor documentation fix in the ch341 driver"

* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: ch341: correct company name in MODULE_DESCRIPTION
  spi: microchip-core-qspi: remove some inline markings
  spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
  spi: microchip-core-qspi: control built-in cs manually
  spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
  spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
  spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A straightforward fix for an incorrect description of one of the
regulators on the Qualcomm PMH0101"

* tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix index for pmh0101 ldo16

bpf: tcp: Fix type confusion in bpf_tcp_sock().

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds
access to another slab object. [0]

Let's use sk_is_tcp().

[0]:
BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519)
Read of size 8 at addr ffff88801083d760 by task test_progs/1259

CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G OE 7.0.0-11175-gb5c111f4967b #1 PREEMPT(full)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
kasan_report (mm/kasan/report.c:595)
sol_tcp_sockopt (net/core/filter.c:5519)
__bpf_getsockopt (net/core/filter.c:5633)
bpf_sk_getsockopt (net/core/filter.c:5654)
bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c
__cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026)
do_sock_setsockopt (net/socket.c:2363)
__x64_sys_setsockopt (net/socket.c:2406)
do_syscall_64 (arch/x86/entry/syscall_64.c:63)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
RIP: 0033:0x7f85f82fe7de
Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1
RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de
RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d
RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000
R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268
R13: 0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400
</TASK>

The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)

Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com

drm/xe: Make decision to use Xe2-style blitter instructions a feature flag

The blitter engines' MEM_COPY and MEM_SET instructions were added as
part of the same hardware change that introduced service copy engines
(i.e., BCS1-BCS8) which is why the driver checks for service copy engine
presence when deciding whether to use these instructions or the older
XY_* instructions.  However when making this decision the driver should
consider which engines are part of the hardware architecture, not which
engines are present/usable on the current device.  For graphics IP
versions that architecturally include service copy engines (i.e.,
everything Xe2 and later, plus PVC's Xe_HPC) we should use MEM_SET and
MEM_COPY even in if all of the service copy engines wind up getting
fused off.  I.e., we need to decide based on whether the platform's
graphics descriptor contains these engines, rather than whether the
usable engine mask contains them.  This logic got broken when
gt->info.__engine_mask was removed, although in practice that mistake
has been harmless so far because there haven't been any hardware
SKUs that fuse off all of the service copy engines yet.

Replace the incorrect has_service_copy_support() function with a GT
feature flag that tracks more accurately whether the new blitter
instructions are usable.  In addition to fixing incorrect logic if all
service copies are fused off, the flag also makes it more obvious what
the calling code is trying to do; previously it wasn't terribly obvious
why "has service copy engines" was being used as the condition for using
different instructions on all copy engine types.

The new feature flag is named 'has_xe2_blt_instructions' because we
expect this flag to be set for all Xe2 and later platforms (i.e.,
everything officially supported by the Xe driver).  Technically there's
also one Xe1-era platform (PVC) that supports these engines/instructions
and will set this flag, but this still seems to be the most clear and
understandable name for the flag.

Fixes: 61549a2ee594 ("drm/xe: Drop __engine_mask")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patch.msgid.link/20260507-xe2_copy-v1-1-26506381b821@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs

Commit e59e74dc48a3 ("x86/topology: Remove CPU0 hotplug option")
removed the 'cpu0_hotplug' option, but its documentation remained in
cpu_hotplug.rst. Remove the stale entry.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260507134732.254617-1-chao.gao@intel.com

selftests/cgroup: Fix incorrect variable check in online_cpus()

"OFFLINE_CPUS" is a literal string that is always non-empty. It should
be "$OFFLINE_CPUS" to check the variable's value instead.

Signed-off-by: Hongfu Li <lihongfu@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched_ext: Use offsetofend on both sides of the ops_cid layout assert

sizeof() includes trailing struct pad, offsetofend() doesn't. On
32-bit PPC, sched_ext_ops_cid tail-pads 4 bytes past @priv and the
assert trips. Use offsetofend() on both sides.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605081637.DbH4SZ1E-lkp@intel.com/
Fixes: 7e655ed7b953 ("sched_ext: Add bpf_sched_ext_ops_cid struct_ops type")
Signed-off-by: Tejun Heo <tj@kernel.org>

selftests/sched_ext: Fix select_cpu_dfl link leak on early return

If run() exits early via SCX_EQ/SCX_ASSERT (which calls return
directly), bpf_link__destroy() is never reached and the BPF
scheduler stays loaded. All subsequent tests then fail to attach
because SCX is not in the DISABLED state.

Move bpf_link into a context struct so cleanup() always destroys
it, regardless of how run() exits. Also skip waitpid() for children
where fork() returned -1, avoiding waitpid(-1,...) accidentally
reaping an unrelated child and triggering the early return path.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>

Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
  usual but then a large amount of fixes all over.

  core:
   - fix race condition in handle change ioctl

  fb-helper:
   - fix clipping

  rust:
   - fix unsound initialization
   - fix GEM state cleanup
   - fix wrong ARef import

  ttm:
   - update GPU MM stats on pool shrinking

  i915:
   - Re-enable ccs modifiers on dg2

  nova:
   - fix mailing list

  xe:
   - Add NULL check for media_gt in intel_hdcp_gsc_check_status
   - Fix EAGAIN sign in pf_migration_consume
   - Fix MMIO access using PF view instead of VF view during migration
   - Exclude indirect ring state page from ADS engine state size

  amdgpu:
   - GFX9 fixes
   - Hawaii SMU fixes
   - SDMA4 fix
   - GART fix
   - Userq fixes

  amdkfd:
   - GPUVM TLB flush fix
   - Hotplug fix

  radeon:
   - Hawaii SMU fixes

  bochs:
   - fix managed cleanup

  bridge:
   - tda998x: fix sparse warnings on type correctness

  etnaviv:
   - schedule armed jobs

  exynos:
   - managed bridge cleanup

  ivpu:
   - disallow reexport of GEM buffer objects

  noveau:
   - revert support for GA100

  panel:
   - boe-tv101wum-nl16: use correct MIPI_DSI mode
   - feyjang-fy07024di26a30d: fix error reporting
   - himax-hx83102: use correct MIPI_DSI mode
   - himax-hx83121a: fix error checks
   - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

  qaic:
   - fix RAS message handling

  qxl:
   - clean up polling

  sti:
   - managed bridge cleanup

* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
  drm: Set old handle to NULL before prime swap in change_handle
  drm/bochs: Drop manual put on probe error path
  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
  drm/exynos: remove bridge when component_add fails
  drm/amdgpu: nuke amdgpu_userq_fence_slab v2
  drm/amdgpu/userq: fix access to stale wptr mapping
  drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
  drm/amdgpu: zero-initialize GART table on allocation
  drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
  drm/radeon: add missing revision check for CI
  drm/amdgpu/pm: align Hawaii mclk workaround with radeon
  drm/amdgpu/pm: add missing revision check for CI
  drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
  drm/amdkfd: Make all TLB-flushes heavy-weight
  drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
  drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
  drm/panel: feiyang-fy07024di26a30d: return display-on error
  ...

drm/ttm: Fix ttm_bo_swapout() infinite LRU walk on swapout failure

When ttm_tt_swapout() fails, the current code calls
ttm_resource_add_bulk_move() followed by ttm_resource_move_to_lru_tail()
to restore the resource's bulk_move membership.

However, ttm_resource_move_to_lru_tail() places the resource at the tail
of the LRU list which, relative to the walk cursor's hitch node (placed
immediately after the resource when it was yielded), puts the resource
*in front of the* the hitch. The next list_for_each_entry_continue() from
the hitch finds the same resource again, causing an infinite loop.

Fix by deferring del_bulk_move to the success path only.

On the success path, TTM_TT_FLAG_SWAPPED has just been set by
ttm_tt_swapout() but the resource is still tracked in the bulk_move range,
so ttm_resource_del_bulk_move()'s !ttm_resource_unevictable() guard would
incorrectly skip the removal. Introduce
ttm_resource_del_bulk_move_unevictable() which bypasses that guard.

Reported-by: Jatin Kataria <jkataria@netflix.com>
Fixes: fc5d96670eb2 ("drm/ttm: Move swapped objects off the manager's LRU list")
Cc: Christian König <christian.koenig@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <stable@vger.kernel.org> # v6.13+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260428094442.16985-1-thomas.hellstrom@linux.intel.com

Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial device ids for 7.1-rc3

Here are some new modem device ids.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit Cinterion LE910Cx compositions

Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:
"Core:
   - Cache-flushing fix for non-x86 platforms

  AMD-Vi:
   - Security fix when SEV-SNP is enabled
   - Operator precedence fix in DTE setting"

* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd: Fix precedence order in set_dte_passthrough()
  iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
  iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
  iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19

sched_ext: Use IRQ_WORK_INIT_HARD() to initialize sch->disable_irq_work

For built with PREEMPT_RT kernels, the scx_disable_irq_workfn() is
called from per-cpu irq_work kthreads context, this means that
when call the scx_dump_state() in the scx_disable_irq_workfn() to
output current->comm/pid, it always output current irq_work kthread's
comm/pid. this commit therefore use the IRQ_WORK_INIT_HARD() to
initialize sch->disable_irq_work to make scx_disable_irq_workfn() is
called from hardirq context.

Fixes: f4a6c506d118 ("sched_ext: Always bounce scx_disable() through irq_work")
Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>

arm64/entry: Fix arm64-specific rseq brokenness

Mathias Stearn reports that since v6.19, there are two big issues
affecting rseq:

(1) On arm64 specifically, rseq critical sections aren't aborted when
    they should be.

(2) The 'cpu_id_start' field is no longer written by the kernel in all
    cases it used to be, including some cases where TCMalloc depends on
    the kernel clobbering the field.

This patch fixes issue #1. This patch DOES NOT fix issue #2, which will
need to be addressed by other patches.

The arm64-specific brokenness is a result of commits:

  2fc0e4b4126c ("rseq: Record interrupt from user space")
  39a167560a61 ("rseq: Optimize event setting")

The first commit failed to add a call to rseq_note_user_irq_entry() on
arm64. Thus arm64 never sets rseq_event::user_irq to record that it may
be necessary to abort an active rseq critical section upon return to
userspace. On its own, this commit had no functional impact as the value
of rseq_event::user_irq was not consumed.

The second commit relied upon rseq_event::user_irq to determine whether
or not to bother to perform rseq work when returning to userspace. As
rseq_event::user_irq wasn't set on arm64, this work would be skipped,
and consequently an active rseq critical section would not be aborted.

Fix this by giving arm64 syscall-specific entry/exit paths, and
performing the relevant logic in syscall and non-syscall paths,
including calling rseq_note_user_irq_entry() for non-syscall entry.

Currently arm64 cannot use syscall_enter_from_user_mode(),
syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to
ordering constraints with exception masking, and risk of ABI breakage
for syscall tracing/audit/etc. For the moment the entry/exit logic is
left as arm64-specific, directly using enter_from_user_mode() and
exit_to_user_mode(), but mirroring the generic code.

I intend to follow up with refactoring/cleanup, as we did for kernel
mode entry paths in commit:

  041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()")

... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly.

Fixes: 39a167560a61 ("rseq: Optimize event setting")
Reported-by: Mathias Stearn <mathias@mongodb.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/
Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com

x86/kexec: Push kjump return address even for non-kjump kexec

The version of purgatory code shipped by kexec-tools attempts to look above
the top of its stack to find a return address for a kjump, even in a non-kjump
kexec.

After the commit in Fixes: the word above the stack might not be there,
leading to a fault (which is at least now caught by my exception-handling code
in kexec).

That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the non-kjump
path. Put that *back* in the non-kjump path, to prevent purgatory from
crashing when trying to access it.

Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context")
Reported-by: Rohan Kakulawaram <rohanka@google.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Rohan Kakulawaram <rohanka@google.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org

ntfs: avoid leaking uninitialised bytes in new security descriptors

ntfs_sd_add_everyone() builds the on-disk security descriptor for a
newly created file by kmalloc()'ing a buffer and then partially
filling it in:

sd = kmalloc(sd_len, GFP_NOFS);
...
sd->revision = 1;
sd->control = SE_DACL_PRESENT | SE_SELF_RELATIVE;
...

The buffer is then handed to ntfs_attr_add() and persisted as the
SECURITY_DESCRIPTOR attribute of the new MFT record.  The descriptor
covers a relative security descriptor header, two SIDs (owner and
group), an ACL header, and a single ACE, but several fields inside
those structures are never written before the buffer is committed
to disk:

  - struct security_descriptor_relative
        @alignment (1 byte)
        @sacl (4 bytes; SE_SACL_PRESENT is not set
                                 but the offset still reaches disk)

  - struct ntfs_sid (3 instances: owner, group, ACE.sid)
        identifier_authority.value[0..4] (5 bytes per SID, 15 total
                                          - only value[5] is set)

  - struct ntfs_acl
        @alignment1 (1 byte)
        @alignment2 (2 bytes)

That is 23 bytes of uninitialised slab memory persisted to disk for
every new file or directory the legacy ntfs driver creates.  The
"+ 4" trailing accounting in sd_len holds ace->sid.sub_authority[0],
which the existing code does explicitly write to zero, so it is
not part of the leak.

Anything later able to read the SECURITY_DESCRIPTOR attribute - the
same NTFS volume mounted on Windows or by another NTFS reader, an
offline forensics tool, an unprivileged user that ends up with read
access to the volume - can recover those bytes.  The leak persists
for the lifetime of the file on disk, not just the lifetime of the
kernel that wrote it.

Switch the allocation to kzalloc() so every byte the on-disk
descriptor covers is zero before the explicit initialisations run.
While there, replace the bare "return -1" allocation-failure path
with a proper -ENOMEM so the error reaches userspace as a meaningful
errno instead of an unrelated -EPERM.

Found by inspection while auditing fs/ntfs new-inode paths.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_index_walk_down()

ntfs_index_walk_down() used to update the index traversal depth
directly before writing parent_pos[] and parent_vcn[]. A malformed
directory index with too many child-node levels can therefore advance
pindex past MAX_PARENT_VCN and write past the fixed arrays in struct
ntfs_index_context, corrupting context state used by later index
traversal.

Use ntfs_icx_parent_inc() for walk-down transitions so the existing
depth limit is enforced before the arrays are updated. Make the helper
check the limit before incrementing pindex so failed callers do not
leave the context at an out-of-range depth.

This is reachable by iterating a crafted NTFS directory after the volume
has been mounted, including read-only mounts. The reproducer uses
getdents64() on an index root that points to an excessively deep chain
of child index blocks.

A crafted directory index with a chain of child-node entries reproduced
UBSAN array-index-out-of-bounds reports in ntfs_index_walk_down() and
subsequent KASAN reports in ntfs_index_walk_up(). With this change, the
same image is rejected with "Index is over 32 level deep" and no KASAN
or UBSAN report is emitted.

Fixes: 0a8ac0c1fa0b ("ntfs: update directory operations")
Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix out-of-bounds write in ntfs_rl_collapse_range() merge path

ntfs_rl_collapse_range() merges the run on the left of the collapsed
region with the run on its right when they are contiguous. The contiguous
check chooses a clamped index when @new_1st_cnt is 0:

i = new_1st_cnt == 0 ? 1 : new_1st_cnt;
if (ntfs_rle_lcn_contiguous(&new_rl[i - 1], &new_rl[i])) {

but the merge itself uses the unclamped value:

s_rl = &new_rl[new_1st_cnt - 1];
s_rl->length += s_rl[1].length;

When @new_1st_cnt is 0 this computes &new_rl[-1] and writes 8 bytes
before the kvcalloc() runlist buffer. The path is reachable through
fallocate(FALLOC_FL_COLLAPSE_RANGE) starting at vcn 0 against an
attribute whose first run after the collapsed region and the following
run are holes. In that case ntfs_rle_lcn_contiguous() returns true
because both checked entries are LCN_HOLE, so the merge path is entered
with @new_1st_cnt still 0. Such consecutive holes do not occur on a
well-formed runlist (NTFS keeps runlists coalesced in memory), so this
OOB path is only reachable from a crafted volume.

A normal runlist has no element to the left of vcn 0, so the left/right
merge is not valid when @new_1st_cnt is 0. Require @new_1st_cnt to be
positive before checking or performing the merge. This skips the merge
entirely in that case instead of clamping the merge target.

The out-of-bounds write can corrupt an adjacent slab object. On a
non-KASAN kernel, it is reachable after a crafted NTFS volume has been
mounted read-write with the legacy fs/ntfs driver, by a local user that
has write access to the crafted file.

Fixes: 11ccc9107dc4 ("ntfs: update runlist handling and cluster allocator")
Suggested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix variable dereferenced before check ni in ntfs_attr_open()

Smatch warnings:
ntfs_attr_open() warn: variable dereferenced before check 'ni'

Moves the ntfs_debug() call after the NULL pointer checks to ensure safe
access to the structure members.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix default_upcase refcount underflow and UAF on fs_context teardown

ntfs_init_fs_context() allocates a fresh ntfs_volume with vol->upcase
left as NULL. ntfs_free_fs_context() unconditionally calls
ntfs_volume_free() during fs_context teardown, even when ntfs_fill_super()
never ran or already cleaned up. ntfs_volume_free() then executes:

mutex_lock(&ntfs_lock);
if (vol->upcase == default_upcase) {
ntfs_nr_upcase_users--;
vol->upcase = NULL;
}

When the global default_upcase is also NULL (very first mount attempt,
or all prior mounts have released the table), the comparison is
NULL == NULL, and ntfs_nr_upcase_users is decremented even though this
volume never claimed a reference. ntfs_nr_upcase_users is unsigned long,
so the decrement wraps to ULONG_MAX.

A subsequent successful mount can then free the shared table while the
mounted volume still points at it:

  1. ntfs_fill_super() does the temporary ntfs_nr_upcase_users++ at the
     "Generate the global default upcase table if necessary" block. With
     the prior wraparound this brings the counter back to 0.
  2. If the volume's $UpCase matches the default, the match path does
     ntfs_nr_upcase_users++ and sets vol->upcase = default_upcase. The
     counter is now 1.
  3. On the success path, !--ntfs_nr_upcase_users evaluates true and
     default_upcase is kvfree()'d while vol->upcase still points at it.
     Subsequent upcase comparisons through that mount touch freed
     memory.

This was reproduced with KASAN by closing a fresh fsopen("ntfs") context,
then mounting an NTFS image whose $UpCase table matches
generate_default_upcase(), and finally doing a case-insensitive lookup.
KASAN reports the dangling vol->upcase access:

  BUG: KASAN: use-after-free in ntfs_collate_names+0x3b4/0x420
  Read of size 2 at addr ffff888008d40048 by task init/1
   ntfs_collate_names+0x3b4/0x420
   ntfs_lookup_inode_by_name+0x1921/0x3130
   ntfs_lookup+0x193/0xc40
   vfs_statx+0xc7/0x190
   vfs_fstatat+0x4b/0xa0
   __do_sys_newfstatat+0x92/0xf0

The same QEMU reproducer was rerun after this change with KASAN
enabled. It reached "reproducer finished", and the log contained no
KASAN, use-after-free, Oops, or panic signatures.

Guard each comparison with an explicit vol->upcase non-NULL check so a
volume that never took a reference cannot decrement the global users
counter. Apply the same guard to the other default_upcase release sites
so all cleanup paths follow the same ownership rule: only volumes that
actually hold a default_upcase reference may drop one.

Fixes: 1e9ea7e04472 ("Revert "fs: Remove NTFS classic"")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: match ntfs_resident_attr_min_value_length with $AttrDef

Update ntfs_resident_attr_min_value_length() to align with $AttrDef.
The $VOLUME_NAME is allowed to have the  size of 0.

The Windows 11 $AttrDef values are as follows:

Attribute Name             (ID)   Size (Min-Max)  Flags

$STANDARD_INFORMATION      (16)   48-72           Resident
$ATTRIBUTE_LIST            (32)   No Limit        Non-resident
$FILE_NAME                 (48)   68-578          Resident, Index
$OBJECT_ID                 (64)   0-256           Resident
$SECURITY_DESCRIPTOR       (80)   No Limit        Non-resident
$VOLUME_NAME               (96)   2-256           Resident
$VOLUME_INFORMATION        (112)  12-12           Resident
$DATA                      (128)  No Limit        (None)
$INDEX_ROOT                (144)  No Limit        Resident
$INDEX_ALLOCATION          (160)  No Limit        Non-resident
$BITMAP                    (176)  No Limit        Non-resident
$REPARSE_POINT             (192)  0-16384         Non-resident
$EA_INFORMATION            (208)  8-8             Resident
$EA                        (224)  0-65536         (None)
$LOGGED_UTILITY_STREAM     (256)  0-65536         Non-resident

Reported-by: woot000 <woot000@woot000.com>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: avoid use-after-free of index inode in ntfs_inode_sync_filename()

ntfs_inode_sync_filename() walks every FILE_NAME attribute and, for
each one that points at a different parent, opens the parent index
inode with ntfs_iget() and locks index_ni->mrec_lock.  All three error
branches (NInoBeingDeleted, ntfs_index_ctx_get failure, ntfs_index_lookup
failure) drop the parent reference before unlocking:

iput(index_vi);
mutex_unlock(&index_ni->mrec_lock);
continue;

index_ni is NTFS_I(index_vi), so the ntfs_inode (and its mrec_lock) is
embedded in the inode allocation.  If the parent directory is not held
outside the icache - no open dentry, recently evicted from dcache, no
other concurrent lookup - ntfs_iget() returns with i_count == 1 and
our iput() drops the last reference.  evict_inode() then runs and
destroy_inode() schedules the slab object for RCU free, while
mutex_unlock() on the next line is still touching index_ni->mrec_lock.

Swap the order so the mutex is dropped while index_vi is still alive,
matching the success path at the bottom of the loop which already
unlocks before iput().

Reproduced under KASAN with a debug build that forces
ntfs_index_ctx_get() to fail when the parent index inode has been
opened with i_count == 1.  KASAN reports a slab-use-after-free read
on the parent's mrec_lock from mutex_unlock() on the writeback worker:

  BUG: KASAN: slab-use-after-free in __mutex_unlock_slowpath+0xb5/0x970
  Read of size 8 at addr ffff8880014b7598 by task kworker/u8:0/12
  Workqueue: writeback wb_workfn (flush-253:0)
  Call Trace:
   mutex_unlock
   ntfs_inode_sync_filename
   __ntfs_write_inode
   ntfs_write_inode
   __writeback_single_inode

  Allocated by task 103:
   ntfs_alloc_big_inode
   ntfs_iget
   ntfs_lookup
   __x64_sys_mkdir

  Freed by task 12:
   ntfs_free_big_inode
   i_callback
   rcu_do_batch

  Last potentially related work creation:
   call_rcu
   destroy_inode
   evict
   dispose_list
   evict_inodes
   ntfs_inode_sync_filename
   __ntfs_write_inode

  The buggy address belongs to the object at ffff8880014b7440
   which belongs to the cache ntfs_big_inode_cache of size 1800

The freed object is the parent directory inode itself: allocated by
mkdir(2) via ntfs_iget(), then released through call_rcu(i_callback)
that destroy_inode() scheduled when evict_inodes() ran from inside
ntfs_inode_sync_filename().  Re-running the same workload with
mutex_unlock() moved before iput() runs cleanly under KASAN.

Fixes: af0db57d4293 ("ntfs: update inode operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: fix copy length in ntfs_bdev_write() for non-page-aligned start

This is not a normal data I/O hot path.  The single in-tree caller is
the $LogFile emptying path used during read-write mount/remount, and
the bug only becomes visible on NTFS volumes whose cluster_size is
strictly smaller than the kernel's PAGE_SIZE (typically 4 KiB on
x86_64).  Per Microsoft's format command documentation, NTFS supports
allocation unit sizes starting at 512 bytes, so 512 B, 1 KiB and 2 KiB
clusters are uncommon but valid on-disk configurations.  When
cluster_size >= PAGE_SIZE every "start" passed in is page-aligned and
the buggy "from != 0" path is never taken.

ntfs_bdev_write() splits the write across one or more block-device
folios.  Inside the loop, "to" is computed as the *end byte offset*
within the current page (0..PAGE_SIZE), and "from" is the start byte
offset within the page (reset to 0 from the second iteration onward).
The copy length should therefore be "to - from", but the current code
uses "to" directly:

to = min_t(u32, end - offset, PAGE_SIZE);
memcpy_to_folio(folio, from, buf + buf_off, to);
buf_off += to;

When "from != 0" (i.e. "start" is not page-aligned) memcpy_to_folio()
copies "from" extra bytes:

  - it reads "from" bytes past the source buffer into kernel heap;
  - it writes "from" bytes past the requested range into the next part
    of the block-device page (or, if "from + to > PAGE_SIZE", past the
    folio boundary entirely, which trips the VM_BUG_ON inside
    memcpy_to_folio() on CONFIG_DEBUG_VM=y kernels).

"buf_off" is then advanced by the wrong amount, so every subsequent
iteration also reads the source buffer at the wrong offset and writes
the wrong content to disk.

ntfs_empty_logfile() calls

ntfs_bdev_write(sb, empty_buf, NTFS_CLU_TO_B(vol, lcn),
vol->cluster_size);

with empty_buf sized to vol->cluster_size.  On a sub-PAGE_SIZE-cluster
volume, any $LogFile run whose LCN is not aligned to
PAGE_SIZE / cluster_size reaches the non-page-aligned path.  The
over-copy can read beyond empty_buf and overwrite the sectors following
the requested cluster in the block-device page with unrelated kernel
heap contents while $LogFile is being emptied.

A userspace reducer of the same arithmetic and copy loop confirms the
bug under AddressSanitizer: ASan reports a heap-buffer-overflow read
past the source buffer for the buggy length, and the fixed version is
ASan-clean.

Compute the copy length as "to - from" and advance buf_off by the same
amount.

Fixes: 5218cd102aec ("ntfs: update misc operations")
Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/format
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: wait for sync mft writes to complete

ntfs_sync_mft_mirror() and write_mft_record_nolock() with @sync set
are both documented as synchronous, but neither actually waits for
the bio they submit nor inspects bi_status.  write_inode() can
return success while dirty mft record bytes are still in flight, and
bio errors are silently dropped: the volume is not marked with
errors and the inode is not redirtied.  This breaks fsync()/sync
metadata durability.

Switch ntfs_sync_mft_mirror() and the @sync path of
write_mft_record_nolock() to submit_bio_wait() and propagate the
returned error to the caller.  Capture ntfs_sync_mft_mirror()'s
return value at its call sites in write_mft_record_nolock() so a
mirror write failure surfaces too.

The @sync parameter only controls the main MFT bio.  The !@sync main
submission is therefore unchanged and still uses ntfs_bio_end_io() to
drop the folio reference taken before submission.  The mirror call
has always been documented as performing synchronous I/O regardless
of @sync, so making it actually block restores the originally
intended contract for both @sync and !@sync callers.

Note this only fixes the synchronous mirror/main paths reachable
from write_mft_record_nolock().  The main MFT write submitted from
ntfs_write_mft_block() (the .writepages path) still does not wait
for completion or check bi_status; that requires a larger
restructuring and is left to a follow-up patch.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: capture mft mirror sync errors in ntfs_write_mft_block()

After ntfs_sync_mft_mirror() became able to return real I/O errors,
ntfs_write_mft_block() still discards its return value at the call
site inside the per-record loop. A failed $MFTMirr write therefore
leaves the volume looking clean from the writeback path even though
the on-disk mirror is now stale.

Capture the return value and feed it into the function's existing
@err variable using the same "first error wins" pattern already used
on other failure paths. The error is propagated to the caller and,
via the existing tail of the function, sets NVolErrors so umount and
chkdsk see the volume as inconsistent.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: redirty folio when ntfs_write_mft_block() runs out of memory

ntfs_write_mft_block() is called by writeback_iter() with the folio
locked. When the per-call allocations for @locked_nis or @ref_inos
fail, the function returns -ENOMEM directly without unlocking the
folio. Any later task that needs the folio's lock then stalls, and
the folio's dirty state is silently lost from the writeback
iterator's point of view.

Use folio_redirty_for_writepage() so the folio remains dirty for a
subsequent writeback pass, unlock it, and only then return -ENOMEM
so the caller can propagate the error to fsync()/sync_filesystem().

Fixes: f462fdf3d6a4 ("ntfs: reduce stack usage in ntfs_write_mft_block()")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

ntfs: use base mft_no when looking up base inode for extent record

When the mft record is an extent record, ntfs_may_write_mft_record()
looks up its base inode in the icache. The hash key passed to
find_inode_nowait() must be the base inode's mft number (na.mft_no,
set just above to MREF_LE(m->base_mft_record)), but the code passes
@mft_no, the extent record's own number.

find_inode_nowait() uses its second argument as the hashval, so the
lookup lands in the wrong bucket and almost always returns NULL.
ntfs_may_write_mft_record() then returns false and the writeback
path (ntfs_write_mft_block()) skips that extent record, leaving the
on-disk copy permanently out of sync with the in-memory one.

The original ilookup5_nowait() call this conversion replaced used
na.mft_no. Restore that.

Fixes: 115380f9a2f9 ("ntfs: update mft operations")
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>

drm/xe: Convert stolen memory over to ttm_range_manager

Stolen memory requires physically contiguous allocations for display
scanout and compressed framebuffers. The stolen memory manager was
sharing the gpu_buddy allocator backend with the VRAM manager, but
buddy manages non-contiguous power-of-two blocks making it a poor fit.
Stolen memory also has fundamentally different allocation patterns:

- Allocation sizes are not power-of-two. Since buddy rounds up to the
  next power-of-two block size, a ~17MB request can fail even with
  ~22MB free, because the free space is fragmented across non-fitting
  power-of-two blocks.
- Hardware restrictions prevent using the first 4K page of stolen for
  certain allocations (e.g., FBC). The display code sets fpfn=1 to
  enforce this, but when fpfn != 0, gpu_buddy enables
  GPU_BUDDY_RANGE_ALLOCATION mode which disables the try_harder
  coalescing path, further reducing allocation success.

This combination caused FBC compressed framebuffer (CFB) allocation
failures on platforms like NVL/PTL. In case of NVL where stolen memory
is ~56MB and the initial plane framebuffer consumes ~34MB at probe time,
leaving ~22MB for subsequent allocations.

Use ttm_range_man_init_nocheck() to set up a drm_mm-backed TTM resource
manager for stolen memory. This reuses the TTM core's ttm_range_manager
callbacks, avoiding duplicate implementations.

Tested on NVL with a 4K DP display: stolen_mm shows a single ~22MB
contiguous free hole after initial plane framebuffer allocation, and
FBC successfully allocates its CFB from that region. The corresponding
IGT was previously skipped and now passes.

v2:
- Clarify that stolen memory requires contiguous allocations (Matt B)
- Properly handle xe_ttm_resource_visible() for stolen instead of
  unconditionally returning true (Matt A)

v3:
- Rebase
- Fix xe_display_bo_fbdev_prefer_stolen() to compare in pages, since
  ttm_range_manager stores stolen->size in pages not bytes (Matt A)

v4:
- Add kernel-doc for struct xe_ttm_stolen_mgr (Matt B)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7631
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260422125502.3088222-2-sanjay.kumar.yadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-xe-next

Bringing in recent display changes.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

dt-bindings: arm: st,nomadik: Convert to DT schema

Convert the ST Nomadik boards binding from free-form text to DT schema.

The binding documents the Nomadik NHK15/USB-S8815 platform compatibles.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Andrew Rembrandt <kernel@rembrandt.dev>
Link: https://patch.msgid.link/20260507-dt-bindings-arm-st-nomadik-yaml-v2-1-8ab05d1cda96@rembrandt.dev
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

drm/buddy: Integrate lockdep annotations for gpu buddy manager

gpu_buddy APIs are expected to be called with the driver-provided lock
held, but there is no runtime enforcement of this contract. Add lockdep
annotations to catch locking violations early.

Introduce gpu_buddy_driver_set_lock() for the driver to register the
lock that protects the buddy manager. Add gpu_buddy_driver_lock_held()
assertions to all exported gpu_buddy and drm_buddy APIs that
access/modify the manager state. The lock_dep_map field is only compiled
in when CONFIG_LOCKDEP is enabled, adding zero overhead to production
builds.

Wire up xe_ttm_vram_mgr to register its mutex with the buddy manager
after initialization.

Assisted-by: Copilot:claude-opus-4.6
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260508065544.4049240-2-tejas.upadhyay@intel.com

dlm: init per node debugfs before add to node hash

Avoiding potential issues when a node is added to the hash but the
debugfs is not NULL or IS_ERR() so a potential iteration over the hash
and debugfs_remove() will not fail like in dlm_midcomms_exit().

However dlm_midcomms_exit() will be called in module init/exit function
and the hash should be empty anyway at those stages. We change the
behavior as cleanup to avoid potential issues.

Reported-by: Ginger <ginger.jzllee@gmail.com>
Closes: https://lore.kernel.org/gfs2/CAGp+u1ZE7UsQ4sSUHBKQXU8x3M_jwK=ek1urSjEtd3jXQGFmVg@mail.gmail.com
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: fix add msg handle in send_queue ordered

In a benchmark scenario triggering a lot of requests that triggers a lot
of DLM messages on the network it can be that the mh->seq is not ordered
according the oldest seq number. This ordering is required by
dlm_receive_ack as "before(mh->seq, seq)" will stop to check for older
sequence numbers that are ordered in the tail of "node->send_queue".

The side effects of not having it correct ordered regarding
"before(mh->seq, seq)" are refcounting issues and use-after free.

I only was able to reproduce this issue in a experimental DLM branch
and a user space DLM benchmark that uses io_uring. After changing this I
don't experienced any refcounting with the sending buffer issues anymore.

Fixes: 489d8e559c659 ("fs: dlm: add reliable connection if reconnect")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: add usercopy whitelist to dlm_cb cache

The dlm_cb slab cache is created with kmem_cache_create(), which
provides no usercopy whitelist. When a callback carries LVB data,
dlm_user_add_ast() copies the LVB into the inline lvbptr[] array within
the slab-allocated struct dlm_callback and redirects ua->lksb.sb_lvbptr
to point to it. copy_result_to_user() then calls copy_to_user() with
this pointer. With CONFIG_HARDENED_USERCOPY enabled, this triggers
usercopy_abort().

Switch to kmem_cache_create_usercopy() with a whitelist covering the
lvbptr field.

Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: use hlist_for_each_entry_srcu for SRCU protected lists

The connection and node hash tables in DLM are protected by SRCU, but
the code currently uses hlist_for_each_entry_rcu() for traversal.
While this works functionally, it is semantically incorrect and triggers
warnings when RCU lockdep debugging is enabled, as it expects regular
RCU read-side critical sections.

This patch replaces the incorrect macros with hlist_for_each_entry_srcu()
and adds the appropriate lockdep expressions using srcu_read_lock_held()
to ensure consistency with the underlying locking mechanism.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

Merge tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes

RISC-V devicetrees fixes for v7.1-rc3

Microchip:
Fix a pinctrl misconfiguration caused by a erratum fixed between
engineering sample and production silicon, that causes settings for one
to not apply to the other.

Starfive:
Remove nodes relating to the "camss" video device that has been deleted
entirely from staging.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: fix icicle i2c pinctrl configuration
riscv: dts: starfive: jh7110: Drop CAMSS node

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ublk: fix use-after-free in ublk_cancel_cmd()

When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit()
concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a
stale pointer and pass it to io_uring_cmd_done(), causing a
use-after-free.

Fix by synchronizing the two paths with ubq->cancel_lock:

- ublk_cancel_cmd(): read and clear io->cmd under cancel_lock,
  then call io_uring_cmd_done() on the saved local copy outside
  the lock.

- ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit()
  so that io->cmd and io->flags are cleared atomically with respect
  to ublk_cancel_cmd().

Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

batman-adv: bla: put backbone reference on failed claim hash insert

When batadv_bla_add_claim() fails to insert a new claim into the hash, it
leaked a reference to the backbone_gw for which the claim was intended.
Call batadv_backbone_gw_put() on the error path to release the reference
and avoid leaking the backbone_gw object.

Cc: stable@kernel.org
Fixes: 3db0decf1185 ("batman-adv: Fix non-atomic bla_claim::backbone_gw access")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: only purge non-released claims

When batadv_bla_purge_claims() goes through the list of claims, it is only
traversing the hash list with an rcu_read_lock(). Due to a potential
parallel batadv_claim_put(), it can happen that it encounters a claim which
was actually in the process of being released+freed by
batadv_claim_release(). In this case, backbone_gw is set to NULL before the
delayed RCU kfree is started. Calling batadv_bla_claim_get_backbone_gw() is
then no longer allowed because it would cause a NULL-ptr derefence.

To avoid this, only claims with a valid reference counter must be purged.
All others are already taken care of.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: bla: prevent use-after-free when deleting claims

When batadv_bla_del_backbone_claims() removes all claims for a backbone, it
does this by dropping the link entry in the hash list. This list entry
itself was one of the references which need to be dropped at the same time
via batadv_claim_put().

But the batadv_claim_put() must not be done before the last access to the
claim object in this function. Otherwise the claim might be freed already
by the batadv_claim_release() function before the list entry was dropped.

Cc: stable@kernel.org
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: tp_meter: fix tp_num leak on kmalloc failure

When batadv_tp_start() or batadv_tp_init_recv() fail to allocate a new
tp_vars object, the previously incremented bat_priv->tp_num counter is
never decremented. This causes tp_num to drift upward on each allocation
failure. Since only BATADV_TP_MAX_NUM sessions can be started and the count
is never reduced for these failed allocations, it causes to an exhaustion
of throughput meter sessions. In worst case, no new throughput meter
session can be started until the mesh interface is removed.

The error handling must decrement tp_num releasing the lock and aborting
the creation of an throughput meter session

Cc: stable@kernel.org
Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
Signed-off-by: Sven Eckelmann <sven@narfation.org>

batman-adv: stop caching unowned originator pointers in BAT IV

BAT IV keeps the last-hop neighbor address in each neigh_node, but some
paths also cache an originator pointer derived from a temporary lookup.
That pointer is not owned by the neigh_node and may no longer refer to a
live originator entry after purge handling runs.

Stop storing the auxiliary originator pointer in the BAT IV neighbor
state. When BAT IV needs the neighbor originator data, resolve it from
the stored neighbor address and drop the reference again after use.

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
[sven: avoid bonding logic for outgoing OGM]
Signed-off-by: Sven Eckelmann <sven@narfation.org>

dt-bindings: display: bridge: lt9211: Require data-lanes on DSI input ports

The Lontium LT9211 is capable of 1..4 DSI lanes per input DSI port,
describe the lane count for each input port in the schema.

For example Linux kernel driver does already use that information and
fails to probe if it is missing.

Signed-off-by: Marek Vasut <marex@nabladev.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260407203109.34302-1-marex@nabladev.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/xe/madvise: Track purgeability with BO-local counters

xe_bo_recompute_purgeable_state() walks all VMAs of a BO to determine
whether the BO can be made purgeable. This makes VMA create/destroy and
madvise updates O(n) in the number of mappings.

Replace the walk with BO-local counters protected by the BO dma-resv
lock:

  - vma_count tracks the number of VMAs mapping the BO.
  - willneed_count tracks active WILLNEED holders, including WILLNEED
    VMAs and active dma-buf exports for non-imported BOs.

A DONTNEED BO is promoted back to WILLNEED on a 0->1 transition of
willneed_count. A BO is demoted to DONTNEED on a 1->0 transition only
when it still has VMAs, preserving the previous behaviour where a BO
with no mappings keeps its current madvise state.

PURGED remains terminal, preserving the existing "once purged, always
purged" rule.

Fixes: 4f44961eab84 ("drm/xe/vm: Prevent binding of purged buffer objects")
v2:
  - Use early return for imported BOs in all four helpers to avoid
    nesting (Matt B).
  - Group purgeability state into a purgeable sub-struct on struct
    xe_bo (Matt B).
  - Reword xe_bo_willneed_put_locked() kernel-doc to explain that a 1->0
    transition means all remaining active VMAs are DONTNEED (Matt B).

v3:
  - Move DONTNEED/PURGED reject from vma_lock_and_validate() into
    xe_vma_create(), gated on attr->purgeable_state == WILLNEED.
    Fixes vm_bind bypass and partial-unbind rejection on DONTNEED
    BOs (Matt B).
  - Drop .check_purged from MAP and REMAP; keep it for PREFETCH and
    add a comment why (Matt B).
  - Skip BO validation in vma_lock_and_validate() for non-WILLNEED
    VMA remnants so cleanup/remap paths do not repopulate
    DONTNEED/PURGED BOs.

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260506132027.2556046-1-arvind.yadav@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>

fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks

When 0 is written to the tasks file it is interpreted as the current task in
rdtgroup_move_task(). Each CPU's idle task has task_struct::pid set to 0 and,
on x86, task_struct::closid to RESCTRL_RESERVED_CLOSID and task_struct::rmid
to RESCTRL_RESERVED_RMID.

Equivalently, on MPAM platforms, thread_info::mpam_partid_pmg is encoded with
PARTID and PMG set to RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID,
respectively. As there is no interface to change these from the default, the
resctrl configuration for the idle tasks is fixed and they always behave
equivalently to a task in the default tasks file and so take their
configuration from the cpus/cpus_list files.

On read of the tasks file, show_rdt_tasks() filters out any 0 PID. Hence, a
task id of 0 is never shown in the tasks file and the idle tasks are not
represented either.

Document the user visible behaviour.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com

fs/resctrl: Document that automatic counter assignment is best effort

When using automatic counter assignment it's useful for a user to know
which counters they can expect to be assigned on group creation.

Document that automatic counter assignment is best effort and how to
discover any assignment failures.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com

fs/resctrl: Continue counter allocation after failure

In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user creates a
new CTRL_MON or MON group resctrl attempts to allocate counters for each of
the supported MBM events on each resctrl domain. As counters are limited,
such allocation may fail and when it does counter allocations for the
remaining domains are skipped even if the domains have available counters.

Because of that, the user needs to view the resource group'smbm_L3_assignments
file to get an accurate view of counter assignment in a new resource group and
then manually create counters in the skipped domains with available counters.

Writes to mbm_L3_assignments using the wildcard format, <event>:*=e, also skip
counter allocation in other domains after a counter allocation failure.

When handling a request to create counters in all domains it is unnecessary
for a counter allocation in one domain to prevent counter allocation in
other domains. Always attempt to allocate all the counters requested.

[ bp: Massage commit message. ]

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com

drm: Set old handle to NULL before prime swap in change_handle

There was a potential race condition in change_handle. The ioctl
briefly had a single object with two idr entries; a concurrent
gem_close could delete the object and remove one of the handles
while leaving the other one dangling, which could subsequently
be dereferenced for a use-after-free.

To fix this, do the same dance that gem_close itself does.
(f6cd7daecff5 drm: Release driver references to handle before making it available again)
First idr_replace the old handle to NULL. Later, if the prime
operations are successful, actually close it.

create_tail required a similar dance to avoid a similar problem.
(bd46cece51a3 drm/gem: Fix race in drm_gem_handle_create_tail())
It idr_allocs the new handle with NULL, then swaps in the correct
object later to avoid races. We don't need to do that here, since
the only operations that could race are drm_prime, and
change_handle holds the prime lock for the entire duration.

v2: cleanups of error paths

Signed-off-by: David Francis <David.Francis@amd.com>
Co-authored-by: Dave Airlie <airlied@gmail.com>
Reported-by: Puttimet Thammasaeng <pwn8official@gmail.com>
Tested-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Christian Koenig <Christian.Koenig@amd.com>
Fixes: 53096728b8910 ("drm: Add DRM prime interface to reassign GEM handle")
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'amd-drm-fixes-7.1-2026-05-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-7.1-2026-05-06:

amdgpu:
- GFX9 fixes
- Hawaii SMU fixes
- SDMA4 fix
- GART fix
- Userq fixes

amdkfd:
- GPUVM TLB flush fix
- Hotplug fix

radeon:
- Hawaii SMU fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260506154631.1733034-1-alexander.deucher@amd.com

wifi: cfg80211: advance loop vars in cfg80211_merge_profile()

cfg80211_merge_profile() reassembles a Multi-BSSID non-transmitted BSS
profile that has been split across multiple consecutive MBSSID elements.
Its while-loop calls

cfg80211_get_profile_continuation(ie, ielen, mbssid_elem, sub_elem)

but never advances mbssid_elem or sub_elem inside the body. Each
iteration therefore searches for a continuation that follows the same
fixed pair; the helper returns the same next_mbssid; and the same
next_sub bytes are memcpy()'d into merged_ie at a growing offset until
the buffer fills.

Advance both mbssid_elem and sub_elem to the just-consumed continuation
so the next call to cfg80211_get_profile_continuation() searches for a
further continuation beyond it (or returns NULL when none exists).

A specially-crafted malicious beacon can take advantage of this bug
to cause the kernel to spend an excessive amount of time in
cfg80211_merge_profile (up to as much as 2ms per beacon received),
which could theoretically be abused in some way.

Cc: stable@vger.kernel.org
Fixes: fe806e4992c9 ("cfg80211: support profile split between elements")
Signed-off-by: John Walker <johnwalker0@gmail.com>
Link: https://patch.msgid.link/20260507230720.64783-1-johnwalker0@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

x86/cpu, cpufreq: Remove AMD ELAN support

Now that i486 and CONFIG_MELAN support has been removed upstream:

8b793a92d862c ("x86/cpu: Remove M486/M486SX/ELAN support")

the CONFIG_ELAN_CPUFREQ and CONFIG_SC520_CPUFREQ cpufreq
drivers can be removed as well, as they depend on CONFIG_MELAN.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-pm@vger.kernel.org (open list:CPU FREQUENCY SCALING FRAMEWORK)
Link: https://lore.kernel.org/r/20250425084216.3913608-8-mingo@kernel.org

x86/fpu: Remove the math-emu/ FPU emulation library

Now that all enabling code is gone, remove the
FPU emulation library as well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-11-mingo@kernel.org

x86/fpu: Remove the 'no387' boot option

Without math emulation there's no point to this option.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-10-mingo@kernel.org

x86/fpu: Remove MATH_EMULATION and related glue code

Now that support for 486 CPUs is gone upstream, remove
the x86 mathemu code integration.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250425084216.3913608-9-mingo@kernel.org

cpufreq/amd-pstate-ut: Drop policy reference before driver switch

Recent changes to the EPP unit test tries to perform a driver switch
with a cpufreq_policy reference held when the driver is loaded into
anything but the active mode which leads to a circular dependency and
the unit test hanging indefinitely.

Drop the reference before driver switch and grab it back once the driver
mode is stabilized for the test.

The EPP writes are only possible with CPUFREQ_POLICY_POWERSAVE policy.
Temporarily switch the cpudata->policy (while holding the write end of
the policy->rwsem) to CPUFREQ_POLICY_POWERSAVE and restore the original
policy once tests are done. To ensure the final EPP is correct in case
the driver started with CPUFREQ_POLICY_PERFORMANCE, EPP performance is
tested last.

The __free() based cleanup for cpufreq_policy is lost in the process.

Reported-by: Kalpana Shetty <kalpana.shetty@amd.com>
Fixes: 7e173bc310d2b ("cpufreq/amd-pstate-ut: Add a unit test for raw EPP")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20260508051748.10484-7-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

cpufreq/amd-pstate: Use "epp_default_dc" as default when dynamic_epp is disabled

If "dynamic_epp" is disabled, the driver initialization and the default
EPP selection from sysfs currently sets the EPP based on the power
supply state of the system at that time but there is no power supply
callbacks registered to toggle it when the power supply state changes.

This can lead to faster battery drain on platforms that start off while
being plugged to the wall but later move to battery power since the EPP
stays at AMD_CPPC_EPP_PERFORMANCE.

Use "epp_default_dc" as the default EPP selection when dynamic_epp is
disabled, restoring older behavior. On servers, this defaults to
AMD_CPPC_EPP_PERFORMANCE and on other platforms, it defaults to
AMD_CPPC_EPP_BALANCE_PERFORMANCE.

Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20260508051748.10484-6-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

cpufreq/amd-pstate: Reorder notifier unregistration and floor perf reset

An active power supply notifier can race with amd_pstate_epp_cpu_exit()
trying to reset the floor perf and can overwrite the floor perf set in
MSR_AMD_CPPC_REQ.

Unregister the notifier before setting the floor perf to prevent the
rare race.

Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20260508051748.10484-5-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

cpufreq/amd-pstate: Allow writes to dynamic_epp when state isn't modified

Writing the current "dynamic_epp" state to sysfs fails with -EINVAL even
though the desired result was achieved. Allow writes to "dynamic_epp"
that does not modify the state.

Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20260508051748.10484-4-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

cpufreq/amd-pstate: Return -ENOMEM on failure to allocate profile_name

Failure to allocate profile name will return -EINVAL from
platform_profile_register() while in fact, it is a failure to allocate
memory for the profile_name string.

Return -ENOMEM when kasprintf() fails to allocate profile_name string.

Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20260508051748.10484-3-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>