git.ipfire.org Git - thirdparty/kernel/stable.git/log

accel/amdxdna: Fix use-after-free in amdxdna_gem_dmabuf_mmap()

When vm_insert_pages() fails, the error path calls vma->vm_ops->close(vma)
which internally calls drm_gem_vm_close() → drm_gem_object_put(),
releasing the GEM object reference acquired at the start of the function.
However, the close_vma label then falls through to put_obj, which calls
drm_gem_object_put() a second time on the same object.

If the first put releases the last reference, the object is freed and the
second put accesses freed memory, causing a use-after-free.

Fix by returning directly from close_vma instead of falling through to
put_obj, since the close handler already performs all necessary cleanup
including the object put.
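A minimal userspace model of the fixed error path follows. It assumes a simplified refcounted object; `struct obj`, `obj_get()`, `obj_put()`, `vma_close()` and `dmabuf_mmap_model()` are hypothetical stand-ins for the GEM helpers, not the driver's code. The point it illustrates is that the close handler already drops the reference, so the error path must return instead of putting again.

```c
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a GEM object. */
struct obj {
	int refcount;
	bool freed;   /* set when the last reference is dropped (model only) */
};

static void obj_get(struct obj *o) { o->refcount++; }

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->freed = true;   /* the kernel would free the object here */
}

/* Models vm_ops->close(): drops the reference taken at mmap time. */
static void vma_close(struct obj *o) { obj_put(o); }

static int dmabuf_mmap_model(struct obj *o, bool insert_fails)
{
	obj_get(o);
	if (insert_fails)
		goto close_vma;
	return 0;

close_vma:
	vma_close(o);   /* already performs the put */
	return -1;      /* return directly: no fall-through to a second put */
}
```

With the caller holding one reference, a failed mapping leaves the count at 1 instead of dropping it to 0 and freeing the object underneath the caller.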

Cc: stable@vger.kernel.org
Fixes: e486147c912f ("accel/amdxdna: Add BO import and export")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260625113239.49764-1-vulab@iscas.ac.cn

RDMA/hns: Fix potential integer overflow in mhop hem cleanup

In hns_roce_cleanup_mhop_hem_table(), the expression:

obj = i * buf_chunk_size / table->obj_size;

is evaluated using 32-bit unsigned arithmetic because
'buf_chunk_size' is u32 and the usual arithmetic conversions convert
'i' to unsigned int. The result is assigned to a u64 variable, but the
multiplication may overflow before the assignment.

For sufficiently large HEM tables, this produces an incorrect object
index passed to hns_roce_table_mhop_put().

Cast 'i' to u64 before the multiplication so that the intermediate
calculation is performed with 64-bit arithmetic.
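The overflow can be reproduced in isolation. This sketch assumes `i` is an `int` and both sizes are 32-bit unsigned, as the message describes; `obj_index_32()` and `obj_index_64()` are hypothetical helpers, not the driver's functions.

```c
#include <stdint.h>

/* Buggy form: i is converted to unsigned int and the product wraps at 2^32. */
static uint64_t obj_index_32(int i, uint32_t chunk, uint32_t obj_size)
{
	return i * chunk / obj_size;
}

/* Fixed form: casting i to u64 makes the multiplication 64-bit. */
static uint64_t obj_index_64(int i, uint32_t chunk, uint32_t obj_size)
{
	return (uint64_t)i * chunk / obj_size;
}
```

For example, with `i = 8192` and a 1 MiB chunk the product is 2^33, which wraps to 0 in 32 bits, so the buggy form yields index 0 while the fixed form yields 2^33 / 4096 = 2097152.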

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08")
Link: https://patch.msgid.link/r/20260627095951.51378-1-listdansp@mail.ru
Signed-off-by: Danila Chernetsov <listdansp@mail.ru>
Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/core: Fix memory leak in __ib_create_cq() on invalid cqe

Move the zero CQE validation before rdma_zalloc_drv_obj() to avoid
leaking the CQ object when returning -EINVAL.
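The ordering rule can be sketched as follows: validate input before allocating, so the early-return path owns nothing that needs freeing. `struct cq` and `create_cq_model()` are hypothetical; `calloc()` merely stands in for rdma_zalloc_drv_obj().

```c
#include <errno.h>
#include <stdlib.h>

struct cq { int cqe; };

static struct cq *create_cq_model(int cqe, int *err)
{
	struct cq *cq;

	/* Validate first: nothing has been allocated yet, so nothing leaks. */
	if (cqe == 0) {
		*err = -EINVAL;
		return NULL;
	}

	cq = calloc(1, sizeof(*cq));   /* stand-in for rdma_zalloc_drv_obj() */
	if (!cq) {
		*err = -ENOMEM;
		return NULL;
	}
	cq->cqe = cqe;
	*err = 0;
	return cq;
}
```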

Fixes: a2917582887a ("RDMA/core: Reject zero CQE count")
Link: https://patch.msgid.link/r/20260625020148.224537-1-zhaochenguang@kylinos.cn
Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/mana_ib: initialize err for empty send WR lists

mana_ib_post_send() returns err after walking the send work request list.
If the caller passes an empty list, the loop is skipped and err is not
assigned.

Initialize err to 0 so an empty send work request list returns success
instead of stack data.
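A minimal sketch of the pattern, which applies equally to erdma_post_recv() below: the return variable is initialized before a loop that may run zero times. `struct send_wr` and `post_send_model()` are hypothetical simplifications, not the verbs API.

```c
#include <errno.h>
#include <stddef.h>

struct send_wr {
	struct send_wr *next;
	int valid;   /* hypothetical validity flag */
};

static int post_send_model(const struct send_wr *wr,
			   const struct send_wr **bad_wr)
{
	int err = 0;   /* an empty list skips the loop and must return 0 */

	for (; wr; wr = wr->next) {
		if (!wr->valid) {
			err = -EINVAL;
			*bad_wr = wr;
			break;
		}
	}
	return err;
}
```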

Fixes: c8017f5b4856 ("RDMA/mana_ib: UD/GSI work requests")
Link: https://patch.msgid.link/r/20260618041752.481193-2-ruoyuw560@gmail.com
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

RDMA/erdma: initialize ret for empty receive WR lists

erdma_post_recv() returns ret after walking the receive work request list.
If the caller passes an empty list, the loop is skipped and ret is not
assigned.

Initialize ret to 0 so an empty receive work request list returns success
instead of stack data.

Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation")
Link: https://patch.msgid.link/r/20260618041752.481193-1-ruoyuw560@gmail.com
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

iio: adc: ad7779: add missing 'select IIO_TRIGGERED_BUFFER' to Kconfig

The Kconfig entry for the AD7779 is missing a
'select IIO_TRIGGERED_BUFFER' parameter, causing build failures.

Fixes: c9a3f8c7bfcb ("drivers: iio: adc: add support for ad777x family")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Crofts <joshua.crofts1@gmail.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

iio: adc: ad4130: add missing `select IIO_TRIGGERED_BUFFER` to Kconfig

The Kconfig entry is missing a `select IIO_TRIGGERED_BUFFER` parameter,
causing potential build failures.

Fixes: ec98c3b50157 ("iio: adc: ad4130: add new supported parts")
Cc: stable@vger.kernel.org
Signed-off-by: Joshua Crofts <joshua.crofts1@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

RDMA/irdma: Prevent user-triggered null deref on QP create

Previously, the user QP creation path would only attempt to
populate iwqp->iwpbl if the user-provided req.user_wqe_bufs
field was non-zero. The problem is that iwqp->iwpbl is
unconditionally dereferenced later on in irdma_setup_virt_qp.

While there was a check for iwqp->iwpbl != NULL, this check
would only occur if req.user_wqe_bufs was non-zero. The end
result is that a user could send a zero user_wqe_bufs value
and trigger a null ptr deref.

Fix this by unconditionally calling irdma_get_pbl and bailing
if it fails, similar to the CQ and SRQ paths.
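The shape of the fix can be sketched as an unconditional lookup followed by a bail-out. `struct pbl`, `struct qp`, `get_pbl()` and `create_user_qp_model()` are hypothetical, and the `-EINVAL` error code is an assumption for illustration only.

```c
#include <errno.h>
#include <stddef.h>

struct pbl { int id; };
struct qp { struct pbl *iwpbl; };

/* Hypothetical lookup: finds nothing when user_wqe_bufs is zero. */
static struct pbl *get_pbl(unsigned long user_wqe_bufs, struct pbl *registered)
{
	return user_wqe_bufs ? registered : NULL;
}

static int create_user_qp_model(struct qp *qp, unsigned long user_wqe_bufs,
				struct pbl *registered)
{
	/* Always look up the PBL; do not gate the lookup on user_wqe_bufs. */
	qp->iwpbl = get_pbl(user_wqe_bufs, registered);
	if (!qp->iwpbl)
		return -EINVAL;   /* assumed error code */

	/* Later setup may now dereference qp->iwpbl safely. */
	return 0;
}
```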

Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://patch.msgid.link/r/20260617164013.280790-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Reviewed-by: David Hu <xuehaohu@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Merge tag 'net-7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and batman-adv.

  Current release - new code bugs:

   - netfilter: cthelper: cap to maximum number of expectations per master

  Previous releases - regressions:

   - netpoll: fix a use-after-free on shutdown path

   - tcp: restore RCU grace period in tcp_ao_destroy_sock

   - ipv6: fix NULL deref in fib6_walk_continue() on multi-batch dump

   - batman-adv: dat: ensure accessible eth_hdr proto field

   - eth:
      - virtio_net: disable cb when NAPI is busy-polled
      - lan743x: Initialize eth_syslock spinlock before use

  Previous releases - always broken:

   - netfilter:
      - nft_set_pipapo: don't leak bad clone into future transaction

   - sched:
      - sch_teql: Introduce slaves_lock to avoid race condition and UAF
      - replace direct dequeue call with peek and qdisc_dequeue_peeked

   - sctp: add INIT verification after cookie unpacking

   - tipc: fix out-of-bounds read in broadcast Gap ACK blocks

   - seg6: validate SRH length before reading fixed fields

   - eth:
      - mlx5e: fix use-after-free of metadata_dst on RX SC delete
      - enetc: check the number of BDs needed for xdp_frame
      - fbnic: don't cache shinfo across skb realloc"

* tag 'net-7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
  net/mlx5: HWS, fix matcher leak on resize target setup failure
  net/sched: hhf: clear heavy-hitter state on reset
  net/sched: dualpi2: clear stale classification on filter miss
  net/sched: act_bpf: use rcu_dereference_bh() to read the filter
  selftests: drv-net: tso: don't touch dangerous feature bits
  cxgb4: Fix decode strings dump for T6 adapters
  virtio_net: disable cb when NAPI is busy-polled
  sctp: fix addr_wq_timer race in sctp_free_addr_wq()
  selftests: net: bump default cmd() timeout to 20 seconds
  bridge: stp: Fix a potential use-after-free when deleting a bridge
  net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
  net: gianfar: dispose irq mappings on probe failure and device removal
  net: lan743x: Initialize eth_syslock spinlock before use
  net: libwx: fix VMDQ mask for 1-queue mode
  net: airoha: fix max receive size configuration
  fsl/fman: Free init resources on KeyGen failure in fman_init()
  netfilter: nftables: restrict checksum update offset
  netfilter: nftables: restrict linklayer and network header writes
  netfilter: nfnetlink_queue: restrict writes to network header
  netfilter: nft_fib: reject fib expression on the netdev egress hook
  ...

Merge tag 'hwmon-for-v7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- adm1275: Detect coefficient overflow, and prevent reading
   uninitialized stack

- aspeed-g6-pwm-tach: Guard fan RPM calculation against divide-by-zero

- asus_atk0110: Check package count before accessing element

- ltc4283: fix malformed table docs build error

- occ: Unregister sysfs devices outside occ lock to avoid lockdep
   warning

- pmbus core: Fix passing events to regulator core, and honor
   vrm_version in pmbus_data2reg_vid()

- w83627hf: Remove VID sysfs files on error and remove

- w83793: remove vrm sysfs file on probe failure

- Various: Add missing 'select REGMAP_I2C' to Kconfig

* tag 'hwmon-for-v7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (aspeed-g6-pwm-tach) Guard fan RPM calculation against divide-by-zero
  hwmon: (pmbus) Fix passing events to regulator core
  hwmon: adm1275: Detect coefficient overflow
  hwmon: adm1275: Prevent reading uninitialized stack
  hwmon: (max6697) add missing 'select REGMAP_I2C' to Kconfig
  hwmon: (ltc2992) add missing 'select REGMAP_I2C' to Kconfig
  hwmon: (max1619) add missing 'select REGMAP' to Kconfig
  hwmon: (w83627hf) remove VID sysfs files on error and remove
  hwmon: (w83793) remove vrm sysfs file on probe failure
  hwmon: (asus_atk0110) Check package count before accessing element
  docs: hwmon: ltc4283: fix malformed table docs build error
  hwmon: (pmbus/core) honor vrm_version in pmbus_data2reg_vid()
  hwmon: (occ) unregister sysfs devices outside occ lock

Merge tag 'mfd-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD fix from Lee Jones:

- Add MFD mailing list to MAINTAINERS

* tag 'mfd-fixes-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  MAINTAINERS: Add a mailing list entry to MFD

s390/monwriter: Reject buffer reuse with different data length

When data buffers are reused, e.g. for interval sample records, the
first record determines the data length and the size of the buffer used
for the user copy. The current monwriter code does not check whether the
data length changes in subsequent records, something a valid user
program would never do.

However, a malicious user could change the data length, resulting in an
out-of-bounds user copy into the kernel buffer and memory corruption. By
default, the monwriter misc device is created with root-only permissions,
so the practical impact is typically low.

Fix this by checking for changed data length and rejecting such records.
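The check can be sketched with a reusable buffer whose size is fixed by the first record. `struct mon_buf` and `mon_buf_write()` are hypothetical, `memcpy()` stands in for the user copy, and the `-EINVAL` return is an assumption rather than the driver's confirmed error code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct mon_buf {
	size_t datalen;   /* fixed by the first record */
	char *data;
};

static int mon_buf_write(struct mon_buf *buf, const char *src, size_t len)
{
	if (!buf->data) {
		/* First record: allocate exactly len bytes. */
		buf->data = malloc(len);
		if (!buf->data)
			return -ENOMEM;
		buf->datalen = len;
	} else if (len != buf->datalen) {
		/* Reuse with a different length: reject instead of overflowing. */
		return -EINVAL;
	}
	memcpy(buf->data, src, len);   /* len always matches the allocation */
	return 0;
}
```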

Cc: stable@vger.kernel.org
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

cifs: Fix missing credit release on failure in cifs_issue_read()

Release the credits in the failure path of cifs_issue_read(); otherwise,
retrying the subreq just overwrites the credits value and the credits
previously obtained are lost.

Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

ublk: snapshot batch commands before preparing I/O

The batch prepare path rereads its userspace element array when rolling
back a partially prepared batch. Userspace can change an already
processed tag before the second read, causing rollback to reject the
replacement tag and leave earlier I/O slots prepared. The
WARN_ON_ONCE() in the rollback path then fires.

Copy the bounded batch into kernel memory before changing any I/O state
and use the same snapshot for preparation and rollback. Commit and fetch
batches retain the existing chunked userspace walk.
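The snapshot-once pattern that avoids the double fetch can be sketched in userspace. Here `memcpy()` stands in for copy_from_user(), and `prepared[]`, `NR_TAGS` and `prep_batch()` are hypothetical names; the batch size `n` is assumed to be bounded by the caller, as in the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_TAGS 8

static bool prepared[NR_TAGS];   /* hypothetical per-tag I/O slot state */

static int prep_batch(const unsigned short *user, size_t n)
{
	unsigned short *snap;
	size_t i;

	if (n == 0)
		return 0;
	snap = malloc(n * sizeof(*snap));
	if (!snap)
		return -ENOMEM;
	/* Read the shared array exactly once. */
	memcpy(snap, user, n * sizeof(*snap));

	for (i = 0; i < n; i++) {
		if (snap[i] >= NR_TAGS || prepared[snap[i]])
			goto rollback;
		prepared[snap[i]] = true;
	}
	free(snap);
	return 0;

rollback:
	/* Undo exactly the slots prepared above, using the same snapshot. */
	while (i--)
		prepared[snap[i]] = false;
	free(snap);
	return -EINVAL;
}
```

Because rollback reads `snap` rather than the user array, a concurrent change to the user array cannot make rollback skip a slot it prepared.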

Fixes: b256795b3606 ("ublk: handle UBLK_U_IO_PREP_IO_CMDS")
Reported-by: syzbot+1a67ee1aa79484801ec6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1a67ee1aa79484801ec6
Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260630211827.50475-1-alhouseenyousef@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

RDMA/irdma: Prevent rereg_mr for non-mem regions

When a QP/CQ/SRQ is created, a two-step process is used:
the buffer is allocated in userspace and explicitly
registered with the normal reg_mr mechanism before creating
the actual QP/CQ/SRQ object.

These special registrations are indicated via an ABI field
so the driver knows that they do not have a valid mkey and
skips the actual CQP command submission.

Since these are real MR objects from the core's perspective,
it is possible for a user application to invoke rereg_mr on them
and cause a real CQP op to be emitted with the zero-initialized
mkey value of 0.

Fix this by preventing rereg_mr on these special regions.

Fixes: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Reviewed-by: David Hu <xuehaohu@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

batman-adv: ensure minimal ethernet header on TX

As documented in commit 8bd67ebb50c0 ("net: bridge: xmit: make sure we have
at least eth header len bytes"), a local user with eBPF TC hook access can
attach a tc filter which truncates the packet and redirects it to a batadv
interface. However, the code assumes that at least ETH_HLEN bytes are
available and thus might read outside of the available buffer.

batadv_interface_tx() must therefore always check for itself whether enough
data is available for the Ethernet header, rather than relying on
min_header_len.
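The required guard can be sketched on a flat buffer. Kernel code would typically use an skb helper such as pskb_may_pull(); whether this patch uses that helper is not stated here. `frame_ethertype()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14   /* destination MAC (6) + source MAC (6) + EtherType (2) */

/* Returns the EtherType, or -1 if the frame cannot hold an Ethernet header. */
static int frame_ethertype(const uint8_t *data, size_t len)
{
	if (len < ETH_HLEN)
		return -1;   /* the transmit path would drop the packet here */
	return (data[12] << 8) | data[13];
}
```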

Cc: stable@vger.kernel.org
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>

uprobes/x86: Use proper mm_struct in __in_uprobe_trampoline

In the unregister path, the __in_uprobe_trampoline() check uses
current->mm for the VMA lookup. This is wrong because we are running
in the tracer's context, not the traced process's.

Add an mm_struct pointer argument to __in_uprobe_trampoline() and
change the related callers to pass the proper mm_struct pointer.

Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
Reported-by: syzbot+61ce80689253f42e6d80@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: syzbot+61ce80689253f42e6d80@syzkaller.appspotmail.com
Link: https://patch.msgid.link/20260701111337.53943-2-jolsa@kernel.org

selftests/x86: Add shadow stack uprobe CALL test

Add coverage for entry uprobes installed on CALL instructions while user
shadow stack is enabled. The test puts an entry uprobe on a helper whose
first instruction is a relative CALL, then verifies that the call/return
sequence completes without SIGSEGV.

This catches regressions where x86 uprobe CALL emulation updates the
regular user stack but leaves the CET shadow stack stale.

Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/b957039191118c5eba97d01d80c494b859f115a6.1782777969.git.dwindsor@gmail.com

x86/uprobes: Keep shadow stack in sync for emulated CALLs

Uprobe CALL emulation updates the normal user stack, but not the CET user
shadow stack. The subsequent RET then sees a stale shadow stack entry and
raises #CP.

Update the relative CALL emulation and XOL CALL fixup paths to keep the
shadow stack in sync.
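A toy model shows why both stacks must be updated. `struct task_model`, `emulate_call()` and `do_ret()` are hypothetical; the model has no bounds checks and only imitates the RET comparison that would raise #CP on real hardware.

```c
#include <stdint.h>

struct task_model {
	uint64_t stack[16], shstk[16];
	int sp, ssp;   /* entries on the normal and shadow stack */
	uint64_t ip;
};

/* Emulated relative CALL: push the return address on BOTH stacks. */
static void emulate_call(struct task_model *t, uint64_t insn_len, int64_t rel)
{
	uint64_t ret = t->ip + insn_len;

	t->stack[t->sp++] = ret;
	t->shstk[t->ssp++] = ret;   /* omitting this leaves the shadow stack stale */
	t->ip = ret + rel;
}

/* RET: returns 0 on success, -1 where hardware would raise #CP. */
static int do_ret(struct task_model *t)
{
	uint64_t ret = t->stack[--t->sp];

	if (t->ssp == 0 || t->shstk[--t->ssp] != ret)
		return -1;
	t->ip = ret;
	return 0;
}
```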

Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface")
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Link: https://patch.msgid.link/8b5b1c7407b98f31664ad7b6a6faf20d2d4a6cad.1782777969.git.dwindsor@gmail.com

perf/core: Detach event groups during remove_on_exec

perf_event_remove_on_exec() removes events by calling
perf_event_exit_event(). For top-level events, this removes the event from
the context with DETACH_EXIT only.

This can leave inconsistent group state when a removed event is a group
leader and the group contains siblings without remove_on_exec. If the group
was active, the surviving siblings can remain active and attached to the
removed leader's sibling list, but are no longer represented by a valid
group leader on the PMU context active lists.

A later close of the removed leader uses DETACH_GROUP and can promote the
still-active siblings from this stale group state. The next schedule-in can
then add an already-linked active_list entry again, corrupting the PMU
context active list.

With DEBUG_LIST enabled, this is caught as a list_add double-add in
merge_sched_in().

Fix this by detaching group relationships when remove_on_exec removes an
event. This preserves the existing task-exit and revoke behavior, while
ensuring surviving siblings are ungrouped before the removed event leaves
the context.

Fixes: 2e498d0a74e5 ("perf: Add support for event removal on exec")
Signed-off-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/ai65GgZcC0LAlWLG@Taeyangs-MacBook-Pro.local

RDMA/cma: Fix hardware address comparison length in netevent callback

The cited commit hardcoded the hardware address comparison length to ETH_ALEN.

This breaks IPoIB, which uses 20-byte addresses. Because the memcmp() is
truncated, the CMA may incorrectly assume the target address is
unchanged and fail to abort the stalled connection.

Fix this by replacing ETH_ALEN with the dynamic neigh->dev->addr_len
to correctly evaluate the full address regardless of the link layer.
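The effect of the truncated comparison is easy to demonstrate. `hw_addr_changed()` is a hypothetical helper; `INFINIBAND_ALEN` (20) matches the IPoIB address length mentioned above.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6
#define INFINIBAND_ALEN 20   /* IPoIB hardware address length */

/* Compare using the device's own address length, not a fixed ETH_ALEN. */
static bool hw_addr_changed(const unsigned char *a, const unsigned char *b,
			    size_t addr_len)
{
	return memcmp(a, b, addr_len) != 0;
}
```

Two IPoIB addresses that differ only beyond byte 6 compare as equal with `ETH_ALEN` but as different with the full 20-byte length.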

Fixes: 925d046e7e52 ("RDMA/core: Add a netevent notifier to cma")
Signed-off-by: Or Gerlitz <ogerlitz@ddn.com>
Link: https://patch.msgid.link/20260617-fix-cma-ipoib-v1-1-03f869344304@ddn.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

selftests/arm64: fix spelling errors in comments

Fix two spelling mistakes in arm64 selftest comments:
- "whcih" -> "which" (arm64/gcs/libc-gcs.c)
- "resutls" -> "results" (arm64/pauth/pac.c)

Signed-off-by: Wang Yan <wangyan01@kylinos.cn>
Signed-off-by: Will Deacon <will@kernel.org>

arm64/sysreg: Fix BWE field encoding in ID_AA64DFR2_EL1

Commit 93d7356e4b30 ("arm64: sysreg: Describe ID_AA64DFR2_EL1 fields")
encodes the FEAT_BWE2 value of the BWE field as '0b0002'. Binary
literals only accept the digits 0 and 1, so the intended value is 2,
i.e. 0b0010.

The macro generated by gen-sysreg.awk currently expands to

  #define ID_AA64DFR2_EL1_BWE_FEAT_BWE2 UL(0b0002)

which is not legal C and would fail to compile if any in-tree code
referenced it. At present no caller uses this enum value, so the kernel
still builds cleanly, but the bug is latent.

Fix the typo by using the correct binary literal 0b0010.
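A standalone check of the corrected value, assuming a simplified `UL()` that pastes the suffix onto its argument (the kernel's version goes through `_AC()`). Binary literals are a GNU C extension before C23; with `0b0002` the compiler rejects the invalid digit instead.

```c
/* Simplified stand-in for the kernel's UL() helper. */
#define UL(x) (x##UL)

#define ID_AA64DFR2_EL1_BWE_FEAT_BWE2 UL(0b0010)   /* binary 0010 == 2 */
```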

Cc: Bin Guo <guobin@linux.alibaba.com>
Fixes: 93d7356e4b30 ("arm64: sysreg: Describe ID_AA64DFR2_EL1 fields")
Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oupton@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>

drm/xe/oa: Fix offset alignment for MERT WHITELIST_OA_MERT_MMIO_TRG

The 'head' argument for WHITELIST_OA_MERT_MMIO_TRG was previously wrong
(not a multiple of 16). Fix this.

Fixes: ec02e49f21bc ("drm/xe/rtp: Whitelist OAMERT MMIO trigger registers")
Cc: stable@vger.kernel.org
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260629172634.1100983-1-ashutosh.dixit@intel.com
(cherry picked from commit f6c23e4589bdc69a5d2f79aed5c5bddd5d406cbe)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/pt: prevent invalid cursor access for purged BOs

During a page table walk for binding, xe_pt_stage_bind() explicitly
skips initializing the xe_res_cursor for purged BOs, treating them
similarly to NULL VMAs by only setting the cursor size.

However, xe_pt_hugepte_possible() and xe_pt_scan_64K() did not check
if the BO was purged before attempting to walk the cursor using
xe_res_dma() and xe_res_next(). Because the cursor was left
uninitialized for purged BOs, this falls through and triggers
warnings like:

WARNING: drivers/gpu/drm/xe/xe_res_cursor.h:274 at xe_res_next

Fix this by explicitly checking if the BO is purged in both
xe_pt_hugepte_possible() and xe_pt_scan_64K(), returning early just
as we do for NULL VMAs, avoiding the invalid cursor accesses entirely.

As a precaution, also zero-initialize the cursor in xe_pt_stage_bind()
to ensure we don't pass garbage data into the page table walkers
if we ever hit a similar edge case in the future.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8418
Fixes: ad9843aac91a ("drm/xe/madvise: Implement purgeable buffer object support")
Assisted-by: Copilot:gemini-3.1-pro-preview
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Arvind Yadav <arvind.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Link: https://patch.msgid.link/20260625152054.450125-8-matthew.auld@intel.com
(cherry picked from commit 4c7b9c6ece32440e5a435a92076d049450cd2d2e)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: fix NPD in bo_meminfo()

When a buffer object is purged, its ttm.resource is set to NULL via the
TTM pipeline gutting flow. However, the BO remains in the client's
object list until userspace explicitly closes the GEM handle. If memory
stats are queried during this time, accessing bo->ttm.resource->mem_type
will result in a NULL pointer dereference.

Fix this by safely skipping purged BOs in bo_meminfo, as they no longer
consume any memory.

A user hit the NPD on device resume. One possible theory is that in
bo_move(), if we need to evict something to SYSTEM to save the CCS state
but the BO is marked as dontneed, this won't trigger a move but will
nuke the pages, leaving us with a NULL bo resource. bo_meminfo() does
not appear ready to handle a NULL resource.

v2 (Sashiko):
- There could potentially be other cases where we might end up with a
NULL resource, so make this a general NULL check for now.
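The general NULL check can be sketched as a skip in the accounting loop. `struct resource_model`, `struct bo_model` and `meminfo_model()` are hypothetical simplifications; in the driver the pointer in question is `bo->ttm.resource`.

```c
#include <stddef.h>

struct resource_model { int mem_type; };
struct bo_model {
	struct resource_model *resource;   /* NULL once the BO is purged */
	size_t size;
};

static void meminfo_model(const struct bo_model *bos, size_t n,
			  size_t totals[], int nr_types)
{
	for (size_t i = 0; i < n; i++) {
		const struct resource_model *res = bos[i].resource;

		if (!res)   /* purged or otherwise resource-less: consumes nothing */
			continue;
		if (res->mem_type >= 0 && res->mem_type < nr_types)
			totals[res->mem_type] += bos[i].size;
	}
}
```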

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/8419
Fixes: ad9843aac91a ("drm/xe/madvise: Implement purgeable buffer object support")
Assisted-by: Copilot:gemini-3.1-pro-preview
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Arvind Yadav <arvind.yadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Link: https://patch.msgid.link/20260625152054.450125-6-matthew.auld@intel.com
(cherry picked from commit c9a8e7daa0afe3161111e27fd92176e608c7f186)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/pf: Don't attempt to process FAST_REQ or EVENT relays

Currently defined VF/PF relay actions use regular REQUEST messages
only, so the PF should not attempt to handle FAST_REQUEST or EVENT
messages: doing so would break the VF/PF ABI protocol and might also
trigger an assert on the PF side.

Fixes: 98e62805921c ("drm/xe/pf: Add SR-IOV GuC Relay PF services")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patch.msgid.link/20260527183735.22616-1-michal.wajdeczko@intel.com
(cherry picked from commit 1714d360fc5ae2e0886a69e979095d9c7ff3568a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/hw_engine: Fix double-free of managed BO in error path

The error path in hw_engine_init() explicitly frees a BO allocated
with xe_managed_bo_create_pin_map() via xe_bo_unpin_map_no_vm().
Since the managed BO already has a devm cleanup action registered,
this causes a double-free when devm unwinds during probe failure.

Remove the explicit free and let devm handle it, consistent with
all other xe_managed_bo_create_pin_map() callers.
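A small userspace model of device-managed cleanup shows why the explicit free was a double free. `struct devres_model`, `managed_alloc()`, `devres_unwind()` and `init_model()` are hypothetical and only imitate the devm idea of registering a release action at allocation time.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_ACTIONS 8

/* Release actions run in reverse order when the device unwinds. */
struct devres_model {
	void (*fn[MAX_ACTIONS])(void *);
	void *arg[MAX_ACTIONS];
	int n;
};

static int frees;   /* counts releases, to detect double frees */

static void release_buf(void *p) { free(p); frees++; }

static void *managed_alloc(struct devres_model *d, size_t size)
{
	void *p;

	if (d->n == MAX_ACTIONS)
		return NULL;
	p = malloc(size);
	if (!p)
		return NULL;
	d->fn[d->n] = release_buf;   /* cleanup is registered up front */
	d->arg[d->n] = p;
	d->n++;
	return p;
}

static void devres_unwind(struct devres_model *d)
{
	while (d->n > 0) {
		d->n--;
		d->fn[d->n](d->arg[d->n]);
	}
}

static int init_model(struct devres_model *d)
{
	void *bo = managed_alloc(d, 64);

	if (!bo)
		return -ENOMEM;
	/* A later step fails. Do NOT free bo here: the unwind owns it. */
	return -EIO;
}
```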

Fixes: 0e1a47fcabc8 ("drm/xe: Add a helper for DRM device-lifetime BO create")
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Zongyao Bai <zongyao.bai@intel.com>
Link: https://patch.msgid.link/20260626210631.3887291-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit e459a3bdeb117be496d7f229e2ea1f6c9fe4080b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/userptr: Drop bogus static from finish in force_invalidate

The local "finish" pointer in xe_vma_userptr_force_invalidate() is
unconditionally written before each read, so the static storage class
serves no purpose. Worse, it makes the variable a process-wide shared
slot: the function's per-VM asserts do not exclude concurrent callers
on different VMs, so two such callers can race on the slot and take
the wrong if (finish) branch.

The function is gated by CONFIG_DRM_XE_USERPTR_INVAL_INJECT
(developer/test option, default n), so production builds are
unaffected.

Drop the static.

Fixes: 18c4e536959e ("drm/xe/userptr: Convert invalidation to two-pass MMU notifier")
Assisted-by: Claude:claude-opus-4.7
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Zongyao Bai <zongyao.bai@intel.com>
Link: https://patch.msgid.link/20260625224452.3243231-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit ed382e3b07fae51a09d7290485bff0592f6b168b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/userptr: Hold notifier_lock for write on inject test path

When CONFIG_DRM_XE_USERPTR_INVAL_INJECT=y, xe_pt_svm_userptr_pre_commit()
runs vma_check_userptr() with the svm notifier_lock taken for read. The
test injection causes vma_check_userptr() to call
xe_vma_userptr_force_invalidate(), which feeds into
xe_vma_userptr_do_inval() with drm_gpusvm_ctx.in_notifier=true. That
flag tells drm_gpusvm_unmap_pages() the caller already holds
notifier_lock for write and only asserts the mode. Because the caller
actually holds it for read, the assertion fires:

  WARNING: drivers/gpu/drm/drm_gpusvm.c:1669 at \
           drm_gpusvm_unmap_pages+0xd4/0x130 [drm_gpusvm_helper]
  Call Trace:
   xe_vma_userptr_do_inval+0x40d/0xfd0 [xe]
   xe_vma_userptr_invalidate_pass1+0x3e6/0x8d0 [xe]
   xe_vma_userptr_force_invalidate+0xde/0x290 [xe]
   vma_check_userptr.constprop.0+0x1c6/0x220 [xe]
   xe_pt_svm_userptr_pre_commit+0x6a3/0xc60 [xe]
   ...
   xe_vm_bind_ioctl+0x3a0a/0x4480 [xe]

Acquire notifier_lock for write in pre-commit when the inject Kconfig
is enabled, via new helpers xe_pt_svm_userptr_notifier_lock()/_unlock().
Rename xe_svm_assert_held_read() to
xe_svm_assert_held_read_or_inject_write() so it asserts the correct
mode under each build configuration. Production builds
(CONFIG_DRM_XE_USERPTR_INVAL_INJECT=n) keep the existing read-mode
behavior bit-for-bit.

Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Assisted-by: Claude:claude-opus-4.7
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Zongyao Bai <zongyao.bai@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260625215615.3016892-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 80ccbd97ffee8ad2e73167d826fe7be548364365)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/display: skip FORCE_WC and vm_bound check for external dma-bufs

Currently, xe_display_bo_framebuffer_init() unconditionally attempts to
apply XE_BO_FLAG_FORCE_WC to the buffer and rejects the FB creation with
-EINVAL if the BO is already VM_BINDed.

However, for imported dma-bufs (ttm_bo_type_sg), this check doesn't seem
to make much sense, since the CPU caching policy is entirely controlled
by the exporter. Moreover, there is nowhere to set this flag to begin
with. The FB is also not rejected if the BO is not yet VM_BINDed, but
that seems arbitrary, since setting or not setting FORCE_WC should be a
no-op either way at this stage, and whether the BO is currently
VM_BINDed makes no difference.

Currently, if we run an app and offload rendering to an external dGPU,
like NV or another xe device, the dma-buf passed back to the compositor
(iGPU) will be an actual external import from xe's point of view. It
will be missing FORCE_WC, and if the compositor side did a VM_BIND
before turning it into an FB, the whole thing gets rejected.

So it looks like we either need to reject this outright, no matter what,
or this use case is valid and we need to loosen the restriction for sg
buffers. This patch proposes loosening the restriction.

Assisted-by: Gemini:gemini-3.1-pro-preview
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7919
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20260612170501.550816-2-matthew.auld@intel.com
(cherry picked from commit 3e493f88c84088ccd7b53cdd23ac5c875c9a60dd)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Return error on non-migratable faults requiring devmem

Non-migratable faults that require devmem incorrectly jump to the 'out'
label, which squashes the error code intended to be returned to the
upper layers. Fix this by returning -EACCES instead.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 4208fac3dce5 ("drm/xe: Add more SVM GT stats")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20260617135101.1245574-1-matthew.brost@intel.com
(cherry picked from commit c4508edb2c723de93717272488ea65b165637eac)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Ensure locking/ref counting for OA whitelists

Since multiple OA streams might be open in parallel on a GT, ensure that
proper locking is in place. Also ensure that OA registers are whitelisted
when the first OA stream is opened and de-whitelisted after the last OA
stream is closed.
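The first-open/last-close rule can be sketched with a counter guarded by a mutex. `struct gt_oa_model`, `oa_stream_open()` and `oa_stream_release()` are hypothetical, and a pthread mutex stands in for whatever kernel lock the driver uses.

```c
#include <pthread.h>
#include <stdbool.h>

struct gt_oa_model {
	pthread_mutex_t lock;
	int open_streams;
	bool whitelisted;
};

static void oa_stream_open(struct gt_oa_model *gt)
{
	pthread_mutex_lock(&gt->lock);
	if (gt->open_streams++ == 0)
		gt->whitelisted = true;    /* first opener whitelists */
	pthread_mutex_unlock(&gt->lock);
}

static void oa_stream_release(struct gt_oa_model *gt)
{
	pthread_mutex_lock(&gt->lock);
	if (--gt->open_streams == 0)
		gt->whitelisted = false;   /* last closer de-whitelists */
	pthread_mutex_unlock(&gt->lock);
}
```

Holding the lock across the count update and the (de-)whitelist step keeps a concurrent open from observing a half-finished transition.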

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-10-ashutosh.dixit@intel.com
(cherry picked from commit 645f1a2589bd4782e25490e5ecc05b7043c36cbf)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/oa: (De-)whitelist OA registers on OA stream open/release

Whitelist OA registers on stream open and de-whitelist on stream
close/release. Whitelisting is only done when 'stream->sample' is
true. 'stream->sample' is only true when (a) xe_observation_paranoid is set
to false by system admin, or (b) the process is perfmon_capable(). This
therefore enforces the OA register whitelisting security requirements.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-9-ashutosh.dixit@intel.com
(cherry picked from commit f8e6874f46f19a6a2a0f24a81689f90641bb402a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: (De-)whitelist OA registers for all hwe's for a gt

Whitelist or de-whitelist OA registers for all hwe's on the gt on which the
OA stream is opened. This simplifies the case where an oa unit has 0
attached hwe's (but which monitors OA events on the associated GT).

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-8-ashutosh.dixit@intel.com
(cherry picked from commit 6f73bf8fffa728aa5d5ee143ba318fa0744113a2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Toggle 'deny' bit to (de-)whitelist OA regs

Whitelist or de-whitelist OA registers by setting or resetting the 'deny'
bit in OA nonpriv registers and writing new register values to HW.
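
A sketch of the bit manipulation only. The deny bit position below is a hypothetical placeholder (the real one comes from the hardware definition of the nonpriv registers), and writing the value to HW is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define NONPRIV_DENY_BIT	(UINT32_C(1) << 30)

/* Returns the nonpriv register value with the deny bit cleared
 * (whitelisted) or set (de-whitelisted); all other bits are kept. */
static uint32_t nonpriv_set_whitelisted(uint32_t val, bool whitelist)
{
	return whitelist ? (val & ~NONPRIV_DENY_BIT) : (val | NONPRIV_DENY_BIT);
}
```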

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-7-ashutosh.dixit@intel.com
(cherry picked from commit aeaa7d2bb017272ab9e18759fe00bf758cd3299f)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Save OA nonpriv registers to register save/restore lists

Now we can save OA whitelisting nonpriv registers to register save/restore
lists. OA nonpriv registers are saved to both hwe->oa_sr as well as
hwe->reg_sr.

During probe, resume and gt-reset flows KMD will apply hwe->reg_sr,
ensuring OA registers are de-whitelisted after these events. For
engine-reset, hwe->reg_sr is registered with GuC and GuC will apply these
registers, ensuring OA registers are de-whitelisted after engine resets.

hwe->oa_sr is used for whitelisting or de-whitelisting OA registers during
OA operation, by toggling the 'deny' bit on oa stream open/close.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-6-ashutosh.dixit@intel.com
(cherry picked from commit 3a3c3e56db2923daaf1a5353cd6463a4cdaf4ffa)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Generalize whitelist_apply_to_hwe

Generalize whitelist_apply_to_hwe to construct both non-OA and OA
whitelist nonpriv registers.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-5-ashutosh.dixit@intel.com
(cherry picked from commit c3ff77d7235ccef7a0883c2fd981f70ef3aafd21)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Keep track of non-OA nonpriv slots

In order to dynamically whitelist/dewhitelist OA registers on OA stream
open/close, we need to keep track of nonpriv slots occupied by non-OA
register whitelists.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-4-ashutosh.dixit@intel.com
(cherry picked from commit 15739920b71ef3c56868973b4e7e3164a793d09d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Maintain OA whitelists separately

OA registers are dynamically whitelisted (and again dewhitelisted) on OA
stream open/close. Maintaining OA whitelists separately from non-OA
register whitlists simplifies this management of OA register
whitelisting/dewhitelisting.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-3-ashutosh.dixit@intel.com
(cherry picked from commit c478244a9e2d14b3f1f92e8bd293919e554622a5)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/rtp: Fix build error with clang < 21 and non-const initializers

Clang < 21 treats const-qualified compound literals at function scope as
having static storage duration, which requires all initializer elements
to be compile-time constants.  When xe_hw_engine.c initializes a local
struct xe_rtp_table_sr using XE_RTP_TABLE_SR(), the compound literals in
XE_RTP_TABLE_SR end up containing runtime values (e.g. blit_cctl_val
derived from gt->mocs.uc_index), triggering:

  xe_hw_engine.c:361: error: initializer element is not a compile-time constant
  xe_hw_engine.c:416: error: initializer element is not a compile-time constant

ARRAY_SIZE() cannot be used as a replacement because it expands through
__must_be_array() -> __BUILD_BUG_ON_ZERO_MSG() -> _Static_assert inside
sizeof(struct{}), which clang < 21 also rejects in the same context.

Replace ARRAY_SIZE() with an open-coded sizeof(arr)/sizeof(elem) in
XE_RTP_TABLE_SR and XE_RTP_TABLE to avoid both issues.
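
A sketch of the open-coded element count applied to a local table holding a run-time value. The macro name is illustrative, not the one in the xe headers. The sketch does not reproduce the clang < 21 compound-literal error. It only shows that plain sizeof arithmetic needs no _Static_assert machinery.

```c
#include <stddef.h>

/* Open-coded element count: no __must_be_array() / _Static_assert.
 * Caveat: unlike ARRAY_SIZE(), passing a pointer compiles silently
 * and yields a wrong count. */
#define OPEN_CODED_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

struct rtp_entry {
	const char *name;
	unsigned int val;
};

/* Local table whose second entry is only known at run time, similar in
 * spirit to blit_cctl_val being derived from gt->mocs.uc_index. */
static unsigned int sum_table(unsigned int runtime_val)
{
	struct rtp_entry entries[] = {
		{ "fixed", 1 },
		{ "runtime", runtime_val },
	};
	unsigned int sum = 0;

	for (size_t i = 0; i < OPEN_CODED_COUNT(entries); i++)
		sum += entries[i].val;
	return sum;
}
```

The trade-off is the loss of ARRAY_SIZE()'s compile-time pointer check, which is acceptable where the argument is always a real array built by the macro itself.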

Fixes: e23fafb8594e ("drm/xe/rtp: Add struct types for RTP tables")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Violet Monti <violet.monti@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: intel-xe@lists.freedesktop.org
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/intel-xe/bfb0dee8-b243-47ba-a89d-71472b0d51c5@sirena.org.uk/
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260605093305.110598-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit a57011eff45e7265dc42a7adad68b84605d8f828)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/imagination: Fix user array stride in pvr_set_uobj_array()

pvr_set_uobj_array() copies an array of kernel objects to a userspace
array whose element size is described by out->stride. When out->stride
is different from the kernel object size, the slow path advances the
userspace pointer by the kernel object size and the kernel pointer by the
userspace stride.

This reverses the intended layout. For larger userspace strides, later
copies read from the wrong kernel addresses. For smaller userspace
strides, later copies are written at the wrong userspace offsets. The
padding clear is also done only for the first element instead of the
padding area for each element.

Advance the userspace pointer by out->stride and the kernel pointer by
obj_size, and clear per-element padding while the current userspace
pointer is still available.
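
A user-space sketch of the corrected loop. memcpy() stands in for copy_to_user(), and the parameter names are hypothetical. For a smaller stride, the sketch assumes only the first 'stride' bytes of each object are copied, since the commit does not spell out that case.

```c
#include <stddef.h>
#include <string.h>

/* Copies count objects of obj_size bytes from kobjs into an output
 * array whose elements are stride bytes apart. */
static void copy_obj_array(void *out_buf, size_t stride,
			   const void *kobjs, size_t obj_size, size_t count)
{
	unsigned char *out = out_buf;
	const unsigned char *in = kobjs;
	size_t copy_len = stride < obj_size ? stride : obj_size;

	for (size_t i = 0; i < count; i++) {
		memcpy(out, in, copy_len);
		/* Clear this element's padding while 'out' still points at it. */
		if (stride > obj_size)
			memset(out + obj_size, 0, stride - obj_size);
		out += stride;		/* output side advances by the user stride */
		in += obj_size;		/* kernel side advances by the object size */
	}
}
```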

Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Link: https://patch.msgid.link/6a456012.eb165e5c.113c2a.b71d@mx.google.com
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>

drm/imagination: Fix returned size for DRM_IOCTL_PVR_DEV_QUERY

For a few subtypes of DRM_IOCTL_PVR_DEV_QUERY, the driver was overriding
the returned size unconditionally. When args->size was smaller than the
size of the query structure, this inflated the reported size beyond the
amount of data actually returned to userspace.

The updated behaviour matches the description of
drm_pvr_ioctl_dev_query_args.size and the number of bytes actually
written. None of the DRM_IOCTL_PVR_DEV_QUERY structures have changed since
they were added, so this change does not break compatibility with earlier
versions.
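
A minimal sketch of the size rule. It assumes a simplified args struct with a destination pointer and an in/out size. These names are hypothetical, and the real drm_pvr_ioctl_dev_query_args may carry more fields. The rule is to copy at most the smaller of the two sizes and report exactly that many bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical query payload and args; only size handling is modeled. */
struct query_result {
	unsigned int a, b, c;
};

struct query_args {
	void *pointer;	/* destination buffer (userspace in the driver) */
	size_t size;	/* in: buffer size; out: bytes actually written */
};

static void dev_query(struct query_args *args, const struct query_result *res)
{
	size_t n = args->size < sizeof(*res) ? args->size : sizeof(*res);

	memcpy(args->pointer, res, n);
	args->size = n;	/* never report more than was written */
}
```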

Fixes: f99f5f3ea7ef ("drm/imagination: Add GPU ID parsing and firmware loading")
Fixes: ff5f643de0bf ("drm/imagination: Add GEM and VM related code")
Signed-off-by: Brajesh Gupta <brajesh.gupta@imgtec.com>
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Link: https://patch.msgid.link/20260701-b4-b4-query-v2-1-a1b491387875@imgtec.com
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>

drm/imagination: Fix double call to drm_sched_entity_fini()

Call sequence of double call:
pvr_context_destroy
pvr_context_kill_queues
pvr_queue_kill
drm_sched_entity_destroy
drm_sched_entity_fini // here
pvr_context_put
kref_put(..., pvr_context_release)
pvr_context_destroy_queues
pvr_queue_destroy
drm_sched_entity_fini // here

The call to drm_sched_entity_destroy() from pvr_context_kill_queues()
invokes drm_sched_entity_flush() followed by drm_sched_entity_fini().
drm_sched_entity_flush() ensures all pending jobs are completed, and
drm_sched_entity_fini() ensures no further submission is allowed, as
pvr_context_kill_queues() expects. Calling drm_sched_entity_fini() twice
is a misuse of the API, so keep the second call only in the
pvr_context_create() failure path.

Stack trace of the issue once refcounting for DRM entity stats was added
in commit fd177135f0e6 ("drm/sched: Account entity GPU time"):

[  789.490527] ------------[ cut here ]------------
[  789.490559] refcount_t: underflow; use-after-free.
[  789.490657] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0xf4/0x144, CPU#0: kworker/u16:1/440
[  789.490695] Modules linked in: powervr drm_gpuvm drm_exec gpu_sched drm_shmem_helper xhci_plat_hcd xhci_hcd dwc3 usbcore usb_common snd_soc_simple_card snd_soc_simple_card_utils sa2ul sha512 sha256 dwc3_am62 sha1 authenc rti_wdt libsha512 at24 sch_fq_codel fuse dm_mod ipv6
[  789.490798] CPU: 0 UID: 0 PID: 440 Comm: kworker/u16:1 Not tainted 7.0.0-rc7-02049-g5e2c0700091b #22 PREEMPT
[  789.490809] Hardware name: Texas Instruments AM625 SK (DT)
[  789.490815] Workqueue: powervr-sched pvr_queue_fence_release_work [powervr]
[  789.490868] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  789.490876] pc : refcount_warn_saturate+0xf4/0x144
[  789.490884] lr : refcount_warn_saturate+0xf4/0x144
[  789.490892] sp : ffff8000822cbcc0
[  789.490895] x29: ffff8000822cbcc0 x28: 0000000000000000 x27: 0000000000000000
[  789.490909] x26: 0000000000000000 x25: ffff800081b1e338 x24: ffff000004541405
[  789.490922] x23: ffff000004bea950 x22: ffff00000042e400 x21: ffff000007123e30
[  789.490935] x20: ffff000007123000 x19: ffff000007a80d50 x18: fffffffffffe7768
[  789.490948] x17: 74736574202c6e6f x16: 697461746e656d65 x15: ffff800081b269f0
[  789.490962] x14: 0000000000000030 x13: ffff800081b26a70 x12: 0000000000000211
[  789.490975] x11: 00000000000000c0 x10: 0000000000000b50 x9 : ffff8000822cbb30
[  789.490988] x8 : ffff0000014e7bb0 x7 : ffff00007725e780 x6 : 0000000372a05f49
[  789.491001] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000010
[  789.491013] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000014e7000
[  789.491027] Call trace:
[  789.491032]  refcount_warn_saturate+0xf4/0x144 (P)
[  789.491043]  drm_sched_entity_fini+0x164/0x18c [gpu_sched]
[  789.491081]  pvr_queue_destroy+0x64/0x134 [powervr]
[  789.491110]  pvr_context_destroy_queues+0x34/0x64 [powervr]
[  789.491138]  pvr_context_release+0x70/0xac [powervr]
[  789.491166]  pvr_context_put.part.0+0x5c/0x7c [powervr]
[  789.491193]  pvr_context_put+0x14/0x24 [powervr]
[  789.491221]  pvr_queue_fence_release_work+0x20/0x38 [powervr]
[  789.491249]  process_one_work+0x160/0x4c4
[  789.491264]  worker_thread+0x188/0x310
[  789.491276]  kthread+0x130/0x13c
[  789.491287]  ret_from_fork+0x10/0x20
[  789.491300] ---[ end trace 0000000000000000 ]---

Fixes: eaf01ee5ba28 ("drm/imagination: Implement job submission and scheduling")
Cc: stable@vger.kernel.org
Signed-off-by: Brajesh Gupta <brajesh.gupta@imgtec.com>
Reviewed-by: Alessio Belle <alessio.belle@imgtec.com>
Link: https://patch.msgid.link/20260630-b4-sched_fix-v7-1-71aa39c62627@imgtec.com
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>

Merge tag 'batadv-net-pullrequest-20260630' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes, all by Sven Eckelmann:

- fix pointers after potential skb reallocs (5 patches)

- dat: ensure accessible eth_hdr proto field

* tag 'batadv-net-pullrequest-20260630' of https://git.open-mesh.org/batadv:
  batman-adv: dat: ensure accessible eth_hdr proto field
  batman-adv: bla: reacquire gw address after skb realloc
  batman-adv: dat: acquire ARP hw source only after skb realloc
  batman-adv: gw: acquire ethernet header only after skb realloc
  batman-adv: access unicast_ttvn skb->data only after skb realloc
  batman-adv: retrieve ethhdr after potential skb realloc on RX
====================
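
A user-space analogy of the bug class fixed by this series, not batman-adv code. A pointer into a buffer taken before a reallocation is stale afterwards. It must be re-derived from an offset once the buffer may have moved. The buffer type and helpers are hypothetical stand-ins for an skb and its realloc paths.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for an skb. */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Grows b by extra zeroed bytes; b->data may move. Returns 0 or -1. */
static int buf_grow(struct buf *b, size_t extra)
{
	unsigned char *p = realloc(b->data, b->len + extra);

	if (!p)
		return -1;
	memset(p + b->len, 0, extra);
	b->data = p;
	b->len += extra;
	return 0;
}

/* Any pointer into b->data taken before buf_grow() must not be reused;
 * re-acquire it from the offset after the potential realloc. */
static int read_after_grow(struct buf *b, size_t off, unsigned char *out)
{
	if (buf_grow(b, 64))
		return -1;
	*out = b->data[off];
	return 0;
}
```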

Link: https://patch.msgid.link/20260630134430.85786-1-sw@simonwunderlich.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

MAINTAINERS: Add a mailing list entry to MFD

This is to be included by all contributors and will be leaned on for
Sashiko's "reply to author" support.

Signed-off-by: Lee Jones <lee@kernel.org>

net/mlx5: HWS, fix matcher leak on resize target setup failure

hws_bwc_matcher_move() allocates a replacement matcher before setting it
as the resize target. If mlx5hws_matcher_resize_set_target() fails, the
replacement matcher is not attached anywhere and is leaked.

Fix the leak by destroying the replacement matcher before returning from
the resize-target failure path.
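
A minimal sketch of the error-path shape. The matcher type, the allocation counter and set_resize_target() are hypothetical stand-ins for the mlx5 HWS objects and mlx5hws_matcher_resize_set_target(). The counter makes a leak visible in a test.

```c
#include <stdlib.h>

struct matcher {
	int id;
};

static int live_matchers;	/* outstanding allocations, to expose leaks */

static struct matcher *matcher_create(int id)
{
	struct matcher *m = malloc(sizeof(*m));

	if (m) {
		m->id = id;
		live_matchers++;
	}
	return m;
}

static void matcher_destroy(struct matcher *m)
{
	free(m);
	live_matchers--;
}

/* Stand-in for the set-target call; 'fail' forces the error path. */
static int set_resize_target(struct matcher *old, struct matcher *repl, int fail)
{
	(void)old;
	(void)repl;
	return fail ? -1 : 0;
}

static int matcher_move(struct matcher *old, struct matcher **out, int fail)
{
	struct matcher *repl = matcher_create(old->id + 1);
	int err;

	if (!repl)
		return -1;

	err = set_resize_target(old, repl, fail);
	if (err) {
		/* The fix: 'repl' is not attached anywhere yet, so free it. */
		matcher_destroy(repl);
		return err;
	}

	*out = repl;
	return 0;
}
```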

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1.1.

An x86_64 allyesconfig build showed no new warnings. As we do not have an
mlx5 HWS-capable device to test with, no runtime testing could be
performed.

Fixes: 2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260629064049.3852759-1-dawei.feng@seu.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

xfrm: reject optional IPTFS templates in outbound policies

syzbot reported a stack-out-of-bounds read in xfrm_state_find()
which flows from xfrm_tmpl_resolve_one().

Commit 3d776e31c841 ("xfrm: Reject optional tunnel/BEET mode
templates in outbound policies") disallowed optional tunnel and
BEET mode templates in outbound policies to prevent this. IPTFS,
which was added later, was not covered by that fix and can still
trigger the out-of-bounds read.

Extend the check to disallow optional IPTFS in outbound policies
as well. IPTFS should be treated identically to tunnel mode.
IN and FWD policies are not affected: xfrm_tmpl_resolve_one()
is only reachable via the outbound path.
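
A sketch of the extended rule with simplified, hypothetical enums. The kernel's own XFRM_MODE_* and XFRM_POLICY_* constants and its template structure differ. An optional tunnel, BEET or IPTFS template is rejected only for outbound policies.

```c
#include <stdbool.h>

enum tmpl_mode { MODE_TRANSPORT, MODE_TUNNEL, MODE_BEET, MODE_IPTFS };
enum policy_dir { DIR_IN, DIR_OUT, DIR_FWD };

struct tmpl {
	enum tmpl_mode mode;
	bool optional;
};

/* Returns false for an optional tunnel, BEET or IPTFS template in an
 * outbound policy; everything else is accepted. */
static bool tmpl_allowed(enum policy_dir dir, const struct tmpl *t)
{
	if (dir != DIR_OUT || !t->optional)
		return true;

	switch (t->mode) {
	case MODE_TUNNEL:
	case MODE_BEET:
	case MODE_IPTFS:	/* the newly covered case */
		return false;
	default:
		return true;
	}
}
```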

Reproducer, before:

ip link add dummy0 type dummy
ip link set dummy0 up
ip addr add 10.1.1.1/24 dev dummy0
ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
  src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
  level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid \
  2 mode transport
ping -W 1 -c 1 10.1.1.2
PING 10.1.1.2 (10.1.1.2) 56(84) bytes of data.

[   64.168420] ==================================================================
[   64.169977] BUG: KASAN: stack-out-of-bounds in __xfrm6_addr_hash+0x11e/0x170
[   64.169977] Read of size 4 at addr ffff88800e1ffd20 by task ping/2844

[   64.169977] CPU: 2 UID: 0 PID: 2844 Comm: ping Not tainted 7.1.0-rc7-00180-geb23b588430a #98 PREEMPT(full)
[   64.169977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   64.169977] Call Trace:
[   64.169977]  <TASK>
[   64.169977]  dump_stack_lvl+0x47/0x70
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  print_report+0x152/0x4b0
[   64.169977]  ? ksys_mmap_pgoff+0x6d/0xa0
[   64.169977]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   64.169977]  ? rcu_read_unlock_sched+0xa/0x20
[   64.169977]  ? __virt_addr_valid+0x21b/0x230
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  kasan_report+0xa8/0xd0
[   64.169977]  ? __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  __xfrm6_addr_hash+0x11e/0x170
[   64.169977]  __xfrm_dst_hash+0x24/0xc0
[   64.169977]  xfrm_state_find+0xa2d/0x2f90
[   64.169977]  ? __pfx_xfrm_state_find+0x10/0x10
[   64.169977]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   64.169977]  ? __pfx_ftrace_graph_ret_addr+0x10/0x10
[   64.169977]  xfrm_tmpl_resolve_one+0x210/0x570
[   64.169977]  ? __pfx_xfrm_tmpl_resolve_one+0x10/0x10
[   64.169977]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[   64.169977]  ? kernel_text_address+0x5b/0x80
[   64.169977]  ? __kernel_text_address+0xe/0x30
[   64.169977]  ? unwind_get_return_address+0x5e/0x90
[   64.169977]  ? arch_stack_walk+0x8c/0xe0
[   64.169977]  xfrm_tmpl_resolve+0x130/0x200
[   64.169977]  ? __pfx_xfrm_tmpl_resolve+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_inexact_lookup_rcu+0x10/0x10
[   64.169977]  ? __refcount_add_not_zero.constprop.0+0xb2/0x110
[   64.169977]  ? __pfx___refcount_add_not_zero.constprop.0+0x10/0x10
[   64.169977]  xfrm_resolve_and_create_bundle+0xd5/0x310
[   64.169977]  ? __pfx_xfrm_resolve_and_create_bundle+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   64.169977]  ? __pfx_xfrm_policy_lookup_bytype+0x10/0x10
[   64.169977]  xfrm_lookup_with_ifid+0x3d8/0xb80
[   64.169977]  ? __pfx_xfrm_lookup_with_ifid+0x10/0x10
[   64.169977]  ? ip_route_output_key_hash+0xc6/0x110
[   64.169977]  ? kasan_save_track+0x10/0x30
[   64.169977]  xfrm_lookup_route+0x18/0xe0
[   64.169977]  ip4_datagram_release_cb+0x4c9/0x530
[   64.169977]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[   64.169977]  ? do_raw_spin_lock+0x71/0xc0
[   64.169977]  ? __pfx_do_raw_spin_lock+0x10/0x10
[   64.169977]  release_sock+0xb0/0x170
[   64.169977]  udp_connect+0x43/0x50
[   64.169977]  __sys_connect+0xa6/0x100
[   64.169977]  ? alloc_fd+0x2e9/0x300
[   64.169977]  ? __pfx___sys_connect+0x10/0x10
[   64.169977]  ? preempt_latency_start+0x1f/0x70
[   64.169977]  ? fd_install+0x7e/0x150
[   64.169977]  ? rcu_read_unlock_sched+0xa/0x20
[   64.169977]  ? __sys_socket+0xdf/0x130
[   64.169977]  ? __pfx___sys_socket+0x10/0x10
[   64.169977]  ? vma_refcount_put+0x43/0xa0
[   64.169977]  __x64_sys_connect+0x7e/0x90
[   64.169977]  do_syscall_64+0x11b/0x2b0
[   64.169977]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   64.169977] RIP: 0033:0x7f4851ecb570
[   64.169977] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d f9 ca 0d 00 00 74 17 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 54
[   64.169977] RSP: 002b:00007ffc830e3498 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
[   64.169977] RAX: ffffffffffffffda RBX: 00007ffc830e34d0 RCX: 00007f4851ecb570
[   64.169977] RDX: 0000000000000010 RSI: 00007ffc830e34d0 RDI: 0000000000000005
[   64.169977] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
[   64.169977] R10: 0000000000000006 R11: 0000000000000202 R12: 0000000000000005
[   64.169977] R13: 0000000000000000 R14: 00005619a863f340 R15: 0000000000000000
[   64.169977]  </TASK>

[   64.169977] The buggy address belongs to stack of task ping/2844
[   64.169977]  and is located at offset 88 in frame:
[   64.169977]  ip4_datagram_release_cb+0x0/0x530

[   64.169977] This frame has 1 object:
[   64.169977]  [32, 88) 'fl4'

[   64.169977] The buggy address belongs to the physical page:
[   64.169977] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe1ff
[   64.169977] flags: 0x4000000000000000(zone=1)
[   64.169977] raw: 4000000000000000 0000000000000000 ffffea0000387fc8 0000000000000000
[   64.169977] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   64.169977] page dumped because: kasan: bad access detected

[   64.169977] Memory state around the buggy address:
[   64.169977]  ffff88800e1ffc00: f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00
[   64.169977]  ffff88800e1ffc80: 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00
[   64.169977] >ffff88800e1ffd00: 00 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00
[   64.169977]                                ^
[   64.169977]  ffff88800e1ffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
[   64.169977]  ffff88800e1ffe00: f1 f1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   64.169977] ==================================================================
[   64.245153] Disabling lock debugging due to kernel taint

After the fix:

ip xfrm policy add src 10.1.1.1/32 dst 10.1.1.2/32 dir out tmpl \
src fc00::dead:1 dst fc00::dead:2 proto esp reqid 1 mode iptfs \
level use tmpl src fc00::dead:1 dst fc00::dead:2 proto esp reqid 2 \
mode transport

Error: Mode in optional template not allowed in outbound policy.

Fixes: d1716d5a44c3 ("xfrm: add generic iptfs defines and functionality")
Reported-by: syzbot+0ac4d84afe1066a1f3e9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6a3ceb94.43b4ff68.30a095.0004.GAE@google.com/T/
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: cache the offload ifindex for netlink dumps

copy_to_user_state_extra() only holds a reference to the outer xfrm_state.
That does not pin x->xso.dev. NETDEV_DOWN and NETDEV_UNREGISTER can race
through xfrm_dev_state_flush(), xfrm_state_delete(), and
xfrm_dev_state_free(), which clears xso->dev and drops the netdev
reference before the GETSA dump reaches xso_to_xuo() and reads
xso->dev->ifindex.

The buggy scenario involves two paths, with each column showing the order
within that path:

XFRM_MSG_GETSA dump path:           NETDEV teardown path:
1. xfrm_get_sa() gets xfrm_state    1. xfrm_dev_state_flush() finds x
2. copy_to_user_state_extra() sees  2. xfrm_state_delete() removes x
   x->xso.dev                          from the SAD
3. copy_user_offload() calls        3. xfrm_dev_state_free() clears
   xso_to_xuo()                        xso->dev
4. xso->dev->ifindex dereferences   4. netdev_put() drops the device
   a detached net_device               reference

Avoid following the live net_device from the dump paths. Cache the
attached ifindex in xfrm_dev_offload when state or policy offload is bound
to a device, and serialize that snapshot instead. This preserves the
user-visible XFRMA_OFFLOAD_DEV value without depending on the embedded
net_device lifetime.
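
A simplified, single-threaded sketch of the snapshot idea. The structures and helpers are hypothetical, not the kernel's xfrm_dev_offload or net_device, and no concurrency primitives are modeled. The ifindex is cached while the device is known to be attached, and the dump path reads only the cached value.

```c
#include <stddef.h>

struct net_device {
	int ifindex;
};

struct dev_offload {
	struct net_device *dev;	/* may be cleared by a concurrent teardown */
	int ifindex;		/* snapshot taken when offload is bound */
};

static void offload_bind(struct dev_offload *xso, struct net_device *dev)
{
	xso->dev = dev;
	xso->ifindex = dev->ifindex;	/* cache while the device is pinned */
}

static void offload_unbind(struct dev_offload *xso)
{
	xso->dev = NULL;		/* teardown detaches the device */
}

/* Dump path: serialize the snapshot; never follow xso->dev. */
static int offload_dump_ifindex(const struct dev_offload *xso)
{
	return xso->ifindex;
}
```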

Validation reproduced this kernel report:
Oops: general protection fault

Call Trace:
<TASK>
copy_to_user_state_extra+0xb8d/0x1370 [xfrm_user]
? __pfx_copy_to_user_state_extra+0x10/0x10 [xfrm_user]
? __asan_memset+0x23/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? __alloc_skb+0x342/0x960
? srso_alias_return_thunk+0x5/0xfbef5
? __asan_memset+0x23/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? __nlmsg_put+0x147/0x1b0
dump_one_state+0x1c7/0x3e0 [xfrm_user]
xfrm_state_netlink+0xcb/0x130 [xfrm_user]
? __pfx_xfrm_state_netlink+0x10/0x10 [xfrm_user]
? srso_alias_return_thunk+0x5/0xfbef5
? xfrm_user_state_lookup.constprop.0+0x230/0x310 [xfrm_user]
xfrm_get_sa+0x102/0x250 [xfrm_user]
? __pfx_xfrm_get_sa+0x10/0x10 [xfrm_user]
xfrm_user_rcv_msg+0x504/0xaa0 [xfrm_user]
? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
? srso_alias_return_thunk+0x5/0xfbef5
? stack_trace_save+0x8e/0xc0
? __pfx_stack_trace_save+0x10/0x10
netlink_rcv_skb+0x11f/0x350
? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
? __pfx_netlink_rcv_skb+0x10/0x10
? __pfx_mutex_lock+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
xfrm_netlink_rcv+0x65/0x80 [xfrm_user]
netlink_unicast+0x600/0x870
? __pfx_netlink_unicast+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? __pfx_stack_trace_save+0x10/0x10
netlink_sendmsg+0x75d/0xc10
? __pfx_netlink_sendmsg+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
____sys_sendmsg+0x77a/0x900
? srso_alias_return_thunk+0x5/0xfbef5
? __pfx_____sys_sendmsg+0x10/0x10
? __pfx_copy_msghdr_from_user+0x10/0x10
? release_sock+0x1a/0x1d0
? srso_alias_return_thunk+0x5/0xfbef5
? netlink_insert+0x143/0xec0
___sys_sendmsg+0xff/0x180
? __pfx____sys_sendmsg+0x10/0x10
? _raw_spin_lock_irqsave+0x85/0xe0
? do_getsockname+0xf9/0x170
? srso_alias_return_thunk+0x5/0xfbef5
? fdget+0x53/0x3b0
__sys_sendmsg+0x111/0x1a0
? __pfx___sys_sendmsg+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? __sys_getsockname+0x8c/0x100
do_syscall_64+0x102/0x5a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: fix sk_dst_cache double-free in xfrm_user_policy()

xfrm_user_policy() clears the socket dst cache with __sk_dst_reset(),
i.e. the non-atomic __sk_dst_set(sk, NULL): it reads sk_dst_cache with
rcu_dereference_protected(), stores NULL and dst_release()s the old dst.
That is only safe if no other thread modifies sk_dst_cache concurrently.

For a connected UDP socket that does not hold: the transmit fast path
(udp_sendmsg -> sk_dst_check -> sk_dst_reset) resets the cache locklessly
with an atomic xchg(). A per-socket policy change racing a send can make
both sides observe the same old dst and each dst_release() it, dropping
the socket's single reference twice and freeing the xfrm_dst bundle while
it is still referenced:

  BUG: KASAN: slab-use-after-free in dst_release
  Write of size 4 at addr ffff88801897b6c0 by task exploit/155
  Call Trace:
   ...
   dst_release (... ./include/linux/rcuref.h:109)
   xfrm_user_policy (./include/net/sock.h:2239 ./include/net/sock.h:2256 net/xfrm/xfrm_state.c:3053)
   do_ip_setsockopt (net/ipv4/ip_sockglue.c:1347)
   ip_setsockopt (net/ipv4/ip_sockglue.c:1417)
   do_sock_setsockopt (net/socket.c:2368)
   __sys_setsockopt (net/socket.c:2393)
   __x64_sys_setsockopt (net/socket.c:2396)
   do_syscall_64 (arch/x86/entry/syscall_64.c:94)
   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Reachable by an unprivileged user via a user+network namespace.

Use the atomic sk_dst_reset() so the cache is cleared and released with a
single xchg(): whichever side wins releases the dst once, the other sees
NULL and does nothing. Behaviour is otherwise unchanged.
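
A user-space model of the fix using C11 atomics. The names are hypothetical, and the kernel's sk_dst_reset() uses its own xchg() primitive. A single atomic exchange both clears the cache slot and returns the old pointer, so at most one caller receives a non-NULL value to release. The test is single-threaded. It checks only that a second reset is a no-op and does not reproduce the race.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a dst entry. */
struct dst {
	int refcnt;
};

static void dst_put(struct dst *d)
{
	if (d)
		d->refcnt--;	/* plain decrement: single-threaded model only */
}

/* Clear the cache and claim the old pointer in one atomic step, then
 * drop the reference the cache held. A racing caller sees NULL. */
static void cache_reset(struct dst *_Atomic *cache)
{
	dst_put(atomic_exchange(cache, NULL));
}
```

By contrast, a load followed by a separate store of NULL leaves a window in which two callers can load the same pointer and both drop its reference. That is the double release described above.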

Fixes: 2b06cdf3e688 ("xfrm: Clear sk_dst_cache when applying per-socket policy.")
Fixes: be8f8284cd89 ("net: xfrm: allow clearing socket xfrm policies.")
Reported-by: AutonomousCodeSecurity@microsoft.com
Signed-off-by: Xiang Mei (Microsoft) <xmei5@asu.edu>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

x86/Xen: correct commentary and parameter naming of xen_exchange_memory()

As documented in comments in struct xen_memory_exchange, the input to the
hypercall is a set of MFNs which are to be removed from the domain, plus a
set of PFNs where the newly allocated MFNs are to appear. The present
comment and parameter naming don't correctly reflect that.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <7e0c8795-cc60-4b78-8601-6a999739467a@suse.com>

tools/include: include stdint.h for SIZE_MAX in overflow.h

tools/include/linux/overflow.h uses SIZE_MAX in its size helper functions.

Include stdint.h so tools users that include overflow.h without another
SIZE_MAX provider can build.
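
A sketch of why such helpers need SIZE_MAX. This is a saturating multiply in the spirit of the size helpers in overflow.h, written without compiler builtins. The function name is hypothetical, and the real helper's implementation may differ. On overflow it returns SIZE_MAX so that a later allocation fails instead of being undersized.

```c
#include <stddef.h>
#include <stdint.h>	/* SIZE_MAX */

static size_t sat_size_mul(size_t a, size_t b)
{
	/* a * b would exceed SIZE_MAX exactly when b > SIZE_MAX / a. */
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}
```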

Link: https://lore.kernel.org/20260629022124.131894-3-chenyichong@uniontech.com
Signed-off-by: Yichong Chen <chenyichong@uniontech.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tools/virtio: add missing compat definitions for vhost_net_test

Patch series "tools: Fix tools/virtio test build", v2.

This series fixes build failures hit by:

  make -C tools/virtio test

Patch 1 adds tools/virtio compatibility definitions needed by current
virtio headers when building the tools/virtio tests.  Patch 2 makes
tools/include/linux/overflow.h include stdint.h for SIZE_MAX, which is
used by its size helper functions.

With the series applied, make -C tools/virtio test builds virtio_test,
vringh_test and vhost_net_test successfully.

Tested on x86_64 and arm64 with:

  make -C tools/virtio clean
  make -C tools/virtio test

This patch (of 2):

vhost_net_test builds virtio_ring.c in userspace.

Recent virtio headers pull in helper headers that are not provided by the
tools/virtio compatibility layer, including asm/percpu_types.h,
linux/completion.h, linux/mod_devicetable.h and linux/virtio_features.h.

Add the missing compat definitions and the DMA attribute used by the
current virtio ring code.

Link: https://lore.kernel.org/20260629022124.131894-1-chenyichong@uniontech.com
Link: https://lore.kernel.org/20260629022124.131894-2-chenyichong@uniontech.com
Signed-off-by: Yichong Chen <chenyichong@uniontech.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Cc: chenyichong <chenyichong@uniontech.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: do file ownership checks with the proper mount idmap

Ever since idmapped mounts were introduced, inode ownership checks (for
side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were done
against the nop_mnt_idmap, which completely ignores the file's mount's
idmap. This results in odd edge cases like:

1) mount/bind-mount with an idmap userA:userB:1
2) userB runs an owner_or_capable() check on a file that is owned by userA
on-disk/in-memory, but owned by userB after idmap translation
3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied

In the case of mincore/madvise MADV_PAGEOUT, this is usually benign,
because file_permission(file, MAY_WRITE) will probably succeed, as it uses
the proper idmap internally. That need not be the case, though, for e.g. a
0444 file, where even the owner does not have permission to write to it.

Since this is clearly not trivial to get right, introduce a
file_owner_or_capable() that can carry the correct semantics, and switch
the various users in mm to it.
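
A toy model of the edge case above. It uses a hypothetical single-extent idmap and hypothetical helper names, not the kernel's mnt_idmap API. Checking ownership through an identity map compares the raw on-disk owner with the caller and fails. Checking through the mount's idmap translates the owner first and succeeds.

```c
#include <stdbool.h>

/* Hypothetical single-extent idmap: on-disk IDs [lower, lower + count)
 * appear as [upper, upper + count) through the mount. */
struct idmap {
	unsigned int lower, upper, count;
	bool identity;		/* models the nop idmap */
};

static unsigned int map_uid(const struct idmap *m, unsigned int disk_uid)
{
	if (m->identity)
		return disk_uid;
	if (disk_uid >= m->lower && disk_uid - m->lower < m->count)
		return m->upper + (disk_uid - m->lower);
	return (unsigned int)-1;	/* unmapped: owned by nobody */
}

static bool owner_or_capable(const struct idmap *m, unsigned int disk_uid,
			     unsigned int caller_uid, bool capable)
{
	return map_uid(m, disk_uid) == caller_uid || capable;
}
```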

The issue was found by manual code inspection & an off-list discussion
with Jan Kara.

Link: https://lore.kernel.org/20260625153853.913949-1-pfalcato@suse.de
Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner (Amutable) <brauner@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

samples/damon/mtier: fail early if address range parameters are invalid

The comment on top of `struct damon_region` clearly says that

    For any use case, @ar should be non-zero positive size.

which is now verified in damon_verify_new_region() if the kernel is built
with DAMON_DEBUG_SANITY.

The WARN_ONCE() can be triggered if the mtier sample module is enabled
before node{0,1}_{start,end}_addr have been properly initialized, which is
obviously not good.

------------[ cut here ]------------
start 0 >= end 0
WARNING: mm/damon/core.c:217 at damon_new_region+0xf4/0x118, CPU#59: bash/341468
Call trace:
  damon_new_region+0xf4/0x118 (P)
  damon_set_regions+0xfc/0x3c0
  damon_sample_mtier_build_ctx+0xe8/0x3a8
  damon_sample_mtier_start+0x1c/0x90
  damon_sample_mtier_enable_store+0x98/0xb0
  param_attr_store+0xb4/0x128
  module_attr_store+0x2c/0x50
  sysfs_kf_write+0x58/0x90
  kernfs_fop_write_iter+0x16c/0x238
  vfs_write+0x2c0/0x370
  ksys_write+0x74/0x118
  __arm64_sys_write+0x24/0x38
  invoke_syscall+0xa8/0x118
  el0_svc_common.constprop.0+0x48/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x54/0x370
  el0t_64_sync_handler+0xa0/0xe8
  el0t_64_sync+0x1ac/0x1b0
---[ end trace 0000000000000000 ]---

Note that the same issue can happen if detect_node_addresses is true and
node 0 or 1 is memoryless.  Fix both cases by checking the validity of the
parameters right before damon_new_region() and failing early if they're
invalid.
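A minimal sketch of the "fail early" idea, assuming the node address parameters can be modelled as plain start/end pairs. struct node_range, mtier_check_range() and mtier_build_ctx() are hypothetical names, not the sample module's code. Both an uninitialized parameter and a memoryless node show up as start == end, which violates the non-zero positive size rule quoted from struct damon_region.

```c
#include <errno.h>

/* Hypothetical stand-in for the node{0,1}_{start,end}_addr parameters. */
struct node_range {
	unsigned long start, end;
};

/* A DAMON region needs a non-zero positive size, i.e. start < end. */
static int mtier_check_range(const struct node_range *r)
{
	return r->start < r->end ? 0 : -EINVAL;
}

static int mtier_build_ctx(const struct node_range *node0,
			   const struct node_range *node1)
{
	int err = mtier_check_range(node0);

	if (!err)
		err = mtier_check_range(node1);
	if (err)
		return err;	/* reject before any region would be created */
	/* ... region and context setup would follow here ... */
	return 0;
}
```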

Link: https://lore.kernel.org/20260629144432.133962-1-sj@kernel.org
Fixes: 82a08bde3cf7 ("samples/damon: implement a DAMON module for memory tiering")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: SJ Park <sj@kernel.org>
Reviewed-by: SJ Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: a second pagecache maintainer

As MM is slowly transitioning towards a more distributed maintainership
model, we agreed with Matthew that I will be a co-maintainer in case he is
not available.

Link: https://lore.kernel.org/20260629135927.2586391-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon: add a kernel-doc comment for damon_ctx->rnd_state

Fix the following kernel-doc build warning:

WARNING: ../include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx'

Link: https://lore.kernel.org/20260628220808.98931-3-sj@kernel.org
Fixes: 9012c4e647df ("mm/damon: replace damon_rand() with a per-ctx lockless PRNG")
Signed-off-by: SJ Park <sj@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/4df95955-b255-4e5a-90c4-35db02f3111f@infradead.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon: add a kernel-doc comment for damon_ctx->probes

Two fields of struct damon_ctx don't have kernel-doc comments, which
causes kernel documentation builds to warn. Fix those.

This patch (of 2):

Fix the following kernel-doc build warning:

WARNING: ../include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx'

Link: https://lore.kernel.org/20260628220808.98931-1-sj@kernel.org
Link: https://lore.kernel.org/20260628220808.98931-2-sj@kernel.org
Fixes: 18c777859f28 ("mm/damon/core: embed damon_probe objects in damon_ctx")
Signed-off-by: SJ Park <sj@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/4df95955-b255-4e5a-90c4-35db02f3111f@infradead.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: add entries for Radu Rendec

I have used multiple email addresses for my kernel contributions, and some
of them are no longer active. Add all to .mailmap for clarity.

Link: https://lore.kernel.org/20260628150203.4105796-1-radu@rendec.net
Signed-off-by: Radu Rendec <radu@rendec.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm: hmm-tests: include linux/mman.h to access MADV_COLLAPSE

The following compilation error occurs with an old version of glibc due to
a recent commit adding MADV_COLLAPSE testing:

[root@localhost mm]# getconf GNU_LIBC_VERSION
glibc 2.34
[root@localhost mm]# make
   CC       hmm-tests
hmm-tests.c: In function 'hmm_migrate_anon_huge_fault':
hmm-tests.c:2355:27: error: 'MADV_COLLAPSE' undeclared (first use in this function); did you mean 'MADV_COLD'?
  2355 |  ret = madvise(map, size, MADV_COLLAPSE);
       |                           ^~~~~~~~~~~~~
       |                           MADV_COLD
hmm-tests.c:2355:27: note: each undeclared identifier is reported only once for each function it appears in
make: *** [../lib.mk:225: /root/code/linux/tools/testing/selftests/mm/hmm-tests] Error 1

Include linux/mman.h (which provides the definition of MADV_COLLAPSE) to
fix the build error.

Link: https://lore.kernel.org/20260628143111.36863-1-zenghui.yu@linux.dev
Fixes: e3d8707358ea ("selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm: pagemap_ioctl: use the correct page size for transact_test()

There are several places in transact_test() where we use the hardcoded
0x1000 (4k) as page size, which is not always correct for architectures
supporting multiple page sizes.

Switch to the correct page size. Without this change, ./ksft_pagemap.sh on
a 16k-page-size arm64 box fails with:

$ ./ksft_pagemap.sh
[...]
# ok 96 mprotect_tests Both pages written after remap and mprotect
# ok 97 mprotect_tests Clear and make the pages written
# Bail out! ioctl failed
# # Planned tests != run tests (117 != 97)
# # Totals: pass:97 fail:0 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
not ok 1 pagemap_ioctl # exit=1
# SUMMARY: PASS=0 SKIP=0 FAIL=1
1..1

Link: https://lore.kernel.org/20260628101118.35861-1-zenghui.yu@linux.dev
Fixes: 46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Cc: Muhammad Usama Anjum <usama.anjum@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zenghui Yu <zenghui.yu@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fs/proc: fix KPF_KSM reported for all anonymous pages

Reading /proc/kpageflags for any anonymous page returns KPF_KSM set, even
when KSM is not in use. As a result, tools misclassify all anonymous
pages as KSM merged.

In stable_page_flags(), if the page is anonymous, a (mapping &
FOLIO_MAPPING_KSM) check is used to identify whether it is a KSM page.
However, FOLIO_MAPPING_KSM is FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM,
so the (mapping & FOLIO_MAPPING_KSM) check returns true for all anonymous
pages.

To fix it, use FOLIO_MAPPING_ANON_KSM instead.
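The flag arithmetic can be checked in isolation. In this sketch the bit values 0x1 and 0x2 are assumptions chosen for illustration; the commit only states that FOLIO_MAPPING_KSM is the OR of the other two. The helpers are hypothetical and model only the anonymous-page branch of stable_page_flags().

```c
#include <stdbool.h>

/* Assumed bit values for illustration only. */
#define FOLIO_MAPPING_ANON     0x1UL
#define FOLIO_MAPPING_ANON_KSM 0x2UL
#define FOLIO_MAPPING_KSM      (FOLIO_MAPPING_ANON | FOLIO_MAPPING_ANON_KSM)

/* Old check: every anonymous mapping already has the ANON bit, so
 * masking with the combined KSM value is always non-zero. */
static bool is_ksm_buggy(unsigned long mapping)
{
	return (mapping & FOLIO_MAPPING_ANON) && (mapping & FOLIO_MAPPING_KSM);
}

/* Fixed check: only the KSM-specific bit identifies a KSM page. */
static bool is_ksm_fixed(unsigned long mapping)
{
	return (mapping & FOLIO_MAPPING_ANON) &&
	       (mapping & FOLIO_MAPPING_ANON_KSM);
}
```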

Link: https://lore.kernel.org/20260629033122.774318-1-tujinjiang@huawei.com
Link: https://lore.kernel.org/20260626013252.2846774-1-tujinjiang@huawei.com
Fixes: dee3d0bef2b0 ("proc: rewrite stable_page_flags()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Xu Xin <xu.xin16@zte.com.cn>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Svetly Todorov <svetly.todorov@memverge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: page_ext: add count limit to page_ext_iter_next to prevent invalid PFN access

The page_ext iteration API does not validate if the PFN still belongs to a
valid section while advancing the iterator.  When dynamically adding
memory in the hotplug path, it can lead to a NULL pointer dereference
during page_ext_lookup at the boundary of the last valid section when
iterator count equals __pgcount.

The for_each_page_ext() macro calls page_ext_iter_next() as its loop
increment ("__page_ext = page_ext_iter_next(&__iter)"), so after the last
iteration page_ext_iter_next() increments iter->index to __pgcount and
calls page_ext_lookup(start_pfn + __pgcount).  During memory hotplug
(online), the PFN at start_pfn + __pgcount may belong to a section that has
not yet been initialized, causing page_ext_lookup() to trigger a NULL
pointer dereference.

[   14.555124][  T846] Call trace:
[   14.555125][  T846]  lookup_page_ext+0x6c/0x108 (P)
[   14.555127][  T846]  page_ext_lookup+0x30/0x3c
[   14.555129][  T846]  __reset_page_owner+0x11c/0x260
[   14.571201][  T846]  __free_pages_ok+0x5e8/0x8e0
[   14.571204][  T846]  __free_pages_core+0x78/0xf0
[   14.571206][  T846]  generic_online_page+0x14/0x24
[   14.597782][  T846]  online_pages+0x178/0x30c
[   14.597784][  T846]  memory_block_change_state+0x284/0x32c
[   14.597787][  T846]  memory_subsys_online+0x4c/0x64
[   14.597789][  T846]  device_online+0x88/0xb0
[   14.597791][  T846]  online_memory_block+0x30/0x40
[   14.597793][  T846]  walk_memory_blocks+0xac/0xe8
[   14.597794][  T846]  add_memory_resource+0x280/0x298
[   14.656161][  T846]  add_memory+0x60/0x98

Move the iteration boundary enforcement inside the iterator functions, so
callers cannot inadvertently access beyond the requested range.
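A userspace model of an iterator that carries its own bound, assuming a lookup table that only covers PFNs [0, NR_VALID). All names (ext_iter, lookup_ext(), for_each_ext()) are hypothetical and only mirror the shape of the page_ext API. Because the count check happens before the lookup in the "next" step, the PFN at start + count is never looked up, even though the loop macro still calls next() after the last element.

```c
#include <stddef.h>

/* Hypothetical stand-in: extensions exist only for PFNs [0, NR_VALID). */
#define NR_VALID 8
static int ext_table[NR_VALID];
static unsigned long max_pfn_looked_up;	/* records the highest PFN probed */

static int *lookup_ext(unsigned long pfn)
{
	if (pfn > max_pfn_looked_up)
		max_pfn_looked_up = pfn;
	return pfn < NR_VALID ? &ext_table[pfn] : NULL;
}

struct ext_iter {
	unsigned long start_pfn;
	unsigned long index;
	unsigned long count;	/* bound stored inside the iterator */
};

static int *ext_iter_begin(struct ext_iter *it, unsigned long pfn,
			   unsigned long count)
{
	it->start_pfn = pfn;
	it->index = 0;
	it->count = count;
	return count ? lookup_ext(pfn) : NULL;
}

/* Bound check first: start_pfn + count is never passed to lookup_ext(). */
static int *ext_iter_next(struct ext_iter *it)
{
	if (++it->index >= it->count)
		return NULL;
	return lookup_ext(it->start_pfn + it->index);
}

#define for_each_ext(ext, it, pfn, count)			\
	for ((ext) = ext_iter_begin(&(it), (pfn), (count));	\
	     (ext); (ext) = ext_iter_next(&(it)))
```

Iterating four PFNs starting at 4 visits 4..7 and never probes PFN 8, which is the first PFN outside the valid range in this model.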

Link: https://lore.kernel.org/20260623-page_ext-v3-1-a89799a5367c@oss.qualcomm.com
Fixes: 9039b9096ea2 ("mm: page_ext: add an iteration API for page extensions")
Signed-off-by: Ketan Kishore <ketan.kishore@oss.qualcomm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/ops-common: handle extreme intervals in damon_hot_score()

Fix three issues in damon_hot_score() that come from wrong handling of
extreme (zero or too high) monitoring intervals set up by the user.

When the user sets the sampling interval to zero, damon_max_nr_accesses(),
which is called from damon_hot_score(), causes a divide-by-zero.

When the user sets the aggregation interval to zero, the function returns
zero.  This is wrong, since the real maximum nr_accesses in that setup
should be one.  Worse yet, it can cause another divide-by-zero in its
caller, damon_hot_score(), which uses the damon_max_nr_accesses() return
value as a denominator.

When the user sets the aggregation interval very high, damon_hot_score()
could return a value outside the [0, DAMOS_MAX_SCORE] range.  Since the
return value is used as an index into the regions_score_histogram array,
which has DAMOS_MAX_SCORE+1 entries, this causes an out-of-bounds array
access.

The issues can be reproduced relatively easily as below, though sysfs
write permission is required.

    # ./damo start --damos_action lru_prio --damos_quota_space 100M \
            --damos_quota_interval 1s
    # cd /sys/kernel/mm/damon/admin/kdamonds/0
    # echo 0 > contexts/0/monitoring_attrs/intervals/sample_us
    # echo 0 > contexts/0/monitoring_attrs/intervals/aggr_us
    # echo commit > state
    # dmesg
    [...]
    [  131.329762] Oops: divide error: 0000 [#1] SMP NOPTI
    [...]
    [  131.336089] RIP: 0010:damon_hot_score+0x27/0xd0
    [...]

Fix the divide-by-zero problems by explicitly handling zero
intervals in damon_max_nr_accesses().  Fix the out-of-bounds array access
by applying [0, DAMOS_MAX_SCORE] bounds before returning from
damon_hot_score().
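A minimal sketch of both fixes, under stated assumptions. DAMOS_MAX_SCORE is assumed to be 99 here, the score formula is a simplified stand-in rather than the real damon_hot_score() weighting, and returning 1 for a zero sampling interval is an assumption (the commit only says the maximum should be one for a zero aggregation interval).

```c
#include <limits.h>

#define DAMOS_MAX_SCORE 99	/* assumed value for this sketch */

/* Hypothetical model of damon_max_nr_accesses(): how many samples fit in
 * one aggregation interval, never zero so it is safe as a denominator. */
static unsigned int max_nr_accesses(unsigned long sample_us,
				    unsigned long aggr_us)
{
	unsigned long max;

	if (!sample_us || !aggr_us)
		return 1;
	max = aggr_us / sample_us;
	if (max > UINT_MAX)
		return UINT_MAX;
	return max ? (unsigned int)max : 1;
}

/* Simplified, hypothetical score, clamped so it is always a valid index
 * into a histogram of DAMOS_MAX_SCORE + 1 buckets. */
static int hot_score(unsigned int nr_accesses, unsigned long sample_us,
		     unsigned long aggr_us)
{
	long long score = (long long)nr_accesses * DAMOS_MAX_SCORE /
			  max_nr_accesses(sample_us, aggr_us);

	if (score < 0)
		score = 0;
	if (score > DAMOS_MAX_SCORE)
		score = DAMOS_MAX_SCORE;
	return (int)score;
}
```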

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260623135834.67189-1-sj@kernel.org
Link: https://lore.kernel.org/20260619202459.145010-1-sj@kernel.org
Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add Lance as an rmap reviewer

Lance has been doing excellent work reviewing rmap series and has proven
himself to be a great member of the community in general, so add him as an
rmap reviewer.

Link: https://lore.kernel.org/20260622155913.280355-1-ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Acked-by: Dev Jain <dev.jain@arm.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: Lance Yang <lance.yang@linux.dev>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/compaction: handle free_pages_prepare() properly in compaction_free()

free_pages_prepare() can fail, but compaction_free() does not handle the
failure case. Failed pages should not be added back to cc->freepages for
future use, since they can be either PageHWPoison or free_page_is_bad()
and might cause data corruption.

Link: https://lore.kernel.org/20260622-handle_free_pages_prepare_in_compaction_free-v1-1-fcf3b14abcf7@nvidia.com
Fixes: 733aea0b3a7b ("mm/compaction: add support for >0 order folio memory compaction.")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs-schemes: put stats for scheme_add_dirs() internal error

damon_sysfs_scheme_add_dirs() sets up the tried_regions directory after the
stats directory setup is completed.  When the tried_regions directory
setup fails, the setup function ensures the reference for the tried
regions directory is released.  Hence the error path should put references
on successfully set up directory objects, starting from the stats directory.
However, the error path is putting the tried_regions directory instead of
the stats directory.

As a direct result, the stats directory object is leaked.  Worse yet, if
the tried_regions directory setup failed from the initial allocation, the
scheme->tried_regions field remains uninitialized.  The following
kobject_put(&scheme->tried_regions->kobj) call in the error path will
dereference the uninitialized memory.  Setup failures should not be
common, but when one happens, the consequence is quite bad.

Fix this issue by correctly putting the stats directory instead of the
tried_regions directory.
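The rule behind this fix and the access_pattern_add_dirs() fix is the usual goto-unwind idiom: a setup function that fails cleans up after itself, so the caller puts only what already succeeded, newest first. The sketch below models it with three hypothetical directories a, b and c; struct dir, dir_setup(), dir_put() and add_dirs() are illustrative names, not the DAMON sysfs code.

```c
#include <errno.h>
#include <stdbool.h>

struct dir {
	bool live;
};

static int nr_puts;	/* counts how many references were dropped */

/* A failing setup leaves d untouched, so there is nothing to put. */
static int dir_setup(struct dir *d, bool fail)
{
	if (fail)
		return -ENOMEM;
	d->live = true;
	return 0;
}

static void dir_put(struct dir *d)
{
	d->live = false;
	nr_puts++;
}

/* fail_at selects which setup fails (-1: none). */
static int add_dirs(struct dir *a, struct dir *b, struct dir *c, int fail_at)
{
	int err;

	err = dir_setup(a, fail_at == 0);
	if (err)
		return err;
	err = dir_setup(b, fail_at == 1);
	if (err)
		goto put_a;
	err = dir_setup(c, fail_at == 2);
	if (err)
		goto put_b;	/* not c: its setup failed and cleaned up */
	return 0;

put_b:
	dir_put(b);
put_a:
	dir_put(a);
	return err;
}
```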

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260618005650.83868-3-sj@kernel.org
Link: https://lore.kernel.org/20260617005223.96813-1-sj@kernel.org
Fixes: 5181b75f438d ("mm/damon/sysfs-schemes: implement schemes/tried_regions directory")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.2.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs-schemes: fix dir put orders in access_pattern_add_dirs()

Patch series "mm/damon/sysfs-schemes: fix wrong directories put orders in
error paths".

Error paths of damon_sysfs_access_pattern_add_dirs() and
damon_sysfs_scheme_add_dirs() put references to directories in the
wrong order.  As a result, an uninitialized memory dereference and/or a
memory leak can happen.  Fix those.

This patch (of 2):

In access_pattern_add_dirs(), the error handling path puts references
starting from the directories whose setup failed.  If the failure happened
in the initial allocation of a setup function, an uninitialized memory
dereference can happen.  Allocation failures will not commonly happen, but
the consequence is quite bad.  Fix the wrong reference put orders.

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260618005650.83868-2-sj@kernel.org
Link: https://lore.kernel.org/20260617060005.86852-1-sj@kernel.org
Fixes: 7e84b1f8212a ("mm/damon/sysfs: support DAMON-based Operation Schemes")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: shrinker: fix NULL pointer dereference in debugfs

shrinker_debugfs_add() creates both "count" and "scan" debugfs files
unconditionally.

That assumes every shrinker implements both count_objects() and
scan_objects(), which is not guaranteed. For example, the xen-backend
shrinker sets count_objects() but leaves scan_objects() NULL, so writing
to its scan file calls through a NULL function pointer and panics the
kernel:

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Call Trace:
<TASK>
shrinker_debugfs_scan_write+0x12e/0x270
full_proxy_write+0x5f/0x90
vfs_write+0xde/0x420
? filp_flush+0x75/0x90
? filp_close+0x1d/0x30
? do_dup2+0xb8/0x120
ksys_write+0x68/0xf0
? filp_flush+0x75/0x90
do_syscall_64+0xb3/0x5b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e

The count path has the same issue in principle if a shrinker omits
count_objects().

To fix it, only create "count" and "scan" debugfs files when the
corresponding callbacks are present.
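The shape of the fix can be sketched in userspace by modelling "which debugfs files get created" as a bitmask derived from the callbacks that are actually present. struct shrinker_model, the FILE_* flags and debugfs_files_for() are hypothetical; the real code calls debugfs helpers instead of returning a mask.

```c
#include <stddef.h>

/* Hypothetical model of a shrinker with optional callbacks. */
struct shrinker_model {
	unsigned long (*count_objects)(void);
	unsigned long (*scan_objects)(unsigned long nr);
};

enum { FILE_COUNT = 1 << 0, FILE_SCAN = 1 << 1 };

/* Each file is created only if the callback it would invoke exists. */
static unsigned int debugfs_files_for(const struct shrinker_model *s)
{
	unsigned int files = 0;

	if (s->count_objects)
		files |= FILE_COUNT;
	if (s->scan_objects)
		files |= FILE_SCAN;
	return files;
}

static unsigned long count_dummy(void)
{
	return 42;
}
```

A shrinker shaped like the xen-backend one (count callback only) then gets a "count" file but no "scan" file, so no write can reach a NULL scan_objects pointer.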

Link: https://lore.kernel.org/20260617090052.27325-1-qi.zheng@linux.dev
Fixes: bbf535fd6f06 ("mm: shrinkers: add scan interface for shrinker debugfs")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: shrinker: fix shrinker_info teardown race with expansion

expand_shrinker_info() iterates all visible memcgs under shrinker_mutex,
including memcgs that have not finished ->css_online() yet.

Once pn->shrinker_info has been published, teardown must stay serialized
with expand_shrinker_info() until that memcg is either fully online or no
longer visible to iteration.  Today alloc_shrinker_info() breaks that rule
by dropping shrinker_mutex before freeing a partially initialized
shrinker_info array, which may cause the following race:

CPU0                   CPU1
====                   ====

css_create
--> list_add_tail_rcu(&css->sibling, &parent_css->children);
    online_css
    --> mem_cgroup_css_online
        --> alloc_shrinker_info
            --> alloc node0 info
                rcu_assign_pointer(C->node0->shrinker_info, old0)
                alloc node1 info -> FAIL -> goto err
                mutex_unlock(shrinker_mutex)

                       shrinker_alloc()
                       --> shrinker_memcg_alloc
                           --> mutex_lock(shrinker_mutex)
                               expand_shrinker_info
                               --> mem_cgroup_iter see the memcg
                                   expand_one_shrinker_info
                                   --> old0 = C->node0->shrinker_info
                                       memcpy(new->unit, old0->unit, ...);

                free_shrinker_info
                --> kvfree(old0);

                                       /* double free !! */
                                       kvfree_rcu(old0, rcu);

The same problem exists later in mem_cgroup_css_online().  If
alloc_shrinker_info() succeeds but a subsequent objcg allocation fails,
the free_objcg -> free_shrinker_info() unwind path tears down the already
published pn->shrinker_info arrays without shrinker_mutex, and
expand_one_shrinker_info() can race with that teardown in the same way,
leading to use-after-free or double-free of the old shrinker_info.

Fix this by serializing shrinker_info teardown with shrinker_mutex, and by
keeping alloc_shrinker_info() error cleanup inside the locked section.
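A userspace sketch of the locking rule, using a pthread mutex in place of shrinker_mutex. The node_info array and the alloc_info(), free_info_locked() and free_info() helpers are hypothetical models of pn->shrinker_info and its allocation and teardown paths. Every pointer that may already be visible to the expand path is freed only while the same mutex is held, including the partial-failure unwind.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_NODES 2

/* An "expand" path (not shown) would read and replace these pointers
 * while holding info_mutex. */
static pthread_mutex_t info_mutex = PTHREAD_MUTEX_INITIALIZER;
static int *node_info[NR_NODES];

/* Caller must hold info_mutex. */
static void free_info_locked(void)
{
	for (int i = 0; i < NR_NODES; i++) {
		free(node_info[i]);
		node_info[i] = NULL;
	}
}

/* fail_node simulates an allocation failure on that node (-1: none). */
static int alloc_info(int fail_node)
{
	pthread_mutex_lock(&info_mutex);
	for (int i = 0; i < NR_NODES; i++) {
		int *p = (i == fail_node) ? NULL : calloc(1, sizeof(int));

		if (!p) {
			free_info_locked();	/* unwind before unlocking */
			pthread_mutex_unlock(&info_mutex);
			return -ENOMEM;
		}
		node_info[i] = p;	/* published: visible to expand */
	}
	pthread_mutex_unlock(&info_mutex);
	return 0;
}

/* Later unwind paths take the same lock before tearing down. */
static void free_info(void)
{
	pthread_mutex_lock(&info_mutex);
	free_info_locked();
	pthread_mutex_unlock(&info_mutex);
}
```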

Link: https://lore.kernel.org/20260617085658.27096-1-qi.zheng@linux.dev
Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm: fix ksft_process_madv.sh test category

ksft_process_madv.sh currently runs run_vmtests.sh with the mmap category.
Update it to run the process_madv category, since ksft_mmap.sh already
runs the mmap category tests.

This avoids running mmap tests twice and ensures that process_madv tests
are run through the kselftest harness.

Link: https://lore.kernel.org/20260608103224.344101-1-sarthak.sharma@arm.com
Fixes: 6ce964c02f1c ("selftests/mm: have the harness run each test category separately")
Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cifs: update internal module version number

to 2.60

Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: use unaligned reads in parse_posix_ctxt()

The server controls create-context DataOffset, so the POSIX context data
pointer may be misaligned on strict-alignment architectures. Use
get_unaligned_le32() when reading nlink, reparse_tag, and mode.
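What get_unaligned_le32() guarantees can be reproduced in userspace by assembling the value byte by byte, which works for any alignment and host endianness. read_le32_unaligned(), struct posix_fields and parse_fields() are hypothetical, and the layout (nlink, reparse_tag, mode as consecutive 32-bit little-endian fields) is an assumption for this sketch.

```c
#include <stdint.h>

/* Userspace stand-in for get_unaligned_le32(). */
static uint32_t read_le32_unaligned(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

struct posix_fields {
	uint32_t nlink, reparse_tag, mode;
};

/* data points at a server-chosen offset, which may be odd. */
static void parse_fields(const uint8_t *data, struct posix_fields *out)
{
	out->nlink = read_le32_unaligned(data);
	out->reparse_tag = read_le32_unaligned(data + 4);
	out->mode = read_le32_unaligned(data + 8);
}
```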

Fixes: 69dda3059e7a ("cifs: add SMB2_open() arg to return POSIX data")
Cc: stable@vger.kernel.org
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: harden POSIX SID length parsing

posix_info_sid_size() reads sid[1] to obtain the subauthority count,
but its existing boundary check still accepts buffers with only one
remaining byte. Require two bytes before reading sid[1] so all client
paths that reuse the helper reject truncated POSIX SIDs safely.
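A sketch of the hardened length check, assuming the standard SID wire layout: revision (1 byte), subauthority count (1 byte), identifier authority (6 bytes), then 4 bytes per subauthority. sid_size() is a hypothetical helper in the spirit of posix_info_sid_size(), not the client's code. It requires two bytes before reading sid[1], then checks that the whole SID fits.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the SID's total size, or -1 if [beg, end) is too short. */
static int sid_size(const uint8_t *beg, const uint8_t *end)
{
	ptrdiff_t remaining = end - beg;
	size_t total;

	if (remaining < 2)	/* need sid[0] and sid[1] before reading sid[1] */
		return -1;
	total = 1 + 1 + 6 + 4 * (size_t)beg[1];
	if ((size_t)remaining < total)
		return -1;	/* subauthorities truncated */
	return (int)total;
}
```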

Fixes: 349e13ad30b4 ("cifs: add smb2 POSIX info level")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:gpt-5.4
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

block: Make WBT latency writes honor enable state

queue/wbt_lat_usec controls both the stored WBT latency target and the
effective WBT enable state.

The old no-op check skipped updates whenever the converted latency
matched the stored min_lat_nsec. That check ignored whether the current
WBT state already matched the state requested by the write. For a queue
disabled by default, attempting to enable WBT by writing the default
value through sysfs could return success while the enable state was left
unchanged.

Treat a write as a no-op only when both the stored latency and the
effective WBT enabled state already match the converted value.
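The difference between the old and new no-op rules can be shown with a small model of the queue state. struct wbt_model and both predicates are hypothetical, and treating a converted latency of 0 as a request to disable WBT is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state behind queue/wbt_lat_usec. */
struct wbt_model {
	uint64_t min_lat_nsec;	/* stored latency target */
	bool enabled;		/* effective WBT enable state */
};

/* Old rule: skip when the latency matches, ignoring the enable state. */
static bool is_noop_old(const struct wbt_model *q, uint64_t lat_nsec)
{
	return q->min_lat_nsec == lat_nsec;
}

/* New rule: skip only if the latency and the requested state both match.
 * Assumption: a converted value of 0 requests "disabled". */
static bool is_noop_new(const struct wbt_model *q, uint64_t lat_nsec)
{
	bool want_enabled = lat_nsec != 0;

	return q->min_lat_nsec == lat_nsec && q->enabled == want_enabled;
}
```

For a queue that stores the default target but has WBT disabled, writing that default is a no-op under the old rule and an actual update under the new one.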

Signed-off-by: Guzebing <guzebing1612@gmail.com>
Link: https://patch.msgid.link/20260621014030.1625306-1-guzebing1612@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull bootconfig fix from Masami Hiramatsu:

- bootconfig: Fix NULL-pointer arithmetic

   Fix undefined pointer arithmetic in xbc_snprint_cmdline() when
   probing the buffer length with NULL and size 0. Track the written
   length as a size_t instead to prevent build-time UBSan/FORTIFY_SOURCE
   failures.

* tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
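The pattern described in the bootconfig fix can be sketched generically: keep the running length as a size_t offset and only form "buf + len" when a real buffer exists, so probing with (NULL, 0) never performs arithmetic on a NULL pointer. append() and build_cmdline() are hypothetical helpers, not the code in xbc_snprint_cmdline().

```c
#include <stddef.h>
#include <stdio.h>

/* Append s at offset len; returns the new total length, like snprintf(). */
static size_t append(char *buf, size_t size, size_t len, const char *s)
{
	int n;

	if (buf && len < size)
		n = snprintf(buf + len, size - len, "%s", s);
	else
		n = snprintf(NULL, 0, "%s", s);	/* length probe only */
	return len + (n > 0 ? (size_t)n : 0);
}

static size_t build_cmdline(char *buf, size_t size)
{
	size_t len = 0;

	len = append(buf, size, len, "key1=val1 ");
	len = append(buf, size, len, "key2=val2 ");
	return len;
}
```

Calling build_cmdline(NULL, 0) first yields the required length, and a second call with a buffer of that length plus one fills it.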

selinux: avoid sk_socket dereference in selinux_sctp_bind_connect()

selinux_sctp_bind_connect() dereferences sk->sk_socket to pass a
struct socket * to selinux_socket_bind() and
selinux_socket_connect_helper().  However, when the hook is invoked
from the ASCONF softirq path (sctp_process_asconf), there is no file
reference guaranteeing that sk->sk_socket is non-NULL.  The setsockopt
callers (bindx, connectx, set_primary, sendmsg connect) hold a file
reference and are not affected.

Both selinux_socket_bind() and selinux_socket_connect_helper()
immediately resolve sock->sk, never using the struct socket * for
anything else.  Refactor the inner logic into helpers that take a
struct sock * directly so that selinux_sctp_bind_connect() never needs
to touch sk->sk_socket at all.

Cc: stable@vger.kernel.org
Fixes: d452930fd3b9 ("selinux: Add SCTP support")
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

accel/amdxdna: Fix use-after-free in debug BO command handling

When a debug BO command completes, job->drv_cmd may already have been
freed. Accessing it from aie2_sched_drvcmd_resp_handler() can result in
a use-after-free and memory corruption.

Fix this by introducing reference counting for drv_cmd objects and
transferring ownership to the job while it is in flight. This ensures
that the command remains valid until the completion handler finishes
processing it.
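A minimal sketch of "the job holds its own reference while in flight", using C11 atomics. struct drv_cmd, struct job and all helpers here are hypothetical models, not the amdxdna types. Even if the submitter drops its reference early, the completion handler still operates on a live object and performs the final put.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct drv_cmd {
	atomic_int refcount;
	int result;
};

static struct drv_cmd *cmd_alloc(void)
{
	struct drv_cmd *cmd = calloc(1, sizeof(*cmd));

	if (cmd)
		atomic_init(&cmd->refcount, 1);	/* submitter's reference */
	return cmd;
}

static struct drv_cmd *cmd_get(struct drv_cmd *cmd)
{
	atomic_fetch_add(&cmd->refcount, 1);
	return cmd;
}

/* Returns true if this put dropped the last reference and freed cmd. */
static bool cmd_put(struct drv_cmd *cmd)
{
	if (atomic_fetch_sub(&cmd->refcount, 1) == 1) {
		free(cmd);
		return true;
	}
	return false;
}

struct job {
	struct drv_cmd *drv_cmd;
};

/* The job owns a reference for as long as it is in flight ... */
static void job_submit(struct job *job, struct drv_cmd *cmd)
{
	job->drv_cmd = cmd_get(cmd);
}

/* ... and drops it only after the completion handler is done. */
static bool job_complete(struct job *job, int result)
{
	job->drv_cmd->result = result;
	return cmd_put(job->drv_cmd);
}
```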

Fixes: 7ea046838021 ("accel/amdxdna: Support firmware debug buffer")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260701155556.663541-1-lizhi.hou@amd.com

iio: adc: ti-ads124s08: Return reset GPIO lookup errors

devm_gpiod_get_optional() returns NULL when the optional GPIO is absent,
but returns an ERR_PTR when the GPIO provider lookup fails, including
probe deferral.

Probe currently logs the ERR_PTR case as if the reset GPIO were simply
absent and keeps the error pointer in reset_gpio. Later ads124s_reset()
treats any non-NULL reset_gpio as a valid descriptor and passes it to
gpiod_set_value_cansleep().

Return the lookup error instead of retaining the ERR_PTR.
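The three possible results of an optional lookup (NULL, ERR_PTR, valid pointer) can be modelled in userspace. ERR_PTR(), PTR_ERR() and IS_ERR() below are hypothetical re-creations of the kernel convention for illustration, and probe_reset_gpio() is not the driver's code. The point is that an ERR_PTR is returned to the caller and never stored as if it were a descriptor.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the ERR_PTR convention: the top MAX_ERRNO
 * addresses encode -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct gpio_desc;	/* opaque */

/* NULL: GPIO absent; ERR_PTR: lookup failed; otherwise a valid descriptor. */
static int probe_reset_gpio(struct gpio_desc *lookup, struct gpio_desc **out)
{
	if (IS_ERR(lookup))
		return (int)PTR_ERR(lookup);	/* propagate, do not store */
	*out = lookup;				/* NULL means no reset line */
	return 0;
}
```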

Fixes: e717f8c6dfec ("iio: adc: Add the TI ads124s08 ADC code")
Cc: stable@vger.kernel.org
Reviewed-by: Joshua Crofts <joshua.crofts1@gmail.com>
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

iio: temperature: Build mlx90635 with CONFIG_MLX90635

drivers/iio/temperature/Kconfig has a dedicated MLX90635 option, but
the Makefile currently builds mlx90635.o under CONFIG_MLX90632.

This means enabling CONFIG_MLX90635 alone does not build mlx90635.o,
while enabling CONFIG_MLX90632 unexpectedly does.

Gate mlx90635.o on the matching generated Kconfig symbol.

Fixes: a1d1ba5e1c28 ("iio: temperature: mlx90635 MLX90635 IR Temperature sensor")
Cc: stable@vger.kernel.org
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Crt Mori <cmo@melexis.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>

selinux: check connect-related permissions on TCP Fast Open

Similar to Landlock, SELinux was not updated when TCP Fast Open
support was introduced to ensure connect-related permissions are
checked when using TCP Fast Open. Update its socket_sendmsg() hook to
call selinux_socket_connect() when MSG_FASTOPEN is passed.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-security-module/20260616201615.275032-1-hexlabsecurity@proton.me/
Link: https://lore.kernel.org/linux-security-module/20260617180526.15627-2-matthieu@buffet.re/
Reported-by: Bryam Vargas <hexlabsecurity@proton.me>
Reported-by: Matthieu Buffet <matthieu@buffet.re>
Reported-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Bryam Vargas <hexlabsecurity@proton.me>
Signed-off-by: Paul Moore <paul@paul-moore.com>

x86,fs/resctrl: Prevent out-of-bounds access while offlining CPU when SNC enabled

The architecture updates the cpu_mask in a domain's header to track which
online CPUs are associated with the domain. When this mask becomes empty,
the architecture initiates the offlining of the domain, which includes
calling on resctrl fs to offline the domain. If it is a monitoring domain
in which LLC occupancy is tracked, resctrl fs forces the limbo handler to
clear all busy RMID state associated with the domain.

The limbo handler always reads the current event value associated with a
busy RMID, regardless of whether the RMID is being checked as part of the
regular "is it still busy" check or will be forcibly released anyway. When reading an
RMID on a system with SNC enabled the "logical RMID" is converted to the
"physical RMID" and this conversion requires the NUMA node ID of the
resctrl monitoring domain that is in turn determined by querying the NUMA
node ID of any CPU belonging to the monitoring domain.

When the monitoring domain is going offline its cpu_mask is empty causing
the NUMA node ID query via cpu_to_node() to be done with "nr_cpu_ids" as
argument resulting in an out-of-bounds access.

Refactor the limbo handler to skip reading the RMID when the RMID will
just be forced to no longer be dirty in the domain anyway. Add a safety
check to the architecture's RMID reader to protect against this scenario.

Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=9
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/16137433df42f85013b2f7a53626795cbd6637b9.1781029125.git.reinette.chatre@intel.com

drm/xe/rtp: Add struct types for RTP tables

We currently have a mixture of styles for our RTP tables with respect to
how we define the number of entries:

  * xe_rtp_process_to_sr() expects to receive the number of entries as
    an argument;
  * xe_rtp_process() expects the array to have a sentinel at the end of
    the array;
  * in xe_rtp_test.c, even though xe_rtp_process_to_sr() does not
    require a sentinel value, we need to rely on that technique to be
    able to count xe_rtp_entry_sr entries because simply using
    ARRAY_SIZE() is not possible.

The style used by xe_rtp_process_to_sr() makes it hard to share the
tables with other compilation units (e.g. kunit tests), since the number
of entries is calculated with ARRAY_SIZE(), which is evaluated at compile
time and only works where the array definition is visible.

Since we use the size of the tables to create some bitmasks, using a
sentinel style doesn't seem great either.

A way to reconcile things into a single style is to have a struct type
that would hold the entries array and the number of entries.  Since we
have xe_rtp_entry and xe_rtp_entry_sr, we would have one type for each.

The advantage of the proposed approach is that we now have a clean way to
share the tables directly with kunit tests, along with information about
their size.
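
One way to picture the proposed table type is the userspace C sketch
below. struct rtp_entry, struct rtp_table, RTP_TABLE() and the sample
entries are hypothetical and are not the xe definitions; the point is
that ARRAY_SIZE() is applied once, where the array is defined, and every
consumer then reads the length from the struct without needing a
sentinel.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical entry type standing in for xe_rtp_entry / xe_rtp_entry_sr. */
struct rtp_entry {
	const char *name;
	unsigned int reg;
};

/* Table type carrying the array together with its number of entries. */
struct rtp_table {
	const struct rtp_entry *entries;
	size_t n_entries;
};

/* ARRAY_SIZE() is evaluated here, where the array type is complete. */
#define RTP_TABLE(arr) { .entries = (arr), .n_entries = ARRAY_SIZE(arr) }

static const struct rtp_entry gt_was_entries[] = {
	{ "wa_a", 0x100 },
	{ "wa_b", 0x104 },
	{ "wa_c", 0x108 },
};

/* Non-static: another compilation unit only needs an extern declaration of this. */
const struct rtp_table gt_was = RTP_TABLE(gt_was_entries);

/* A consumer that never sees the array definition and needs no sentinel. */
static size_t count_regs_above(const struct rtp_table *t, unsigned int min_reg)
{
	size_t n = 0;

	for (size_t i = 0; i < t->n_entries; i++)
		if (t->entries[i].reg > min_reg)
			n++;
	return n;
}
```

Because n_entries travels with the entries pointer, a kunit test in
another file could iterate the exported table object, or size a bitmask
from it, without access to the array definition.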

v6:
    - Removed sentinels that are not needed

v5:
    - Removed added code from conflict resolution issues

v4:
    - Removed conflicts with main branch

v3:
    - No changes

v2:
    - Add compatibility with new xe_rtp_table_sr format for
      "bad-mcr-reg-forced-to-regular" and
      "bad-regular-reg-forced-to-mcr"

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Violet Monti <violet.monti@intel.com>
Link: https://patch.msgid.link/20260601200947.2032784-7-violet.monti@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 5ff004fdc7377905f2fe5264b8829d35e14608b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

ASoC: codecs: tas675x: misc bugfixes and minor changes

Sen Wang <sen@ti.com> says:

A few miscellaneous bug fixes following the initial merge of the TAS675x
driver, which include:

- Add READ_ONCE() for all concurrently read params
- Correct the kcontrol bits for the temperature range
- Correct the conversion notes in the driver documentation

Link: https://patch.msgid.link/20260630183126.2588322-1-sen@ti.com

Documentation: sound: tas675x: Fix temperature range and impedance documentation

Two corrections against the TRM (SLOU589A):
- Corrected channel temperature range
- Corrected conversion formula for global temperature

Fixes: ba46edca354e ("Documentation: sound: Add TAS675x codec mixer controls documentation")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-4-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: tas675x: Fix CHx temperature range register bit fields

The initially merged patch mixed up the bits of the temperature range
register with those of the LDG report; use the correct bits according
to the TRM (SLOU589A).

Fixes: 133c81f84471 ("ASoC: codecs: Add TAS67524 quad-channel audio amplifier driver")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-3-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: tas675x: use READ_ONCE for params to be used concurrently

active_playback_dais and active_capture_dais are written atomically via
set_bit()/clear_bit() and can be read concurrently from the
fault_check_work delayed work handler.

fault_check_work already uses READ_ONCE; extend the same guard to all other
reads in tas675x_hw_params() and tas675x_mute_stream().
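
The READ_ONCE() pattern can be approximated in userspace. READ_ONCE_SKETCH()
mimics the volatile-cast idea behind the kernel macro (it relies on the
GCC/Clang __typeof__ extension), and struct amp_state and any_dai_active()
are hypothetical; the real driver uses the kernel's READ_ONCE() on its own
fields. Each word is loaded exactly once into a local, and all decisions
are made from that local copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace approximation of READ_ONCE(): force exactly one volatile load. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical driver state; the writer side would use set_bit()/clear_bit(). */
struct amp_state {
	unsigned long active_playback_dais;
	unsigned long active_capture_dais;
};

static bool any_dai_active(struct amp_state *st)
{
	/*
	 * One load per word: the compiler may not re-read these while we
	 * decide. Note the two loads are still separate, not one atomic
	 * snapshot of both words.
	 */
	unsigned long play = READ_ONCE_SKETCH(st->active_playback_dais);
	unsigned long cap = READ_ONCE_SKETCH(st->active_capture_dais);

	return (play | cap) != 0;
}
```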

Fixes: 133c81f84471 ("ASoC: codecs: Add TAS67524 quad-channel audio amplifier driver")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-2-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

selftests/hid: multitouch: test a large ContactCountMaximum

Add a regression test for the out-of-bounds bit operations on
struct mt_device.mt_io_flags.

A HID multitouch device can advertise a ContactCountMaximum far larger
than the number of contacts a single report describes, up to 255. The
driver used to keep the per-slot active state in the bits of a single
unsigned long and index set_bit()/clear_bit() by the slot number, so such
a device drove those operations out of bounds. The sticky-fingers release
timer made it fatal: mt_release_contacts() cleared one bit per slot and
overwrote the adjacent members of struct mt_device.

The new device advertises a ContactCountMaximum of 250 while exposing only
a few finger collections (a large contact count cannot be expressed with
one finger collection per contact within the HID descriptor size limit).
The test sends a single contact and lets the 100ms sticky-fingers timer
release it. A kernel without the fix panics in mt_release_contacts(); a
fixed kernel reports the release cleanly.

Signed-off-by: Trung Nguyen <trungnh@cystack.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>

HID: multitouch: fix out-of-bounds bit access on mt_io_flags

mt_io_flags is a single unsigned long, but mt_process_slot(),
mt_release_pending_palms() and mt_release_contacts() use it as a
per-slot bitmap indexed by the slot number. That slot number is only
bounded by td->maxcontacts, which is taken from the device's
ContactCountMaximum feature report and can be up to 255, not by
BITS_PER_LONG.

As a result, a multitouch device that advertises a large contact count
makes set_bit()/clear_bit() operate past the mt_io_flags word and
corrupt the adjacent members of struct mt_device. The sticky-fingers
release timer is the easiest way to reach this. mt_release_contacts()
runs

    for (i = 0; i < mt->num_slots; i++)
            clear_bit(i, &td->mt_io_flags);

with num_slots == maxcontacts. For maxcontacts around 250 the loop
clears the bits that overlap td->applications.next, zeroing that list
head, and the list_for_each_entry() that immediately follows then
dereferences NULL. The kernel panics from timer (softirq) context. On a
KASAN build this shows up as a general protection fault in
mt_release_contacts() with a null-ptr-deref at offset 0x58, which is
offsetof(struct mt_application, num_received).

The state is reachable from an untrusted USB or Bluetooth HID
multitouch device; no local privileges are required.

Store the per-slot active state in a separately allocated bitmap sized
for maxcontacts, the same pattern already used for pending_palm_slots,
and keep only MT_IO_FLAGS_RUNNING in mt_io_flags. The two
"mt_io_flags & MT_IO_SLOTS_MASK" arming checks become
bitmap_empty(td->active_slots, td->maxcontacts).
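
A userspace sketch of the new data layout, under the assumption that a
calloc()'d array of unsigned long is an adequate model of the kernel
bitmap helpers. struct mt_dev_sketch, dev_init(), slot_set(),
slot_clear() and slots_empty() are hypothetical names standing in for
struct mt_device, set_bit()/clear_bit() and bitmap_empty(). With 64-bit
longs, maxcontacts = 250 needs BITS_TO_LONGS(250) = 4 words, so slot 249
lands inside the allocated bitmap rather than in a neighbouring member.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG_SK (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_SK(n) (((n) + BITS_PER_LONG_SK - 1) / BITS_PER_LONG_SK)

/* Hypothetical, trimmed-down analogue of struct mt_device after the fix. */
struct mt_dev_sketch {
	unsigned long io_flags;      /* only the RUNNING flag lives here now */
	unsigned long *active_slots; /* one bit per contact, sized for maxcontacts */
	unsigned int maxcontacts;
};

static bool dev_init(struct mt_dev_sketch *td, unsigned int maxcontacts)
{
	td->io_flags = 0;
	td->maxcontacts = maxcontacts;
	td->active_slots = calloc(BITS_TO_LONGS_SK(maxcontacts), sizeof(unsigned long));
	return td->active_slots != NULL;
}

static void slot_set(struct mt_dev_sketch *td, unsigned int slot)
{
	assert(slot < td->maxcontacts); /* bounded by the allocation, not BITS_PER_LONG */
	td->active_slots[slot / BITS_PER_LONG_SK] |= 1UL << (slot % BITS_PER_LONG_SK);
}

static void slot_clear(struct mt_dev_sketch *td, unsigned int slot)
{
	assert(slot < td->maxcontacts);
	td->active_slots[slot / BITS_PER_LONG_SK] &= ~(1UL << (slot % BITS_PER_LONG_SK));
}

/* Equivalent of bitmap_empty(td->active_slots, td->maxcontacts). */
static bool slots_empty(const struct mt_dev_sketch *td)
{
	for (unsigned int i = 0; i < td->maxcontacts; i++)
		if (td->active_slots[i / BITS_PER_LONG_SK] & (1UL << (i % BITS_PER_LONG_SK)))
			return false;
	return true;
}
```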

Move MT_IO_FLAGS_RUNNING back to bit 0. The commit being fixed bumped it
to bit 32 to leave the low byte for the slot bits; with the slot bits
gone it fits in bit 0 again, which also keeps it within the unsigned
long on 32-bit.

Fixes: 46f781e0d151 ("HID: multitouch: fix sticky fingers")
Cc: stable@vger.kernel.org
Signed-off-by: Trung Nguyen <trungnh@cystack.net>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>

drm/amdgpu/jpeg: fix jpeg_v4_0_3_is_idle detection

jpeg_v4_0_3_is_idle() initializes ret to false and then accumulates ring
idle status using &=. Since false & condition always remains false, the
function can never report the JPEG block as idle.

Initialize ret to true so the function returns true only when all JPEG
rings report RB_JOB_DONE.
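
The identity-element mistake is easy to reproduce in isolation. In this
sketch, ring_done[] is a hypothetical stand-in for the per-ring
RB_JOB_DONE checks; the buggy variant starts from false, so the &=
accumulation can never become true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy: false is absorbing for &=, so the result is always false. */
static bool all_rings_idle_buggy(const bool *ring_done, size_t n)
{
	bool ret = false;

	for (size_t i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}

/* Fixed: true is the identity for &=, so the result is true iff every ring is done. */
static bool all_rings_idle(const bool *ring_done, size_t n)
{
	bool ret = true;

	for (size_t i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}
```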

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e9df8e9d04e0593d17ddb069f3b7958991cd18c9)
Cc: stable@vger.kernel.org

drm/amdgpu: Fix kernel panic during driver load failure

Avoid a kernel panic if MES init fails during driver load. The KIQ ring
is falsely marked as ready, although on ASICs that use MES the KIQ is
owned by MES.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:gfx_v12_1_wait_reg_mem+0x5a/0x1f0 [amdgpu]
Call Trace:
gfx_v12_1_ring_emit_reg_write_reg_wait+0x1f/0x30 [amdgpu]
amdgpu_gmc_fw_reg_write_reg_wait+0xb2/0x190 [amdgpu]
amdgpu_gmc_flush_gpu_tlb+0x1cc/0x230 [amdgpu]
amdgpu_gart_invalidate_tlb+0x81/0xa0 [amdgpu]
amdgpu_gart_unbind+0x72/0x90 [amdgpu]
amdgpu_ttm_backend_unbind+0xa4/0xb0 [amdgpu]
amdgpu_ttm_tt_unpopulate+0x13/0xd0 [amdgpu]
amdttm_tt_unpopulate+0x29/0x70 [amdttm]
ttm_bo_put+0x1eb/0x360 [amdttm]
amdgpu_bo_free_kernel+0xf9/0x1f0 [amdgpu]
amdgpu_ih_ring_fini+0x5a/0x90 [amdgpu]
amdgpu_irq_fini_hw+0x58/0x80 [amdgpu]
amdgpu_device_fini_hw+0x4e0/0x5b0 [amdgpu]
amdgpu_driver_load_kms+0x60/0xa0 [amdgpu]
amdgpu_pci_probe+0x28e/0x6d0 [amdgpu]
pci_device_probe+0x19f/0x220
really_probe+0x1ed/0x340
driver_probe_device+0x1e/0x80
__driver_attach+0xd3/0x1a0
bus_for_each_dev+0x68/0xa0
bus_add_driver+0x19f/0x270
driver_register+0x5d/0xf0
do_one_initcall+0xac/0x200
do_init_module+0x1ec/0x280
__se_sys_finit_module+0x2de/0x310
do_syscall_64+0x6a/0x250
entry_SYSCALL_64_after_hwframe+0x4b/0x53

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4623b958dd6da0f4c3026afdf330626a09ecb0f0)
Cc: stable@vger.kernel.org

drm/amd/display: detect_link_and_local_sink: DP alt mode timeout path leaks prev_sink reference

prev_sink is unconditionally retained via dc_sink_retain() at function
entry, but the DP alt mode timeout path inside SIGNAL_TYPE_DISPLAY_PORT
returns false without releasing prev_sink. All other return paths in the
function correctly call dc_sink_release(prev_sink), making this the only
missing cleanup.
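
A minimal sketch of the retain/release pairing, assuming a simple integer
refcount. struct sink, sink_retain(), sink_release() and detect_link()
are hypothetical. The commit describes the timeout path as the only one
missing dc_sink_release(prev_sink); this sketch instead routes every exit
through a single label, one common way to keep such pairings balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for struct dc_sink. */
struct sink {
	int refcount;
};

static void sink_retain(struct sink *s) { s->refcount++; }
static void sink_release(struct sink *s) { s->refcount--; }

static bool detect_link(struct sink *prev_sink, bool alt_mode_timeout, bool detected)
{
	bool ret = false;

	sink_retain(prev_sink);

	if (alt_mode_timeout)
		goto out; /* a bare "return false;" here would leak the reference */

	ret = detected;
out:
	sink_release(prev_sink); /* every path drops the reference taken at entry */
	return ret;
}
```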

Fixes: 54618888d1ea ("drm/amd/display: break down dc_link.c")
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20260626124555.36910-1-vulab@iscas.ac.cn
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 45510cf662dcf46b5d8926d454f338809f107b9d)
Cc: stable@vger.kernel.org

drm/amd/pm: fix smu13 power limit range calculation

SMU13 reports SocketPowerLimitAc/Dc as the default power limit, but
MsgLimits.Power may carry a different firmware bound for the same PPT
throttler. Using only the socket limit for both min and max can therefore
expose an incorrect power range.

Keep the socket limit as the default, but derive the range from both values:
use the lower value for the min base and the higher value for the max base
before applying OD percentages. Keep the current limit query independent
from the cap calculation.
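
The min/max derivation can be sketched as plain integer arithmetic.
smu13_range_sketch() and struct power_range are hypothetical, and
applying the OD percentages as lower * (100 - od_min_pct) / 100 and
higher * (100 + od_max_pct) / 100 is an assumption about how the
percentages are used, not the driver's exact formula.

```c
#include <assert.h>
#include <stdint.h>

struct power_range {
	uint32_t min, max, def;
};

/* Assumes od_min_pct <= 100; all limits in the same unit (e.g. watts). */
static struct power_range smu13_range_sketch(uint32_t socket_limit, uint32_t msg_limit,
					      uint32_t od_min_pct, uint32_t od_max_pct)
{
	uint32_t lo = socket_limit < msg_limit ? socket_limit : msg_limit;
	uint32_t hi = socket_limit > msg_limit ? socket_limit : msg_limit;
	struct power_range r;

	r.def = socket_limit;                   /* default stays the socket limit */
	r.min = lo * (100 - od_min_pct) / 100;  /* lower value is the min base */
	r.max = hi * (100 + od_max_pct) / 100;  /* higher value is the max base */
	return r;
}
```

With a 200 W socket limit, a 220 W MsgLimits.Power and 10% OD in both
directions, this yields a 180 to 242 W range with 200 W as the default,
whereas using the socket limit alone would have capped the maximum at
220 W.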

Fixes: 1eaf26db9590 ("drm/amd/pm: fix smu13 power limit default/cap calculation")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5419
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f45bbf0f62f266ed8422d84f347d75d5fca846a7)
Cc: stable@vger.kernel.org

drm/amdgpu: flush pending RCU callbacks on module unload

Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks
before freeing module text, preventing late callback execution in freed memory.

BUG: unable to handle page fault for address: ffffffffc1d59c40
PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0
Oops: 0010 [#1] SMP NOPTI
RIP: 0010:0xffffffffc1d59c40
Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16.
RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286
RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590
RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000100
R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700
R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? rcu_do_batch+0x163/0x450
? rcu_core+0x177/0x1c0
? __do_softirq+0xc1/0x280
? asm_call_irq_on_stack+0xf/0x20
</IRQ>
? do_softirq_own_stack+0x37/0x50
? irq_exit_rcu+0xc4/0x100
? sysvec_apic_timer_interrupt+0x36/0x80
? asm_sysvec_apic_timer_interrupt+0x12/0x20
? cpuidle_enter_state+0xd4/0x360
? cpuidle_enter+0x29/0x40
? cpuidle_idle_call+0x108/0x1a0
? do_idle+0x77/0xf0
? cpu_startup_entry+0x19/0x20
? secondary_startup_64_no_verify+0xbf/0xcb
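
The ordering that rcu_barrier() enforces in the module exit path can be
illustrated with a single-threaded userspace analogy. Nothing here is
RCU: fake_call_rcu(), fake_rcu_barrier(), module_cb() and
module_exit_sketch() are hypothetical, and the "deferred" callbacks
simply sit in an array until drained. The point is only the order: run
every queued callback first, then release the code those callbacks
point to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cb_fn)(void *arg);

#define MAX_PENDING 16
static struct { cb_fn fn; void *arg; } pending[MAX_PENDING];
static size_t n_pending;
static bool module_text_freed;
static int ran;        /* callbacks executed */
static int late_calls; /* callbacks executed after "unload": the bug */

/* Stand-in for call_rcu(): queue the callback to run later. */
static void fake_call_rcu(cb_fn fn, void *arg)
{
	assert(n_pending < MAX_PENDING);
	pending[n_pending].fn = fn;
	pending[n_pending].arg = arg;
	n_pending++;
}

/* Stand-in for rcu_barrier(): return only after every queued callback has run. */
static void fake_rcu_barrier(void)
{
	for (size_t i = 0; i < n_pending; i++)
		pending[i].fn(pending[i].arg);
	n_pending = 0;
}

static void module_cb(void *arg)
{
	(void)arg;
	if (module_text_freed)
		late_calls++;
	ran++;
}

static void module_exit_sketch(void)
{
	fake_rcu_barrier();       /* drain outstanding callbacks first */
	module_text_freed = true; /* only then may the callback code go away */
}
```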

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)

drm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systems

Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers
the following warning and causes the test to terminate on the latest
upstream kernel:

WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at
amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu],
CPU#18: rccl-UnitTests/33151

Call trace:
amdgpu_bo_release_notify
ttm_bo_release
amdgpu_gem_object_free
drm_gem_object_free
amdgpu_bo_unref
amdgpu_bo_create
amdgpu_bo_create_user
amdgpu_gem_object_create
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu
kfd_ioctl_alloc_memory_of_gpu
kfd_ioctl
sys_ioctl

The warning is triggered because
amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer
operation is requested. This happens because the GART window
allocation for the default_entity, clear_entity and move_entity
fails during initialization.

Commit [1] introduced separate GART windows for the
default_entity, clear_entity and move_entity of each SDMA
instance. Their sizes are derived from
AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024
pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages
correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however,
the same value expands to 64MB.

The default_entity and clear_entity each allocate one
AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity
allocates two such windows. This results in 16MB of GART space
per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA
instance on a 64K PAGE_SIZE system.

On an MI210 system with five SDMA instances and a 512MB GART
aperture, the total GART space required becomes 1.25GB,
exceeding the available GART aperture. Consequently, GART window
allocation fails, amdgpu_ttm_next_clear_entity() returns NULL,
and the above warning is triggered.

Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page
units. Where a page count is required, convert it using
PAGE_SHIFT. This preserves the existing 4MB transfer size across
all PAGE_SIZE configurations while keeping GART window
allocations within the available GART aperture.
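
The GART arithmetic above can be checked with a few lines of C. The
helper names are hypothetical; page_shift values 12 and 16 model 4K and
64K PAGE_SIZE, and WINDOWS_PER_SDMA encodes one window each for
default_entity and clear_entity plus two for move_entity.

```c
#include <assert.h>
#include <stdint.h>

/* New style: the transfer limit is expressed in bytes (4 MiB). */
#define GTT_MAX_TRANSFER_BYTES (4ULL << 20)

/* default_entity (1) + clear_entity (1) + move_entity (2) windows per SDMA. */
#define WINDOWS_PER_SDMA 4ULL

/* Old behaviour: 1024 pages per window, so bytes scale with PAGE_SIZE. */
static uint64_t gart_bytes_old(unsigned int page_shift, unsigned int n_sdma)
{
	return WINDOWS_PER_SDMA * (1024ULL << page_shift) * n_sdma;
}

/* New behaviour: derive the page count from the byte limit via PAGE_SHIFT. */
static uint64_t transfer_pages_new(unsigned int page_shift)
{
	return GTT_MAX_TRANSFER_BYTES >> page_shift;
}

static uint64_t gart_bytes_new(unsigned int page_shift, unsigned int n_sdma)
{
	return WINDOWS_PER_SDMA * (transfer_pages_new(page_shift) << page_shift) * n_sdma;
}
```

For five SDMA instances on 64K pages, the old definition needs 1280 MiB
(1.25 GiB) of GART, more than a 512 MiB aperture, while the byte-based
definition needs 80 MiB on both page sizes.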

[1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435
Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494)
Cc: stable@vger.kernel.org

drm/amdkfd: Use kvcalloc to allocate arrays

There were a few instances in kfd_chardev.c of kvzalloc being
used to allocate memory for an array.

Switch those to kvcalloc, which
- is the standard way of allocating a zero-initialized array
- checks the multiplication for overflow
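
The overflow check that kvcalloc() adds over kvzalloc(n * size) can be
sketched in userspace. array_zalloc_checked() is hypothetical; it uses
the GCC/Clang __builtin_mul_overflow(), which is also what the kernel's
check_mul_overflow() helper is built on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Zeroed array allocation that refuses sizes whose n * size would wrap. */
static void *array_zalloc_checked(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL; /* a wrapped product would silently under-allocate */
	return calloc(1, bytes);
}
```

With an unchecked n * size, a count of SIZE_MAX / 2 + 1 elements of
2 bytes wraps to 0 and could "succeed" with a tiny buffer; the checked
version fails the allocation instead.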

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 60b048c93f7a3add39757ad65fe2bb6e58eeae23)
Cc: stable@vger.kernel.org

drm/amdgpu: add support for GC IP version 11.7.1

Initialize GC IP 11_7_1

Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)

drm/amdgpu: add support for GC IP version 11.7.0

Initialize GC IP 11_7_0

Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)

drm/amdgpu: add the doorbell index input for suspending userq

The MES firmware requires the doorbell offset as input in order to
preempt the userq. Adding the doorbell offset also keeps the driver
aligned with the union MESAPI__SUSPEND definition in the MES firmware.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458)
Cc: stable@vger.kernel.org

drm/amdgpu/mes12: set doorbell offset for suspending userq

Update the union MESAPI__SUSPEND and union MESAPI__RESUME to add the
doorbell offset for suspending userq.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5b58a2c120063544869d0284d3b355527f9f04f5)
Cc: stable@vger.kernel.org