mm: page_ext: add count limit to page_ext_iter_next to prevent invalid PFN access

The page_ext iteration API does not validate if the PFN still belongs to a
valid section while advancing the iterator.  When dynamically adding
memory in the hotplug path, it can lead to a NULL pointer dereference
during page_ext_lookup at the boundary of the last valid section when
iterator count equals __pgcount.

The for_each_page_ext() macro calls page_ext_iter_next() as its loop
increment.  for_each_page_ext() does a "__page_ext =
page_ext_iter_next(&__iter)" at the end.  This causes page_ext_iter_next()
to increment iter->index past __pgcount and call page_ext_lookup(start_pfn
+ __pgcount).  During memory hotplug (online), the PFN at start_pfn +
__pgcount may belong to a section that has not yet been initialized,
causing page_ext_lookup() to trigger a NULL pointer dereference.

[   14.555124][  T846] Call trace:
[   14.555125][  T846]  lookup_page_ext+0x6c/0x108 (P)
[   14.555127][  T846]  page_ext_lookup+0x30/0x3c
[   14.555129][  T846]  __reset_page_owner+0x11c/0x260
[   14.571201][  T846]  __free_pages_ok+0x5e8/0x8e0
[   14.571204][  T846]  __free_pages_core+0x78/0xf0
[   14.571206][  T846]  generic_online_page+0x14/0x24
[   14.597782][  T846]  online_pages+0x178/0x30c
[   14.597784][  T846]  memory_block_change_state+0x284/0x32c
[   14.597787][  T846]  memory_subsys_online+0x4c/0x64
[   14.597789][  T846]  device_online+0x88/0xb0
[   14.597791][  T846]  online_memory_block+0x30/0x40
[   14.597793][  T846]  walk_memory_blocks+0xac/0xe8
[   14.597794][  T846]  add_memory_resource+0x280/0x298
[   14.656161][  T846]  add_memory+0x60/0x98

Move the iteration boundary enforcement inside the iterator functions, so
callers cannot inadvertently access beyond the requested range.
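
As a rough userspace illustration of the idea (not the kernel code), the
sketch below keeps the bound check inside the iterator's next() step, so
the lookup is never called for start_pfn + count.  The names ext_iter,
ext_lookup(), ext_iter_begin(), ext_iter_next() and walk() are
hypothetical stand-ins, and the 8-entry table is an assumption made only
for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's page_ext iterator state. */
struct ext_iter {
	unsigned long index;     /* position inside the requested range */
	unsigned long start_pfn; /* first PFN of the requested range */
};

/* Toy backing store: PFNs at or above valid_end have no section. */
static const unsigned long valid_end = 8;
static int ext_table[8];
static unsigned long lookups_out_of_range;

static int *ext_lookup(unsigned long pfn)
{
	if (pfn >= valid_end) {
		/* In the kernel this is where the NULL dereference happened. */
		lookups_out_of_range++;
		return NULL;
	}
	return &ext_table[pfn];
}

static int *ext_iter_begin(struct ext_iter *it, unsigned long pfn,
			   unsigned long count)
{
	it->index = 0;
	it->start_pfn = pfn;
	return count ? ext_lookup(pfn) : NULL;
}

/* The bound is enforced here, before any lookup is attempted. */
static int *ext_iter_next(struct ext_iter *it, unsigned long count)
{
	if (++it->index >= count)
		return NULL;
	return ext_lookup(it->start_pfn + it->index);
}

/* Visit every entry in [pfn, pfn + count) and report how many were seen. */
static unsigned long walk(unsigned long pfn, unsigned long count)
{
	struct ext_iter it;
	unsigned long visited = 0;
	int *e;

	for (e = ext_iter_begin(&it, pfn, count); e;
	     e = ext_iter_next(&it, count))
		visited++;
	return visited;
}
```

Walking the last four valid PFNs with walk(4, 4) visits four entries and
never asks the lookup for PFN 8, which is the boundary case described
above.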

Link: https://lore.kernel.org/20260623-page_ext-v3-1-a89799a5367c@oss.qualcomm.com
Fixes: 9039b9096ea2 ("mm: page_ext: add an iteration API for page extensions")
Signed-off-by: Ketan Kishore <ketan.kishore@oss.qualcomm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Luiz Capitulino <luizcap@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/ops-common: handle extreme intervals in damon_hot_score()

Fix three issues in damon_hot_score() that come from wrong handling of
extreme (zero or too high) monitoring intervals set by the user.

When the user sets the sampling interval to zero, damon_max_nr_accesses(),
which is called from damon_hot_score(), causes a divide-by-zero.

When the user sets the aggregation interval to zero,
damon_max_nr_accesses() returns zero.  That is wrong, since the real
maximum nr_accesses in that setup should be one.  Worse yet, it can cause
another divide-by-zero in its caller, damon_hot_score(), which uses the
damon_max_nr_accesses() return value as a denominator.

When the user sets the aggregation interval very high, damon_hot_score()
could return a value outside the [0, DAMOS_MAX_SCORE] range.  Since the
return value is used as an index into the regions_score_histogram array,
which has DAMOS_MAX_SCORE + 1 entries, it causes an out-of-bounds array
access.

The issues can be relatively easily reproduced like below.  The sysfs
write permission is required, though.

    # ./damo start --damos_action lru_prio --damos_quota_space 100M \
            --damos_quota_interval 1s
    # cd /sys/kernel/mm/damon/admin/kdamonds/0
    # echo 0 > contexts/0/monitoring_attrs/intervals/sample_us
    # echo 0 > contexts/0/monitoring_attrs/intervals/aggr_us
    # echo commit > state
    # dmesg
    [...]
    [  131.329762] Oops: divide error: 0000 [#1] SMP NOPTI
    [...]
    [  131.336089] RIP: 0010:damon_hot_score+0x27/0xd0
    [...]

Fix the divide-by-zero problems by explicitly handling zero intervals in
damon_max_nr_accesses().  Fix the out-of-bounds array access by clamping
the score to the [0, DAMOS_MAX_SCORE] range before returning from
damon_hot_score().
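
The two guards can be sketched in plain C as follows.  This is only an
illustration under stated assumptions: max_nr_accesses() and
clamp_score() are hypothetical names, MAX_SCORE is a stand-in for
DAMOS_MAX_SCORE with an assumed value of 99, and treating a zero sampling
interval as one microsecond is a choice made for the example, not
necessarily what the patch does.

```c
#include <assert.h>
#include <limits.h>

#define MAX_SCORE 99	/* assumed stand-in for DAMOS_MAX_SCORE */

/* Never divides by zero and never returns zero. */
static unsigned int max_nr_accesses(unsigned long sample_us,
				    unsigned long aggr_us)
{
	unsigned long n;

	if (!sample_us)
		sample_us = 1;	/* assumption: treat zero as the smallest interval */
	n = aggr_us / sample_us;
	if (n == 0)
		return 1;	/* at least one sample per aggregation */
	if (n > UINT_MAX)
		return UINT_MAX;
	return (unsigned int)n;
}

/* Keep a raw score usable as an index into a MAX_SCORE + 1 sized array. */
static unsigned int clamp_score(long raw)
{
	if (raw < 0)
		return 0;
	if (raw > MAX_SCORE)
		return MAX_SCORE;
	return (unsigned int)raw;
}
```

With these guards, zero intervals yield a denominator of one and any raw
score lands inside the histogram bounds.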

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260623135834.67189-1-sj@kernel.org
Link: https://lore.kernel.org/20260619202459.145010-1-sj@kernel.org
Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add Lance as an rmap reviewer

Lance has been doing excellent work reviewing rmap series and has proven
himself to be a great member of the community in general, so add him as an
rmap reviewer.

Link: https://lore.kernel.org/20260622155913.280355-1-ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Harry Yoo (Oracle) <harry@kernel.org>
Acked-by: Dev Jain <dev.jain@arm.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: Lance Yang <lance.yang@linux.dev>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/compaction: handle free_pages_prepare() properly in compaction_free()

free_pages_prepare() can fail but compaction_free() does not handle the
failure case. Failed pages should not be added back to cc->freepages for
future use, since they can be either PageHWPoison or free_page_is_bad()
and might cause data corruption.

Link: https://lore.kernel.org/20260622-handle_free_pages_prepare_in_compaction_free-v1-1-fcf3b14abcf7@nvidia.com
Fixes: 733aea0b3a7b ("mm/compaction: add support for >0 order folio memory compaction.")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs-schemes: put stats for scheme_add_dirs() internal error

damon_sysfs_scheme_add_dirs() sets up the tried_regions directory after
the stats directory setup is completed.  When the tried_regions directory
setup fails, the setup function itself ensures the reference for the
tried_regions directory is released.  Hence the error path should put
references only on the directory objects that were set up successfully,
starting from the stats directory.  However, the error path puts the
tried_regions directory instead of the stats directory.

As a direct result, the stats directory object is leaked.  Worse yet, if
the tried_regions directory setup fails at the initial allocation, the
scheme->tried_regions field remains uninitialized.  The following
kobject_put(&scheme->tried_regions->kobj) call in the error path then
dereferences the uninitialized memory.  Such setup failures should not be
common, but when one happens the consequence is severe.

Fix this issue by correctly putting the stats directory instead of the
tried_regions directory.
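
The intended unwind order can be sketched with a toy refcounted object.
This is a userspace illustration only: struct obj, obj_create(),
obj_put() and scheme_add_dirs() are hypothetical names, and the
fail_stats/fail_tried flags exist solely to inject failures.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object standing in for a sysfs directory kobject. */
struct obj {
	int refs;
};

static int live_objects;

/* Returns NULL on (injected) failure and leaves nothing to release. */
static struct obj *obj_create(int fail)
{
	struct obj *o;

	if (fail)
		return NULL;
	o = malloc(sizeof(*o));
	if (!o)
		return NULL;
	o->refs = 1;
	live_objects++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (--o->refs == 0) {
		live_objects--;
		free(o);
	}
}

struct scheme {
	struct obj *stats;
	struct obj *tried_regions;
};

static int scheme_add_dirs(struct scheme *s, int fail_stats, int fail_tried)
{
	s->stats = obj_create(fail_stats);
	if (!s->stats)
		return -1;
	s->tried_regions = obj_create(fail_tried);
	if (!s->tried_regions)
		goto put_stats;	/* only stats was set up, so only put stats */
	return 0;

put_stats:
	obj_put(s->stats);
	s->stats = NULL;
	return -1;
}
```

When the second setup step fails, the unwind touches only the object
created by the first step, so nothing is leaked and no unset pointer is
dereferenced.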

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260618005650.83868-3-sj@kernel.org
Link: https://lore.kernel.org/20260617005223.96813-1-sj@kernel.org
Fixes: 5181b75f438d ("mm/damon/sysfs-schemes: implement schemes/tried_regions directory")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.2.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/sysfs-schemes: fix dir put orders in access_pattern_add_dirs()

Patch series "mm/damon/sysfs-schemes: fix wrong directories put orders in
error paths".

Error paths of the damon_sysfs_access_pattern_add_dirs() and
damon_sysfs_scheme_add_dirs() functions put references to directories in
the wrong order.  As a result, an uninitialized memory dereference and/or
a memory leak can happen.  Fix those.

This patch (of 2):

In access_pattern_add_dirs(), the error handling path puts references
starting from the directories whose setup failed.  If the failure happened
at the initial allocation in the setup functions, an uninitialized memory
dereference happens.  Allocation failures are not common, but the
consequence is severe.  Fix the wrong reference put order.

The issue was discovered [1] by Sashiko.

Link: https://lore.kernel.org/20260618005650.83868-2-sj@kernel.org
Link: https://lore.kernel.org/20260617060005.86852-1-sj@kernel.org
Fixes: 7e84b1f8212a ("mm/damon/sysfs: support DAMON-based Operation Schemes")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: shrinker: fix NULL pointer dereference in debugfs

shrinker_debugfs_add() creates both "count" and "scan" debugfs files
unconditionally.

That assumes every shrinker implements both count_objects() and
scan_objects(), which is not guaranteed. For example, the xen-backend
shrinker sets count_objects() but leaves scan_objects() NULL, so writing
to its scan file calls through a NULL function pointer and panics the
kernel:

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Call Trace:
<TASK>
shrinker_debugfs_scan_write+0x12e/0x270
full_proxy_write+0x5f/0x90
vfs_write+0xde/0x420
? filp_flush+0x75/0x90
? filp_close+0x1d/0x30
? do_dup2+0xb8/0x120
ksys_write+0x68/0xf0
? filp_flush+0x75/0x90
do_syscall_64+0xb3/0x5b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e

The count path has the same issue in principle if a shrinker omits
count_objects().

To fix it, only create "count" and "scan" debugfs files when the
corresponding callbacks are present.
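
A minimal sketch of the guard, in userspace C rather than kernel code:
the file set is derived from which callbacks are non-NULL.  The names
shrinker_ops, debugfs_files_for(), FILE_COUNT, FILE_SCAN and toy_count()
are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of a shrinker to its two optional callbacks. */
struct shrinker_ops {
	long (*count_objects)(void);
	long (*scan_objects)(long nr_to_scan);
};

enum {
	FILE_COUNT = 1 << 0,
	FILE_SCAN  = 1 << 1,
};

/* Decide which debug files to expose based on the callbacks present. */
static unsigned int debugfs_files_for(const struct shrinker_ops *ops)
{
	unsigned int files = 0;

	if (ops->count_objects)
		files |= FILE_COUNT;
	if (ops->scan_objects)
		files |= FILE_SCAN;
	return files;
}

/* Example callback for a shrinker that can count but not scan. */
long toy_count(void)
{
	return 7;
}
```

A shrinker that sets only count_objects() gets only the "count" file, so
there is no "scan" file left to write to.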

Link: https://lore.kernel.org/20260617090052.27325-1-qi.zheng@linux.dev
Fixes: bbf535fd6f06 ("mm: shrinkers: add scan interface for shrinker debugfs")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: shrinker: fix shrinker_info teardown race with expansion

expand_shrinker_info() iterates all visible memcgs under shrinker_mutex,
including memcgs that have not finished ->css_online() yet.

Once pn->shrinker_info has been published, teardown must stay serialized
with expand_shrinker_info() until that memcg is either fully online or no
longer visible to iteration.  Today alloc_shrinker_info() breaks that rule
by dropping shrinker_mutex before freeing a partially initialized
shrinker_info array, which may cause the following race:

CPU0                   CPU1
====                   ====

css_create
--> list_add_tail_rcu(&css->sibling, &parent_css->children);
    online_css
    --> mem_cgroup_css_online
        --> alloc_shrinker_info
            --> alloc node0 info
                rcu_assign_pointer(C->node0->shrinker_info, old0)
                alloc node1 info -> FAIL -> goto err
                mutex_unlock(shrinker_mutex)

                       shrinker_alloc()
                       --> shrinker_memcg_alloc
                           --> mutex_lock(shrinker_mutex)
                               expand_shrinker_info
                               --> mem_cgroup_iter see the memcg
                                   expand_one_shrinker_info
                                   --> old0 = C->node0->shrinker_info
                                       memcpy(new->unit, old0->unit, ...);

                free_shrinker_info
                --> kvfree(old0);

                                       /* double free !! */
                                       kvfree_rcu(old0, rcu);

The same problem exists later in mem_cgroup_css_online().  If
alloc_shrinker_info() succeeds but a subsequent objcg allocation fails,
the free_objcg -> free_shrinker_info() unwind path tears down the already
published pn->shrinker_info arrays without shrinker_mutex.  The
expand_one_shrinker_info() can race with that teardown in the same way,
leading to use-after-free or double-free of the old shrinker_info.

Fix this by serializing shrinker_info teardown with shrinker_mutex, and by
keeping alloc_shrinker_info() error cleanup inside the locked section.

Link: https://lore.kernel.org/20260617085658.27096-1-qi.zheng@linux.dev
Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm: fix ksft_process_madv.sh test category

ksft_process_madv.sh currently runs run_vmtests.sh with the mmap category.
Update it to run the process_madv category, since ksft_mmap.sh already
runs the mmap category tests.

This avoids running mmap tests twice and ensures that process_madv tests
are run through the kselftest harness.

Link: https://lore.kernel.org/20260608103224.344101-1-sarthak.sharma@arm.com
Fixes: 6ce964c02f1c ("selftests/mm: have the harness run each test category separately")
Signed-off-by: Sarthak Sharma <sarthak.sharma@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cifs: update internal module version number

to 2.60

Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: use unaligned reads in parse_posix_ctxt()

The server controls create-context DataOffset, so the POSIX context data
pointer may be misaligned on strict-alignment architectures. Use
get_unaligned_le32() when reading nlink, reparse_tag, and mode.
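
For readers outside the kernel, the effect of get_unaligned_le32() can be
approximated in portable C by assembling the value byte by byte.  The
helper name load_le32() is hypothetical and this is a userspace sketch,
not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a little-endian 32-bit value from any address.  Byte loads have no
 * alignment requirement, so this is safe on strict-alignment machines.
 */
static uint32_t load_le32(const void *p)
{
	const uint8_t *b = p;

	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

Reading from an odd offset inside a byte buffer works the same way as
reading from an aligned one, which is the property the fix relies on when
the server picks DataOffset.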

Fixes: 69dda3059e7a ("cifs: add SMB2_open() arg to return POSIX data")
Cc: stable@vger.kernel.org
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb: client: harden POSIX SID length parsing

posix_info_sid_size() reads sid[1] to obtain the subauthority count,
but its existing boundary check still accepts buffers with only one
remaining byte. Require two bytes before reading sid[1] so all client
paths that reuse the helper reject truncated POSIX SIDs safely.
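
The bounds logic can be sketched as below.  The SID layout used here
(one revision byte, one sub-authority count byte, six authority bytes,
then four bytes per sub-authority) is the standard one; the 1..15 limit
on the count and the name sid_size() are assumptions made for this
example rather than a copy of the helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the SID size in bytes, or -1 if [beg, end) cannot hold it. */
static int sid_size(const uint8_t *beg, const uint8_t *end)
{
	size_t avail, subauth, total;

	if (end < beg)
		return -1;
	avail = (size_t)(end - beg);

	/* Two bytes must be present before beg[1] may be read. */
	if (avail < 2)
		return -1;

	subauth = beg[1];
	if (subauth < 1 || subauth > 15)	/* assumed valid range */
		return -1;

	total = 1 + 1 + 6 + 4 * subauth;
	if (total > avail)
		return -1;
	return (int)total;
}
```

A buffer with a single remaining byte is rejected before the count byte
is touched, and a SID with one sub-authority needs exactly 12 bytes.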

Fixes: 349e13ad30b4 ("cifs: add smb2 POSIX info level")
Cc: stable@vger.kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Assisted-by: Codex:gpt-5.4
Signed-off-by: Zihan Xi <xizh2024@lzu.edu.cn>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull bootconfig fix from Masami Hiramatsu:

- bootconfig: Fix NULL-pointer arithmetic

   Fix undefined pointer arithmetic in xbc_snprint_cmdline() when
   probing the buffer length with NULL and size 0. Track the written
   length as a size_t instead to prevent build-time UBSan/FORTIFY_SOURCE
   failures.

* tag 'bootconfig-fixes-v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
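
The pattern behind the xbc_snprint_cmdline() fix can be illustrated in
userspace C: when probing the required length with a NULL buffer and
size 0, keep a size_t running length and only form a destination pointer
when a real buffer exists.  The function render() and its parameters are
hypothetical; this is not the bootconfig code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append each word followed by a space.  Returns the length that the full
 * output needs, whether or not it fits.  buf may be NULL when size is 0.
 */
static size_t render(char *buf, size_t size, const char *const *words,
		     size_t n)
{
	size_t len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		char *dst = NULL;
		size_t room = 0;
		int ret;

		/* No arithmetic on buf unless it is a real buffer with room. */
		if (buf && len < size) {
			dst = buf + len;
			room = size - len;
		}
		ret = snprintf(dst, room, "%s ", words[i]);
		if (ret < 0)
			return 0;
		len += (size_t)ret;
	}
	return len;
}
```

Calling render(NULL, 0, ...) returns the needed length without ever
computing NULL + offset, and a second call with a large enough buffer
produces the text.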

accel/amdxdna: Fix use-after-free in debug BO command handling

When a debug BO command completes, job->drv_cmd may already have been
freed. Accessing it from aie2_sched_drvcmd_resp_handler() can result in
a use-after-free and memory corruption.

Fix this by introducing reference counting for drv_cmd objects and
transferring ownership to the job while it is in flight. This ensures
that the command remains valid until the completion handler finishes
processing it.
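
A simplified model of the ownership transfer is sketched below.  The
names drv_cmd, job, cmd_get(), cmd_put(), job_submit() and job_complete()
are hypothetical, and the counter is a plain int for brevity; concurrent
code would need an atomic reference count such as the kernel's kref.

```c
#include <assert.h>
#include <stdlib.h>

struct drv_cmd {
	int refs;	/* plain int for brevity; real code needs atomics */
	int result;
};

static int cmds_freed;

static struct drv_cmd *cmd_alloc(void)
{
	struct drv_cmd *cmd = calloc(1, sizeof(*cmd));

	if (cmd)
		cmd->refs = 1;	/* reference held by the submitter */
	return cmd;
}

static struct drv_cmd *cmd_get(struct drv_cmd *cmd)
{
	cmd->refs++;
	return cmd;
}

static void cmd_put(struct drv_cmd *cmd)
{
	if (--cmd->refs == 0) {
		free(cmd);
		cmds_freed++;
	}
}

struct job {
	struct drv_cmd *drv_cmd;
};

/* The in-flight job holds its own reference to the command. */
static void job_submit(struct job *job, struct drv_cmd *cmd)
{
	job->drv_cmd = cmd_get(cmd);
}

/* The completion handler may use the command, then drops the job's ref. */
static void job_complete(struct job *job, int result)
{
	job->drv_cmd->result = result;
	cmd_put(job->drv_cmd);
	job->drv_cmd = NULL;
}
```

Even if the submitter drops its reference before the response arrives,
the command stays allocated until job_complete() has finished with it.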

Fixes: 7ea046838021 ("accel/amdxdna: Support firmware debug buffer")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260701155556.663541-1-lizhi.hou@amd.com

x86,fs/resctrl: Prevent out-of-bounds access while offlining CPU when SNC enabled

The architecture updates the cpu_mask in a domain's header to track which
online CPUs are associated with the domain. When this mask becomes empty
the architecture initiates offline of the domain that includes calling
on resctrl fs to offline the domain. If it is a monitoring domain in
which LLC occupancy is tracked resctrl fs forces the limbo handler to
clear all busy RMID state associated with the domain.

The limbo handler always reads the current event value associated with a
busy RMID irrespective of it being checked as part of regular "is it still
busy" check or whether it will be forced released anyway. When reading an
RMID on a system with SNC enabled the "logical RMID" is converted to the
"physical RMID" and this conversion requires the NUMA node ID of the
resctrl monitoring domain that is in turn determined by querying the NUMA
node ID of any CPU belonging to the monitoring domain.

When the monitoring domain is going offline its cpu_mask is empty causing
the NUMA node ID query via cpu_to_node() to be done with "nr_cpu_ids" as
argument resulting in an out-of-bounds access.

Refactor the limbo handler to skip reading the RMID when the RMID will
just be forced to no longer be dirty in the domain anyway. Add a safety
check to the architecture's RMID reader to protect against this scenario.
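
The failure mode and the safety check can be modelled with a small
bitmask.  In this sketch NR_CPUS, cpu_node[], mask_any() and
domain_node() are hypothetical stand-ins: mask_any() mimics the
convention that "any CPU in an empty mask" yields the CPU count, which
must not be used as an array index.

```c
#include <assert.h>

#define NR_CPUS 8

/* Toy CPU to NUMA node table. */
static const int cpu_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Return the first CPU set in mask, or NR_CPUS if the mask is empty. */
static unsigned int mask_any(unsigned int mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Look up the node of a domain, refusing to index with an invalid CPU. */
static int domain_node(unsigned int domain_cpu_mask)
{
	unsigned int cpu = mask_any(domain_cpu_mask);

	if (cpu >= NR_CPUS)
		return -1;	/* domain is going offline: no CPU to ask */
	return cpu_node[cpu];
}
```

An empty mask returns an error instead of reading past the end of the
table, which mirrors the added safety check in the RMID reader.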

Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=9
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/16137433df42f85013b2f7a53626795cbd6637b9.1781029125.git.reinette.chatre@intel.com

drm/xe/rtp: Add struct types for RTP tables

We currently have a mixture of styles for our RTP tables with respect to
how we define the number of entries:

  * xe_rtp_process_to_sr() expects to receive the number of entries as
    arguments;
  * xe_rtp_process() expects the array to have a sentinel at the end of
    the array;
  * in xe_rtp_test.c, even though xe_rtp_process_to_sr() does not
    require a sentinel value, we need to rely on that technique to be
    able to count xe_rtp_entry_sr entries because simply using
    ARRAY_SIZE() is not possible.

The style used by xe_rtp_process_to_sr() makes it hard to share the
tables with other compilation units (e.g. kunit tests), since the number
of entries is calculated with ARRAY_SIZE(), which is done at compile
time.

Since we use the size of the tables to create some bitmasks, using a
sentinel style doesn't seem great either.

A way to reconcile things into a single style is to have a struct type
that would hold the entries array and the number of entries.  Since we
have xe_rtp_entry and xe_rtp_entry_sr, we would have one type for each.

The advantage of the proposed approach is that now we have a nice way to
share the tables directly to kunit tests with information about their
size.
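
A minimal sketch of such a table type is shown below.  The names
rtp_entry, rtp_table, RTP_TABLE(), gt_entries, gt_table and
rtp_table_full_mask() are hypothetical and do not claim to match the xe
driver; the point is only that the entry count travels with the array, so
another compilation unit does not need ARRAY_SIZE() or a sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rtp_entry {
	const char *name;
};

/* The table carries its own length next to the entries. */
struct rtp_table {
	const struct rtp_entry *entries;
	size_t n_entries;
};

/* The size is computed where the array definition is visible. */
#define RTP_TABLE(arr) \
	{ .entries = (arr), .n_entries = sizeof(arr) / sizeof((arr)[0]) }

static const struct rtp_entry gt_entries[] = {
	{ "wa-a" },
	{ "wa-b" },
	{ "wa-c" },
};

/* Exported object: users see the entries and the count together. */
const struct rtp_table gt_table = RTP_TABLE(gt_entries);

/* Example consumer: a bitmask with one bit per table entry. */
uint64_t rtp_table_full_mask(const struct rtp_table *table)
{
	if (table->n_entries >= 64)
		return ~(uint64_t)0;
	return ((uint64_t)1 << table->n_entries) - 1;
}
```

A test in another file can take a pointer to gt_table and size its
bitmask from n_entries without seeing the array definition.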

v6:
    - Removed sentinels that are not needed

v5:
    - Removed added code from conflict resolution issues

v4:
    - Removed conflicts with main branch

v3:
    - No changes

v2:
    - Add compatibility with new xe_rtp_table_sr format for
      "bad-mcr-reg-forced-to-regular" and
      "bad-regular-reg-forced-to-mcr"

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Violet Monti <violet.monti@intel.com>
Link: https://patch.msgid.link/20260601200947.2032784-7-violet.monti@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 5ff004fdc7377905f2fe5264b8829d35e14608b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

ASoC: codecs: tas675x: misc bugfixes and minor changes

Sen Wang <sen@ti.com> says:

A few miscellaneous bug fixes after the initial merge of the TAS675x
driver, including:

- Adding READ_ONCE for all concurrent read params
- Corrected kcontrol bits for temperature range
- Corrected conversion notes in the driver documentation

Link: https://patch.msgid.link/20260630183126.2588322-1-sen@ti.com

Documentation: sound: tas675x: Fix temperature range and impedance documentation

Two corrections against the TRM (SLOU589A):
- Corrected channel temperature range
- Corrected conversion formula for global temperature

Fixes: ba46edca354e ("Documentation: sound: Add TAS675x codec mixer controls documentation")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-4-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: tas675x: Fix CHx temperature range register bit fields

The initially merged patch mixed up the bits of the temperature register
with those of the LDG report.  Use the right bits according to the TRM
(SLOU589A).

Fixes: 133c81f84471 ("ASoC: codecs: Add TAS67524 quad-channel audio amplifier driver")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-3-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: codecs: tas675x: use READ_ONCE for params to be used concurrently

active_playback_dais and active_capture_dais are written atomically via
set_bit()/clear_bit() and can be read concurrently from the
fault_check_work delayed work handler.

fault_check_work already uses READ_ONCE; extend the same guard to all other
reads in tas675x_hw_params() and tas675x_mute_stream().

Fixes: 133c81f84471 ("ASoC: codecs: Add TAS67524 quad-channel audio amplifier driver")
Signed-off-by: Sen Wang <sen@ti.com>
Link: https://patch.msgid.link/20260630183126.2588322-2-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>

drm/amdgpu/jpeg: fix jpeg_v4_0_3_is_idle detection

jpeg_v4_0_3_is_idle() initializes ret to false and then accumulates ring
idle status using &=. Since false & condition always remains false, the
function can never report the JPEG block as idle.

Initialize ret to true so the function returns true only when all JPEG
rings report RB_JOB_DONE.
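
The accumulator bug is easy to reproduce in isolation.  The sketch below
uses a hypothetical all_rings_idle() helper over an array of per-ring
"done" flags; it is not the driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * AND-accumulate per-ring status.  The accumulator must start as true:
 * starting from false would make the result false for any input.
 */
static bool all_rings_idle(const bool *ring_done, size_t n)
{
	bool ret = true;
	size_t i;

	for (i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}
```

With the true initial value, the result is true only when every ring
reports done, and a single busy ring makes it false.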

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e9df8e9d04e0593d17ddb069f3b7958991cd18c9)
Cc: stable@vger.kernel.org

drm/amdgpu: Fix kernel panic during driver load failure

Avoid a kernel panic if MES init fails during driver load.  The KIQ ring
is falsely marked as ready, although on ASICs that use MES the KIQ is
owned by MES.

BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:gfx_v12_1_wait_reg_mem+0x5a/0x1f0 [amdgpu]
Call Trace:
gfx_v12_1_ring_emit_reg_write_reg_wait+0x1f/0x30 [amdgpu]
amdgpu_gmc_fw_reg_write_reg_wait+0xb2/0x190 [amdgpu]
amdgpu_gmc_flush_gpu_tlb+0x1cc/0x230 [amdgpu]
amdgpu_gart_invalidate_tlb+0x81/0xa0 [amdgpu]
amdgpu_gart_unbind+0x72/0x90 [amdgpu]
amdgpu_ttm_backend_unbind+0xa4/0xb0 [amdgpu]
amdgpu_ttm_tt_unpopulate+0x13/0xd0 [amdgpu]
amdttm_tt_unpopulate+0x29/0x70 [amdttm]
ttm_bo_put+0x1eb/0x360 [amdttm]
amdgpu_bo_free_kernel+0xf9/0x1f0 [amdgpu]
amdgpu_ih_ring_fini+0x5a/0x90 [amdgpu]
amdgpu_irq_fini_hw+0x58/0x80 [amdgpu]
amdgpu_device_fini_hw+0x4e0/0x5b0 [amdgpu]
amdgpu_driver_load_kms+0x60/0xa0 [amdgpu]
amdgpu_pci_probe+0x28e/0x6d0 [amdgpu]
pci_device_probe+0x19f/0x220
really_probe+0x1ed/0x340
driver_probe_device+0x1e/0x80
__driver_attach+0xd3/0x1a0
bus_for_each_dev+0x68/0xa0
bus_add_driver+0x19f/0x270
driver_register+0x5d/0xf0
do_one_initcall+0xac/0x200
do_init_module+0x1ec/0x280
__se_sys_finit_module+0x2de/0x310
do_syscall_64+0x6a/0x250
entry_SYSCALL_64_after_hwframe+0x4b/0x53

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4623b958dd6da0f4c3026afdf330626a09ecb0f0)
Cc: stable@vger.kernel.org

drm/amd/display: detect_link_and_local_sink: DP alt mode timeout path leaks prev_sink reference

prev_sink is unconditionally retained via dc_sink_retain() at function
entry, but the DP alt mode timeout path inside SIGNAL_TYPE_DISPLAY_PORT
returns false without releasing prev_sink.  All other return paths in the
function correctly call dc_sink_release(prev_sink), making this the only
missing cleanup.

Fixes: 54618888d1ea ("drm/amd/display: break down dc_link.c")
Signed-off-by: WenTao Liang <vulab@iscas.ac.cn>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20260626124555.36910-1-vulab@iscas.ac.cn
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 45510cf662dcf46b5d8926d454f338809f107b9d)
Cc: stable@vger.kernel.org

drm/amd/pm: fix smu13 power limit range calculation

SMU13 reports SocketPowerLimitAc/Dc as the default power limit, but
MsgLimits.Power may carry a different firmware bound for the same PPT
throttler. Using only the socket limit for both min and max can therefore
expose an incorrect power range.

Keep the socket limit as the default, but derive the range from both values:
use the lower value for the min base and the higher value for the max base
before applying OD percentages. Keep the current limit query independent
from the cap calculation.
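
The range derivation can be sketched as follows.  This is an
illustration under assumptions: power_limit_range() and its parameters
are hypothetical names, and the way the OD percentages are applied
(min scaled down by od_down_pct, max scaled up by od_up_pct) is assumed
for the example rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

struct power_range {
	uint32_t min;
	uint32_t max;
};

/* Derive the range from both firmware values instead of only one. */
static struct power_range power_limit_range(uint32_t socket_limit,
					    uint32_t msg_limit,
					    uint32_t od_up_pct,
					    uint32_t od_down_pct)
{
	uint32_t lo = socket_limit < msg_limit ? socket_limit : msg_limit;
	uint32_t hi = socket_limit > msg_limit ? socket_limit : msg_limit;
	struct power_range r;

	if (od_down_pct > 100)
		od_down_pct = 100;

	/* Lower value is the min base, higher value is the max base. */
	r.min = (uint32_t)((uint64_t)lo * (100 - od_down_pct) / 100);
	r.max = (uint32_t)((uint64_t)hi * (100 + (uint64_t)od_up_pct) / 100);
	return r;
}
```

For a socket limit of 300 and a message limit of 350 with +10% and -20%
overdrive, the sketch yields a range of 240 to 385 while the default
stays at the socket limit.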

Fixes: 1eaf26db9590 ("drm/amd/pm: fix smu13 power limit default/cap calculation")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5419
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f45bbf0f62f266ed8422d84f347d75d5fca846a7)
Cc: stable@vger.kernel.org

drm/amdgpu: flush pending RCU callbacks on module unload

Call rcu_barrier() in module exit to wait for outstanding call_rcu() callbacks
before freeing module text, preventing late callback execution in freed memory.

BUG: unable to handle page fault for address: ffffffffc1d59c40
PGD 6a12067 P4D 6a12067 PUD 6a14067 PMD 13698b067 PTE 0
Oops: 0010 [#1] SMP NOPTI
RIP: 0010:0xffffffffc1d59c40
Code: Unable to access opcode bytes at RIP 0xffffffffc1d59c16.
RSP: 0018:ffffc900198c0f28 EFLAGS: 00010286
RAX: ffffffffc1d59c40 RBX: ffff897c7d6b61c0 RCX: ffff88826aff4590
RDX: ffff8884d8b35490 RSI: ffffc900198c0f30 RDI: ffff88812af67290
RBP: 000000000000000a (DONE segment entries) R08: 0000000000000000 R09: 0000000000000100
R10: 0000000000000000 R11: ffffffff82a06100 R12: ffff88811a4e3700
R13: 0000000000000000 R14: ffff897c7d6b6270 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff897c7d680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc1d59c16 CR3: 00000104a980a001 CR4: 0000000002770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? rcu_do_batch+0x163/0x450
? rcu_core+0x177/0x1c0
? __do_softirq+0xc1/0x280
? asm_call_irq_on_stack+0xf/0x20
</IRQ>
? do_softirq_own_stack+0x37/0x50
? irq_exit_rcu+0xc4/0x100
? sysvec_apic_timer_interrupt+0x36/0x80
? asm_sysvec_apic_timer_interrupt+0x12/0x20
? cpuidle_enter_state+0xd4/0x360
? cpuidle_enter+0x29/0x40
? cpuidle_idle_call+0x108/0x1a0
? do_idle+0x77/0xf0
? cpu_startup_entry+0x19/0x20
? secondary_startup_64_no_verify+0xbf/0xcb

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit feaa5039f6c12acc9aa934c2d45dcd251a12c69f)

drm/amdgpu: Fix AMDGPU_GTT_MAX_TRANSFER_SIZE for non-4K systems

Running RCCL unit tests on a system with a 64K PAGE_SIZE triggers
the following warning and causes the test to terminate on latest
upstream kernel:

WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1335 at
amdgpu_bo_release_notify+0x1bc/0x280 [amdgpu],
CPU#18: rccl-UnitTests/33151

Call trace:
amdgpu_bo_release_notify
ttm_bo_release
amdgpu_gem_object_free
drm_gem_object_free
amdgpu_bo_unref
amdgpu_bo_create
amdgpu_bo_create_user
amdgpu_gem_object_create
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu
kfd_ioctl_alloc_memory_of_gpu
kfd_ioctl
sys_ioctl

The warning is triggered because
amdgpu_ttm_next_clear_entity() returns NULL when a clear buffer
operation is requested. This happens because the GART window
allocation for the default_entity, clear_entity and move_entity
fails during initialization.

Commit [1] introduced separate GART windows for the
default_entity, clear_entity and move_entity of each SDMA
instance. Their sizes are derived from
AMDGPU_GTT_MAX_TRANSFER_SIZE, which is currently defined as 1024
pages. This implicitly assumes a 4K PAGE_SIZE, where 1024 pages
correspond to a 4MB transfer. On a 64K PAGE_SIZE system, however,
the same value expands to 64MB.

The default_entity and clear_entity each allocate one
AMDGPU_GTT_MAX_TRANSFER_SIZE GART window, while the move_entity
allocates two such windows. This results in 16MB of GART space
per SDMA instance on a 4K PAGE_SIZE system, but 256MB per SDMA
instance on a 64K PAGE_SIZE system.

On an MI210 system with five SDMA instances and a 512MB GART
aperture, the total GART space required becomes 1.25GB,
exceeding the available GART aperture. Consequently, GART window
allocation fails, amdgpu_ttm_next_clear_entity() returns NULL,
and the above warning is triggered.

Redefine AMDGPU_GTT_MAX_TRANSFER_SIZE in bytes instead of page
units. Where a page count is required, convert it using
PAGE_SHIFT. This preserves the existing 4MB transfer size across
all PAGE_SIZE configurations while keeping GART window
allocations within the available GART aperture.
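
The arithmetic above can be checked with a few lines of C.  The helper
names and the WINDOWS_PER_SDMA and MAX_TRANSFER_BYTES macros are
hypothetical; the figures (1024 pages, four windows per SDMA instance,
five instances) come from the description.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOWS_PER_SDMA   4u		/* default + clear + 2 x move */
#define MAX_TRANSFER_BYTES (4ull << 20)	/* 4MB, independent of PAGE_SIZE */

/* Old scheme: window size is 1024 pages, so it scales with PAGE_SIZE. */
static uint64_t gart_need_page_units(unsigned int page_shift,
				     unsigned int sdma_instances)
{
	uint64_t window = 1024ull << page_shift;

	return window * WINDOWS_PER_SDMA * sdma_instances;
}

/* New scheme: window size is fixed in bytes. */
static uint64_t gart_need_byte_units(unsigned int sdma_instances)
{
	return MAX_TRANSFER_BYTES * WINDOWS_PER_SDMA * sdma_instances;
}

/* Page count for one transfer, derived from the byte size when needed. */
static uint64_t max_transfer_pages(unsigned int page_shift)
{
	return MAX_TRANSFER_BYTES >> page_shift;
}
```

With a page shift of 16 (64K pages) and five instances, the page-based
definition needs 1280MB, while the byte-based one needs 80MB and fits in
a 512MB aperture.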

[1] https://lore.kernel.org/all/20260408100327.1372-3-pierre-eric.pelloux-prayer@amd.com/#t

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5435
Fixes: 897ee11ec020 ("drm/amdgpu: create multiple clear/move ttm entities")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27213b776a666d3030de5acc3cd75278197b0494)
Cc: stable@vger.kernel.org

drm/amdkfd: Use kvcalloc to allocate arrays

There were a few instances in kfd_chardev.c of kvzalloc being
used to allocate memory for an array.

Switch those to kvcalloc, which
- is the standard way of allocating a zero-initialized array
- checks the multiplication for overflow
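
The overflow check that an array allocator adds can be sketched in
userspace C.  zalloc_array() is a hypothetical helper that makes the
check explicit; it is not kvcalloc() itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n zeroed elements, failing if n * size would overflow. */
static void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (size && n > SIZE_MAX / size)
		return NULL;	/* n * size does not fit in size_t */

	bytes = n * size;
	p = malloc(bytes ? bytes : 1);
	if (p)
		memset(p, 0, bytes);
	return p;
}
```

An open-coded n * size passed to a plain zeroing allocator would wrap
silently for a huge n and return a buffer that is too small; the array
form rejects the request instead.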

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 60b048c93f7a3add39757ad65fe2bb6e58eeae23)
Cc: stable@vger.kernel.org

drm/amdgpu: add support for GC IP version 11.7.1

Initialize GC IP 11_7_1

Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a928d8d81ec5cdb5a8944d08136720811efad0f6)

drm/amdgpu: add support for GC IP version 11.7.0

Initialize GC IP 11_7_0

Signed-off-by: Granthali Vinodkumar Dhandar <granthali.vinodkumardhandar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cf591e67c095542a16475df293ec7bc9a118e4ee)

drm/amdgpu: add the doorbell index input for suspending userq

MES firmware needs the doorbell offset as an input in order to preempt
the userq. Adding the doorbell offset also keeps the driver aligned with
the union MESAPI__SUSPEND in MES firmware.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc434335ab3c096a33a9e88c7951b4ac574db458)
Cc: stable@vger.kernel.org

drm/amdgpu/mes12: set doorbell offset for suspending userq

Update the union MESAPI__SUSPEND and union MESAPI__RESUME to
add the doorbell offset for suspending userq.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5b58a2c120063544869d0284d3b355527f9f04f5)
Cc: stable@vger.kernel.org

drm/amdgpu/mes11: set doorbell offset for suspending userq

Update the union MESAPI__SUSPEND and union MESAPI__RESUME to
add the doorbell offset for suspending userq.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 30af09db33696f7e0de5c0c505cbb0cb92b6e25b)
Cc: stable@vger.kernel.org

drm/amdgpu: fix check in amdgpu_hmm_invalidate_gfx

For a short moment during alloc/free the userptr BO is not part of its VM,
so bo->vm_bo can be NULL.

Keep a reference to the VM root PD as parent of the userptr BO so that
we can always use that to wait for all submissions of the VM instead of
only the one involving the userptr BO.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 91250893cbaa ("drm/amdgpu: fix waiting for all submissions for userptrs")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5399
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 631849ff5d603841e74f19f4a5e30fe1f7d7cf30)
Cc: stable@vger.kernel.org

drm/amdgpu/jpeg: fix jpeg_v5_0_1_is_idle detection

jpeg_v5_0_1_is_idle() initializes ret to false and then accumulates ring
idle status using &=. Since false & condition always remains false, the
function can never report the JPEG block as idle.

Initialize ret to true so the function returns true only when all JPEG
rings report RB_JOB_DONE.
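
The accumulation pattern can be reproduced in isolation. This is a
simplified sketch with made-up function names; the boolean array stands
in for the per-ring RB_JOB_DONE status reads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy pattern: false & x is always false, so this never returns true. */
static bool all_idle_buggy(const bool *ring_done, size_t n)
{
	bool ret = false;

	for (size_t i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}

/* Fixed pattern: start from true, any busy ring clears the result. */
static bool all_idle_fixed(const bool *ring_done, size_t n)
{
	bool ret = true;

	for (size_t i = 0; i < n; i++)
		ret &= ring_done[i];
	return ret;
}
```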

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 680adf5faeeabb4585f7aeb53681719e2d6c2f41)
Cc: stable@vger.kernel.org

drm/amdgpu: Rename moved state to needs_update

This state can be reached via other means than physical moves, like PRT
bindings. Make the name match the actual purpose of the state.

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1f7a795fb9f8186bd81ca9c4a80f75482db53c9e)

drm/amdgpu: Only set bo->moved when the BO was actually moved

The "moved" VM state is a bit unfortunately named, because BOs can end
up in this state without being physically moved. While we need to
invalidate every mapping when BOs are physically moved, in some other
cases like PRT binds/unbinds there is no need to refresh mappings except
those affected by the bind.

Full invalidation of all BO mappings manifested as severe regressions in
PRT bind performance, which this patch fixes. The offending patch is
4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4") in the
amd-staging-drm-next tree, although it has not yet propagated anywhere
else.

Fixes: 4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5437
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0b2fa33b4235991a100dd799c891cf5c242aaed1)
Cc: stable@vger.kernel.org

drm/amd/display: guard against overflow in HDCP message dump

[Why]
mod_hdcp_dump_binary_message() computed target_size (a uint32_t) as roughly
byte_size * msg_size and gated the whole write on buf_size >= target_size. A
large msg_size can overflow target_size, wrapping it to a small value that
passes the check while the loop still writes byte_size * msg_size bytes
into buf. All current callers pass small constants so this is not reachable
today, but the unchecked arithmetic should be hardened.

[How]
Drop the overflow-prone target_size precomputation and instead bounds-check the
output position on every iteration, stopping once the next entry would not leave
room for the trailing terminator. This cannot overflow and, for oversized
messages, dumps as much as fits rather than printing nothing.
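
A minimal userspace sketch of the per-iteration bounds check follows.
dump_hex_bounded() and its "xx " entry format are assumptions made for
illustration and are not the driver's function or output layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes each byte as "xx " and stops as soon as the next entry plus the
 * trailing NUL would not fit. Returns the number of bytes dumped. */
static size_t dump_hex_bounded(const uint8_t *msg, size_t msg_size,
			       char *buf, size_t buf_size)
{
	const size_t entry_size = 3;	/* two hex digits and a space */
	size_t pos = 0, i;

	if (buf_size == 0)
		return 0;

	for (i = 0; i < msg_size; i++) {
		/* pos < buf_size always holds, so this cannot wrap. */
		if (buf_size - pos <= entry_size)
			break;
		snprintf(buf + pos, buf_size - pos, "%02x ", msg[i]);
		pos += entry_size;
	}
	buf[pos] = '\0';
	return i;
}
```

No product of the two sizes is ever computed, so there is nothing left
that could overflow, and a too-small buffer receives a truncated dump.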

Fixes: 4c283fdac08a ("drm/amd/display: Add HDCP module")
Assisted-by: Copilot:claude-opus-4.8
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: George Zhang <george.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d0a775e5d70b376696245a14c09e3aa6dde0023a)
Cc: stable@vger.kernel.org

drm/amd/display: use kvzalloc to allocate struct dc

struct dc has grown large over time (most of it the two inlined
dc_scratch_space copies) and now sits close to the page allocator's 4 MiB
contiguous allocation limit. Its actual size is not fixed by the source
alone; it also depends on the compiler and the .config, so it can easily
cross 4 MiB, e.g. with a newer GCC or a config change.

dc_create() allocates it with kzalloc(). Once struct dc exceeds 4 MiB the
request is rounded up to order 11 (8 MiB), which is above MAX_PAGE_ORDER,
so the page allocator warns and returns NULL. dc_create() then fails, DM
init fails and amdgpu probe aborts with -EINVAL:

  WARNING: mm/page_alloc.c:5197 at __alloc_frozen_pages_noprof+0x2f9/0x380
   dc_create+0x38/0x660 [amdgpu]
   amdgpu_dm_init+0x2d9/0x510 [amdgpu]
   dm_hw_init+0x1b/0x90 [amdgpu]
   amdgpu_device_init.cold+0x150d/0x1e13 [amdgpu]
   amdgpu_driver_load_kms+0x19/0x80 [amdgpu]
   amdgpu_pci_probe+0x1e2/0x4c0 [amdgpu]

The failed probe is reported as "hw_init of IP block <dm> failed -22"
and leaves the display unusable. The subsequent amdgpu_irq_put()
warnings during teardown are just fallout of unwinding a
half-initialized device.

struct dc is a software-only bookkeeping structure that is never handed
to hardware DMA and is only ever kept as an opaque pointer, so it does
not require physically contiguous memory. Allocate it with kvzalloc()
(and free it with kvfree()) so that the allocator can fall back to
vmalloc() when a contiguous allocation of that size is not available,
which also avoids the MAX_PAGE_ORDER warning entirely.
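
The order rounding can be illustrated with a userspace sketch. It
assumes 4 KiB pages and a maximum page order of 10 (4 MiB), matching
the numbers quoted above; the SKETCH_* constants and helper names are
hypothetical, and the loop is only meant for sizes far below SIZE_MAX.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_MAX_PAGE_ORDER	10	/* assumption: 4 MiB limit */

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while (((size_t)1 << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* True if a physically contiguous allocation of this size is possible. */
static int fits_page_allocator(size_t size)
{
	return order_for_size(size) <= SKETCH_MAX_PAGE_ORDER;
}
```

A structure of exactly 4 MiB still fits at order 10, while a single
extra byte pushes the request to order 11.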

v2:
- Rebase to amd-staging-drm-next.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5406
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 991e0516a8072f2292681c6ae98a924ab0e32575)
Cc: stable@vger.kernel.org

drm/amdgpu: invoke pm_genpd_remove() before freeing genpd

Call pm_genpd_remove() to unregister acp_genpd from the global list
before releasing its memory, and clear the pointer after the free.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cd8650d7a91ee8b768e202354672553faa5cc1f2)
Cc: stable@vger.kernel.org

drm/amdgpu: fix resource leak on ACP reset timeout

When the ACP soft reset poll times out, the original code returns early
without cleanup, leaking MFD child devices, genpd links and all ACP heap
allocations.

Replace the early return with goto out so that all cleanup logic runs
regardless of whether the reset succeeded, while preserving the timeout
error code for the caller.
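
The control flow can be sketched in userspace as follows. struct
fake_acp and fake_acp_hw_fini() are hypothetical stand-ins; the real
teardown releases driver resources rather than malloc()ed buffers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the resources the teardown path owns. */
struct fake_acp {
	void *mfd_cells;
	void *genpd;
};

/* Returns 0 or -ETIMEDOUT, but releases everything either way. */
static int fake_acp_hw_fini(struct fake_acp *acp, bool reset_timed_out)
{
	int ret = 0;

	if (reset_timed_out) {
		ret = -ETIMEDOUT;
		goto out;	/* an early return here would leak */
	}

	/* ... steps that only make sense after a successful reset ... */

out:
	free(acp->mfd_cells);
	acp->mfd_cells = NULL;
	free(acp->genpd);
	acp->genpd = NULL;
	return ret;
}
```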

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98073e4328d7a8d75d03696ab27f6de70ef1aeda)
Cc: stable@vger.kernel.org

drm/amdgpu: reject mapping a reserved doorbell to a new queue

When creating a user queue, user space provides a doorbell BO handle
and an offset within the BO to obtain a doorbell.

However, the current implementation uses xa_store_irq() to store a
doorbell, which allows a later queue created with the same BO and
offset parameters to overwrite an existing queue and doorbell mapping.

This can cause problems such as misrouting fence IRQ processing to the
wrong queue, or the cleanup of one queue erasing the mapping of
another queue.

Fix this by replacing xa_store_irq() with xa_insert_irq(), which
rejects mapping a reserved doorbell to a newly created queue.
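
The difference between the two semantics can be shown with a toy
fixed-size table. This is only an illustration with hypothetical names,
not the XArray implementation; it mirrors the -EBUSY result that an
insert returns for an occupied index.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_SLOTS 8

struct doorbell_table {
	void *slot[NUM_SLOTS];
};

/* Store semantics: silently replaces whatever was at the index. */
static int table_store(struct doorbell_table *t, unsigned int idx, void *q)
{
	if (idx >= NUM_SLOTS)
		return -EINVAL;
	t->slot[idx] = q;
	return 0;
}

/* Insert semantics: refuses to overwrite an occupied index. */
static int table_insert(struct doorbell_table *t, unsigned int idx, void *q)
{
	if (idx >= NUM_SLOTS)
		return -EINVAL;
	if (t->slot[idx])
		return -EBUSY;
	t->slot[idx] = q;
	return 0;
}
```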

Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6244eae22966350db52faf9c1369d3b2ffc5de4e)
Cc: stable@vger.kernel.org

drm/amd/display: Handle struct drm_plane_state.ignore_damage_clips

The mode-setting pipeline can disable damage clipping for a commit
by setting ignore_damage_clips in struct drm_plane_state. The commit
will then do a full display update.

Test the flag in DCN code and do a full update if it has been set.

Commit 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers
to ignore damage clips") introduced ignore_damage_clips to selectively
ignore damage clipping in certain framebuffer changes. This driver does
not do that, but DRM's damage iterator will soon rely on the flag.
Therefore supporting it here as well makes sense for consistency.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers to ignore damage clips")
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zackr@vmware.com>
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a24019f6480fad5c077b5956eed942c8960323d6)
Cc: <stable@vger.kernel.org> # v6.8+

drm/amdgpu/gfx12: fix EOP interrupt routing for KQ and userq

Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back
to amdgpu_userq_process_fence_irq() on miss, since KCQ EOPs were
misrouted into the userq fence path when enable_mes is true.

Require a strict (me,pipe,queue) match in the gfx case, then userq gfx
EOPs fall through to amdgpu_userq_process_fence_irq().

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6c1f4f7ff08448e0e18cd7fc4e59d6c96a36f25d)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx11: fix EOP interrupt routing for KQ and userq

Try KQ by ring_id first (KCQ and UQ never share a HW slot); fall back
to amdgpu_userq_process_fence_irq() on miss, since KQ EOPs were
misrouted into the userq fence path when enable_mes is true.

Require a strict (me,pipe,queue) match in the gfx case, then userq gfx
EOPs fall through to amdgpu_userq_process_fence_irq().

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 88e589cc811ba907209a426c426c469bcb4bb894)
Cc: stable@vger.kernel.org

drm/amdkfd: clamp v9 CRIU control stack checkpoint copy to BO size

CRIU checkpoint copies the MQD control stack using cp_hqd_cntl_stack_size
from hardware without bounding it to the allocated BO region. If the HW
field is larger than the queue's control stack allocation, memcpy reads
past the BO into adjacent GTT memory and can leak kernel data to userspace.

Store the page-aligned control stack BO size in mqd_manager and clamp
checkpoint copies and reported checkpoint sizes to
min(cp_hqd_cntl_stack_size, mm->ctl_stack_size). Apply the same bound
for multi-XCC v9.4.3 checkpoint layout.
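
A minimal sketch of the clamp, assuming a hypothetical helper that
receives both the size reported by hardware and the size of the
allocation backing the control stack:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Copies at most alloc_size bytes even if the size reported by hardware
 * is larger. Returns the number of bytes copied, which is also the size
 * that should be reported to the caller. */
static size_t checkpoint_ctl_stack(void *dst, const void *ctl_stack,
				   uint32_t hw_reported_size,
				   size_t alloc_size)
{
	size_t n = min_size(hw_reported_size, alloc_size);

	memcpy(dst, ctl_stack, n);
	return n;
}
```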

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Reviewed-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6c2abd0ec09e86c6323010673766f76050e28aa3)
Cc: stable@vger.kernel.org

drm/amdgpu: fix aperture mapping leak

amdgpu_pci_remove() calls drm_dev_unplug() before invoking the driver
fini routines. This causes drm_dev_enter() in amdgpu_ttm_fini() to
always return false, so iounmap(aper_base_kaddr) never runs on normal
driver unload, leaving an orphaned entry in the x86 PAT interval tree.

On connected_to_cpu hardware, the aperture is mapped write-back (WB) via
ioremap_cache(). On reload, IP discovery calls memremap(..., MEMREMAP_WC)
over the same range. The WC vs WB conflict causes:

  ioremap error for 0x..., requested 0x1, got 0x0
  amdgpu: discovery failed: -2

Fix by switching to devres-managed mappings so cleanup is guaranteed
regardless of drm_dev_enter() state:

- connected_to_cpu path: devm_memremap(MEMREMAP_WB). For
  IORESOURCE_SYSTEM_RAM ranges this takes the try_ram_remap() shortcut,
  returning __va(offset) from the existing kernel direct map. No new
  ioremap VA or PAT entry is created, so there is nothing to orphan.

- dGPU path: devm_ioremap_wc() registers iounmap() as a devres action,
  guaranteeing cleanup at device_del() time.

Also remove iounmap(aper_base_kaddr) from amdgpu_device_unmap_mmio()
since the mapping is now devres-owned.

v2: Remove redundant x86_64 guard (Lijo)

Fixes: 9d0af8b4def0 ("drm/amdgpu: pre-map device buffer as cached for A+A config")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d871e99879cb5fd1fa798b006b4888887e63a17a)
Cc: stable@vger.kernel.org

drm/amd/display: avoid large stack allocation in commit_planes_do_stream_update_sequence

The function has two arrays on the stack to hold temporary dsc_optc_config
and dsc_config objects. Together with the other local variables, they
blow through common stack frame warning limits:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4070:22: error: stack frame size (1352) exceeds limit
(1280) in 'commit_planes_do_stream_update_sequence' [-Werror,-Wframe-larger-than]

Since neither array is initialized or used outside of the
add_link_update_dsc_config_sequence() function, there is no actual
need to keep each element around.

Replace the arrays with a single instance each to reduce the stack usage
to less than half.

Fixes: 9f49d3cd7e71 ("drm/amd/display: Implement block sequencing infrastructure for modular hardware operations.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Acked-by: George Zhang <george.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9e0896fa6f7dbe9ca3dbbd3b593fa91670f4820b)
Cc: stable@vger.kernel.org

drm/amd/display: Remove DCCG registers not needed in DCN42

[why]

Some resources that exist in the DCN block are not needed and shouldn't
be used.

[how]

Remove defines from register lists.

Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Signed-off-by: George Zhang <george.zhang@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dac8aa629a45e34027444f74d3b86b6f104b024c)

drm/amd/display: Fix DCN42 null registers & register masks

[why]

The register lists used on DCN42 variants are different. Some reused
codepaths are trying to access registers not used.

[how]

Add DISPCLK_FREQ_CHANGECNTL, HUBPREQ_DEBUG, and HDMISTREAMCLK_CNTL to
the register lists.

Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Signed-off-by: George Zhang <george.zhang@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 64142f9d51aff32f4130d916cb8f044a072ad27d)

drm/amdkfd: Guard m->cp_hqd_eop_control setting by q->eop_ring_buffer_size

To avoid wraparound if the value is 0.

Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0cae35661868af207077a4306bc42c7c972947c)
Cc: stable@vger.kernel.org

drm/amdgpu/vce: fix integer overflow in image size

Fix a security vulnerability where malicious VCE command streams
with oversized dimensions (e.g. 65536×65536) cause 32-bit integer
overflow, wrapping the calculated buffer size to 0. This bypasses
validation and allows GPU firmware to perform out-of-bounds memory
access.

The fix uses 64-bit arithmetic to detect overflow and rejects
invalid dimensions before they reach the hardware.

V2: remove redundant check
V3: modify max height value
V4: remove size64
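
The wraparound is easy to reproduce in userspace. The sketch below is a
simplified illustration with hypothetical helper names and a plain
width * height product; the size formula used by the driver is more
involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit product: wraps modulo 2^32, so 65536 * 65536 yields 0. */
static uint32_t image_size_32(uint32_t width, uint32_t height)
{
	return width * height;
}

/* Widen before multiplying, then reject results that do not fit. */
static bool image_size_checked(uint32_t width, uint32_t height,
			       uint32_t *size)
{
	uint64_t size64 = (uint64_t)width * height;

	if (size64 > UINT32_MAX)
		return false;
	*size = (uint32_t)size64;
	return true;
}
```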

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cbe408dba581755ad1279a487ec786d8927d778d)
Cc: stable@vger.kernel.org

drm/amdgpu/vcn4: avoid rereading IB param length

Reuse the parameter length returned by
vcn_v4_0_enc_find_ib_param() instead of rereading it from
the IB.

This avoids a potential TOCTOU issue if the IB contents
change between reads.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dbb02b4755f8c1f3773263f2d779872c1c0c073a)
Cc: stable@vger.kernel.org

drm/amdgpu: fix division by zero with invalid uvd dimensions

When width or height is less than 16, width_in_mb or height_in_mb
becomes 0, leading to fs_in_mb being 0. This causes a division by
zero when calculating num_dpb_buffer in H264 and H264 Perf decode
paths.

Add validation to reject frames with width < 16 or height < 16
before performing any calculations that depend on these values.

V2: Format change - move up all variable definitions.
V3: Use warn_once to avoid spam.
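
A simplified sketch of the validation, assuming 16x16 macroblocks and a
hypothetical helper name. The real calculation applies additional
alignment that is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes the frame size in macroblocks, or returns false when either
 * dimension is too small to contain a single 16x16 macroblock. */
static bool frame_size_in_mb(uint32_t width, uint32_t height,
			     uint32_t *fs_in_mb)
{
	if (width < 16 || height < 16)
		return false;	/* width / 16 or height / 16 would be 0 */
	*fs_in_mb = (width / 16) * (height / 16);
	return true;
}
```

Callers that later divide by the frame size can rely on it being
non-zero whenever the helper returns true.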

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3e41d26c70b0a459d041cc19482a226c4b7423cb)
Cc: stable@vger.kernel.org

drm/amd/display: set MSA MISC1 bit 6 when using VSC SDP for DCE 11.x

When BT.2020 colorimetry is selected, the driver sends information using
VSC SDP but does not set the "ignore MSA colorimetry" bit on older GPUs with
DCE-based IPs. This causes certain sinks to prefer colorimetry
information in DP MSA, resulting in terrible color rendering ("dull"
colors) when HDR is enabled.

This commit wires up the MISC1 bit 6 for GPUs with DCE 11.x based IPs to
correctly configure sinks to ignore colorimetry information in MSA,
resolving the color rendering issue.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4849
Assisted-by: oh-my-pi:GPT-5.5
Signed-off-by: Leorize <leorize+oss@disroot.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 323a09e56c1d549ce47d4f110de77b0051b4a8bf)
Cc: stable@vger.kernel.org

drm/amd/pm: fix amdgpu_pm_info power display units

amdgpu_pm_info displayed power sensor readings with the wrong fractional unit.
It treated the low byte of the raw sensor value as the decimal part of watts,
while that field represents milliwatts in the decoded value. As a result,
debugfs could report misleading SoC power when the remainder was not already
a two-digit centiwatt value.

Example with query = 0x00000354:

  raw field        value
  ---------------------
  query >> 8       3 W
  query & 0xff     84 mW
  decoded power    3084 mW

  output           value
  ---------------------
  before           3.84 W
  after            3.08 W
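
The example can be reproduced with a userspace sketch. The two helpers
are hypothetical reconstructions of the formatting described above, not
the debugfs code itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Old formatting: prints the low byte directly as the fractional digits. */
static void format_power_before(uint32_t query, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%02u W",
		 (unsigned int)(query >> 8), (unsigned int)(query & 0xff));
}

/* New formatting: decode to milliwatts first, then print centiwatts. */
static void format_power_after(uint32_t query, char *buf, size_t len)
{
	uint32_t mw = (query >> 8) * 1000 + (query & 0xff);

	snprintf(buf, len, "%u.%02u W",
		 (unsigned int)(mw / 1000), (unsigned int)((mw % 1000) / 10));
}
```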

Fixes: f0b8f65b4825 ("drm/amd/amdgpu: fix the GPU power print error in pm info")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 01992b121fb652c753d37e0c1427a2d1a557d2b1)
Cc: stable@vger.kernel.org

drm/amd/pm: make pp_features read-only when scpm is enabled

SCPM owns power feature control when enabled.

Make pp_features read-only during sysfs setup by clearing its write bits
and store callback.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6a5786e191fdce36c5db170e5209cf609e8f0087)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma7.1: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c4f230b51cf2d3e7e8b1c800331f3dbed2a9e3f5)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma7.0: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9723a8bed3aa251a26bee4583bac9d8fb064dd44)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma6.0: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c17a508a7d652da3728f8bbc481bfffe96d65a87)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma5.2: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ae658afc7f47f6147371ec42cc6b1a793dfdb5af)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma5.0: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8d144a0eb09537055841af48c9e7c2d4cd48e84d)
Cc: stable@vger.kernel.org

drm/amdgpu/sdma4.4.2: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fa4f86a148271e325e95287630a3a15a9cd35fdc)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx12.1: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e4d99e04b2e9b13b97d3b17804c735f62689db23)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx12: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f952076f76d62f783e8ba4995a7c400d39354ccf)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx11: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit daa62107452d2451787c4248ca38fa2d1a0cbefd)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx10: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ac6f00beb658239bced4aaed9efbb04a35348d48)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx9.4.3: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5676593d08998d7a6d9e2d51d6b54b3820e3755c)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx9: replace BUG_ON() with WARN_ON()

There's no need to crash the kernel for these cases.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b71604f8685b0eba07866f4e8dc30f93e1931054)
Cc: stable@vger.kernel.org

drm/amdgpu/gfx8: drop unecessary BUG_ON()

There's no need to crash the kernel for this case.

Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4d7c25208ca612b754f3bf39e9f16e725b828891)
Cc: stable@vger.kernel.org

drm/amdgpu/soc24: reset dGPU if suspend got aborted

For SOC24 ASICs (RDNA4 / Navi 4x dGPUs), re-enabling PM features fails if
an S3 suspend got aborted. The same issue is already handled for SOC21
and SOC15:

  commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
  commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")

The aborted resume fails with:

  amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
  amdgpu: Failed to enable requested dpm features!
  amdgpu: resume of IP block <smu> failed -62

Apply the same workaround for soc24: detect the aborted-suspend state at
resume via the sign-of-life register and reset the device before re-init.

This is a workaround till a proper solution is finalized.

Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
Signed-off-by: Jakob Linke <jakob@linke.cx>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fed5bdbfe1d4a19a26c70f7fc58017dc88be1c18)
Cc: stable@vger.kernel.org

Merge tag 'nf-26-06-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter: updates for net

The following patchset contains Netfilter fixes for *net*.
Due to bug volume the plan is to make a second *net* pull request
this Friday.

1) Zero nf_conntrack_expect at allocation to prevent uninitialized data
leaks to userspace. Add missing exp->dir initialization.

2) Prevent out-of-bounds writes in nft_set_pipapo caused by inconsistent
clones during allocation failures.  Fail operations if the clone enters an
error state.  This was a day-0 bug.

3) Fix use-after-free race between ipset dump and array resizing. Protect
array pointer access with rcu_read_lock().  From Xiang Mei. Bug existed
since v4.20.

4) Validate skb_dst() exists before access in nf_conntrack_sip.
This prevents a crash when called from tc ingress or openvswitch.
From Pablo Neira Ayuso.  Bug added in 4.3 when ovs gained support
for conntrack helpers.

5) Cap the maximum number of expectations to NF_CT_EXPECT_MAX_CNT during
userspace helper policy updates.  Also from Pablo.

6) Prevent NULL pointer dereference in nft_fib on netdev egress hooks. Add
nft_fib_netdev_validate() to restrict fib expressions to appropriate
netdev hooks. Restrict nft_fib_validate() to IPv4, IPv6, and INET
protocols.  From Theodor Arsenij Larionov-Trichkine.
Bug was exposed in v5.16 when egress hooks got added.

7) Restrict nfnetlink_queue writes to network headers. Validate IP/IPv6
header length and disable extension headers or IP option modifications.
Disable bridge modification for now, it's unlikely anyone is using this.

8) Restrict arbitrary writes to link-layer and network headers in nftables.
Prevent link-layer modifications from spilling into network headers.
Prevent writes to IP version and length fields.

9) Restrict L3 checksum update offset to IPv4. Else csum offset can be
used to munge arbitrary header offsets, rendering the previous change moot.

These three patches are follow-ups to a 7.1 change that disabled
header rewrite ability in unprivileged network namespaces.
Unprivileged netns support is not yet re-enabled here.

netfilter pull request nf-26-06-30

* tag 'nf-26-06-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nftables: restrict checkum update offset
  netfilter: nftables: restrict linklayer and network header writes
  netfilter: nfnetlink_queue: restrict writes to network header
  netfilter: nft_fib: reject fib expression on the netdev egress hook
  netfilter: nfnetlink_cthelper: cap to maximum number of expectation per master
  netfilter: nf_conntrack_sip: validate skb_dst() before accessing it
  netfilter: ipset: fix race between dump and ip_set_list resize
  netfilter: nft_set_pipapo: don't leak bad clone into future transaction
  netfilter: nf_conntrack_expect: zero at allocation time
====================

Link: https://patch.msgid.link/20260630045243.2657-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/xe/rtp: Add RING_FORCE_TO_NONPRIV_DENY to OA whitelists

Unconditionally whitelisting OA registers is a security violation. Set
RING_FORCE_TO_NONPRIV_DENY bit in OA nonpriv slots, so that OA registers
don't get whitelisted by default after probe, gt reset, resume and engine
reset.

Fixes: 828a8eaf37c3 ("drm/xe/oa: Add MMIO trigger support")
Cc: stable@vger.kernel.org # v6.12+
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patch.msgid.link/20260615224227.34880-2-ashutosh.dixit@intel.com
(cherry picked from commit 90511bdcfda97211c01f1d945d4ea616578d8fca)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Remove redundant exec_queue_suspended() check in submit_exec_queue()

There is already a check for exec_queue_suspended(q) that returns early
if the queue is suspended.

Fixes: 65280af331aa ("drm/xe/multi_queue: skip submit when primary queue is suspended")
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260617012516.19930-1-yaolu@kylinos.cn
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 173202a5a3a9e6590194ce0f5880d1529a71ade7)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/pt: Fix NULL pointer dereference in xe_pt_zap_ptes_entry()

The page-table walk framework may pass a NULL *child pointer for
unpopulated entries. xe_pt_zap_ptes_entry() called container_of(*child)
before checking for NULL, then dereferenced the result, causing a crash.

Move the container_of() call after a NULL guard, so the function returns
early instead of proceeding with an invalid pointer. XE_WARN_ON is kept
to help root cause the issue, but we now bail instead of crashing the
driver.
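
The ordering problem described above can be modelled in user space. This
is only a sketch: struct base_node, struct wrapper and zap_entry() are
hypothetical stand-ins, not the real xe structures or the actual patch.
It shows the NULL guard placed before container_of(), which matters
because container_of() on a NULL pointer yields a bogus non-NULL pointer
whenever the member sits at a non-zero offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the walk framework's node and the driver's
 * wrapper around it; these are not the real xe structures. */
struct base_node {
	int level;
};

struct wrapper {
	int num_live;			/* puts 'base' at a non-zero offset */
	struct base_node base;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns 0 if the entry was processed, -1 if the child was NULL. */
int zap_entry(struct base_node **child)
{
	struct wrapper *w;

	assert(child != NULL);

	/* Guard first, and only then convert to the wrapper type. */
	if (!*child)
		return -1;

	w = container_of(*child, struct wrapper, base);
	w->num_live = 0;
	return 0;
}
```

With the guard in this position, an unpopulated entry makes the function
return early instead of writing through an invalid pointer.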

v2: Comment that triggering XE_WARN_ON is unexpected behavior (Matt Brost)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20260616081756.286918-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
(cherry picked from commit b9297d19d9df5d4b6c994648570c5dcd1cac68ff)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: wedge from the timeout handler only after releasing the queue

A kernel job that exhausts its recovery attempts called
xe_device_declare_wedged() directly from guc_exec_queue_timedout_job(),
while the handler still owned the timed-out job and the queue scheduler
(sched = &q->guc->sched, stopped at the top of the handler).

In the default wedged mode (XE_WEDGED_MODE_UPON_CRITICAL_ERROR),
xe_device_declare_wedged() takes the destructive path in
xe_guc_submit_wedge(): guc_submit_reset_prepare(), xe_guc_submit_stop()
- which calls guc_exec_queue_stop() on every queue, including this one -
softreset and pause-abort. That tears submission down, signals the
in-flight fences and restarts the schedulers. This is the correct
behaviour when the wedge originates outside the TDR, but not when the
TDR itself triggers it: every queue should be torn down except the one
the TDR is currently operating on, which it still owns.

Control then returned to the handler, which kept using the now stale job
and scheduler:

  xe_sched_job_set_error(job, err);
  drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL)
          xe_sched_job_set_error(to_xe_sched_job(tmp_job), -ECANCELED);

drm_sched_for_each_pending_job() warns because the scheduler is no
longer stopped (WARN_ON(!drm_sched_is_stopped())) and the iteration then
dereferences a freed job, faulting on the slab poison:

  Oops: general protection fault ... 0x6b6b6b6b6b6b6c3b
  RIP: guc_exec_queue_timedout_job+...

Defer the wedge until the handler has finished operating on the queue,
right before returning DRM_GPU_SCHED_STAT_NO_HANG, so the teardown no
longer races with this handler's use of @q.

Fixes: 770031ec2312 ("drm/xe: fix job timeout recovery for unstarted jobs and kernel queues")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Assisted-by: GitHub-Copilot:claude-opus-4.8
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260612162414.287971-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit a889e9b06bfdb375fc88b3b2a4b143f621f930c6)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

ASoC: rsnd: adg: make rsnd_adg_clk_control() idempotent

rsnd_adg_clk_control() is asymmetric on the disable path: the clkin
clocks are guarded by clkin_rate[], but the "adg" clock is disabled
unconditionally. If an enable attempt fails (for example a clkin
failing to turn on during resume), the error path correctly rolls
everything back, but rsnd_resume() ignores the return value, so the
following system suspend calls rsnd_adg_clk_disable() again and
underflows the "adg" clock enable count:

  adg_0_clks1 already disabled
  WARNING: drivers/clk/clk.c:1188 clk_core_disable+0xa4/0xac
  Call trace:
   clk_core_disable+0xa4/0xac (P)
   clk_disable+0x30/0x4c
   rsnd_adg_clk_control+0x9c/0x2cc
   rsnd_suspend+0x20/0x74
   device_suspend+0x140/0x3ec
   dpm_suspend+0x168/0x270

Track the enable state explicitly and bail out of redundant
enable/disable calls, mirroring what is already done for the per-SSI
clock prepare state. A failed enable leaves the state as disabled, so
the next suspend becomes a no-op and the next resume retries cleanly.
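
A minimal user-space model of the state tracking described above. The
names (struct fake_clk, struct fake_adg, adg_clk_control() and the
inject_failure parameter) are assumptions made for illustration and do
not reflect the real clk API or the rsnd driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock with an enable count; not the real clk API. */
struct fake_clk {
	int enable_count;
};

struct fake_adg {
	struct fake_clk adg;
	bool clk_enabled;	/* explicit enable state */
};

static int fake_clk_enable(struct fake_clk *c, bool fail)
{
	if (fail)
		return -1;
	c->enable_count++;
	return 0;
}

static void fake_clk_disable(struct fake_clk *c)
{
	/* Models the "already disabled" warning as a hard failure. */
	assert(c->enable_count > 0);
	c->enable_count--;
}

/* inject_failure simulates a clock that fails to turn on. */
int adg_clk_control(struct fake_adg *adg, bool enable, bool inject_failure)
{
	/* Redundant request: nothing to do. */
	if (adg->clk_enabled == enable)
		return 0;

	if (enable) {
		int ret = fake_clk_enable(&adg->adg, inject_failure);

		if (ret)
			return ret;	/* state stays "disabled" */
		adg->clk_enabled = true;
		return 0;
	}

	fake_clk_disable(&adg->adg);
	adg->clk_enabled = false;
	return 0;
}
```

In this model a failed enable followed by a disable never reaches
fake_clk_disable(), so the enable count cannot underflow.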

Fixes: 47899d53f86f ("ASoC: rsnd: adg: Add per-SSI ADG and SSIF supply clock management")
Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20260610164704.2211321-1-john.madieu.xa@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>

pkey: Move keytype check from pkey api to handler

The PKEY_VERIFYPROTK ioctl takes data from user-space and verifies the
contained protected key. While checking the integrity of the ioctl
request structure is the responsibility of the generic pkey_api code,
the verification of the contained protected key is the responsibility
of the pkey handler.

The keytype verification (based on the calculated bitsize of the key)
is part of the protected key verification and therefore the
responsibility of the pkey handler (which already verifies
it). Therefore the keytype verification is removed from the generic
pkey_api code.

As the calculation of the key bitsize is currently wrong, the removal
of the keytype check in pkey_api also removes this wrong
calculation. For this reason, the commit is flagged with the Fixes:
tag.

Cc: stable@kernel.org # 6.12+
Fixes: 8fcc231ce3be ("s390/pkey: Introduce pkey base with handler registry and handler modules")
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

Merge patch series "iomap: consolidate bio submission"

Christoph Hellwig <hch@lst.de> says:

This patch changes how iomap submits bios for reads.  The old behavior
to build up bios across iomap was already considered problematic for
a while, but we now ran into an erofs bug because of it, so it's time
to finally fix it.

* patches from https://patch.msgid.link/20260629121750.3392300-2-hch@lst.de:
  iomap: submit read bio after each extent
  fuse: call fuse_send_readpages explicitly from fuse_readahead
  iomap: consolidate bio submission

Link: https://patch.msgid.link/20260629121750.3392300-2-hch@lst.de
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iomap: submit read bio after each extent

Currently the iomap buffered read path tries to build up read context
(i.e. bios for the typical block based case) over multiple iomaps as
long as the sector matches.  This does not take into account files
that can map to multiple different devices.  While this could be fixed
by a bdev check in iomap_bio_read_folio_range, the building up of I/O
over iomaps actually was a problem for the not yet merged ext2 iomap
port, as that does want to send out I/O at the end of an indirect
block mapped range.

So instead of adding more checks, move over to a model where a bio only
spans a single iomap.  Change ->submit_read to be called after each
iteration so that the bio based users submit the bio after each iomap.
Fuse is unchanged because the previous commit stopped using ->submit_read
for it.

Fixes: dfeab2e95a75 ("erofs: add multiple device support")
Reported-by: Kelu Ye <yekelu1@huawei.com>
Reported-by: Yifan Zhao <zhaoyifan28@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260629121750.3392300-4-hch@lst.de
Tested-by: Yifan Zhao <zhaoyifan28@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

fuse: call fuse_send_readpages explicitly from fuse_readahead

Move the call to fuse_send_readpages from the iomap ->submit_read method
to the fuse readahead implementation.

fuse_read_folio() does not need to call fuse_send_readpages() because it
always does reads synchronously (the iomap->submit_read method for this
was a no-op since data->ia is always NULL for fuse_read_folio()).

This prepares for an iomap fix that will call ->submit_read after each
iomap.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260629121750.3392300-3-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iomap: consolidate bio submission

Add an iomap_bio_submit_read_endio helper factored out of
iomap_bio_submit_read so that all ->submit_read implementations for
iomap_read_ops that use iomap_bio_read_folio_range can share the
logic.

Right now that logic is mostly trivial, but already has a bug for XFS
because the XFS version is too trivial:  file system integrity validation
needs a workqueue context and thus can't happen from the default iomap
bi_end_io I/O handler.  Unfortunately the iomap refactoring just before
fs integrity landed moved code around here and the call got misplaced,
meaning it never got called.  The PI information still is verified by
the block layer, but the offloading is less efficient (and the future
userspace interface can't get at it).

Fixes: 0b10a370529c ("iomap: support T10 protection information")
Cc: stable@vger.kernel.org # v7.1
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260629121750.3392300-2-hch@lst.de
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

fhandle: reject detached mounts in capable_wrt_mount()

The recent fhandle RCU fix moved the mount namespace capability check
into capable_wrt_mount(), so a non-NULL mnt_namespace survives the
ns_capable() dereference. The helper still assumes the later
READ_ONCE(mount->mnt_ns) must be non-NULL because may_decode_fh()
checked is_mounted() first.

That assumption is not stable. A detached mount from
open_tree(..., OPEN_TREE_CLONE) can be dissolved on fput while
open_by_handle_at() is between those checks, and umount_tree() can
clear mount->mnt_ns. If the helper observes NULL, it dereferences
mnt_ns->user_ns and panics.

Return false when the RCU read observes a detached mount. This keeps
the relaxed permission path conservative: a mount no longer attached
to a namespace cannot authorize open_by_handle_at() access.
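
The conservative behaviour can be sketched in user space as follows.
The fake_* types and capable_wrt_mount_sketch() are hypothetical models,
and the plain pointer load stands in for READ_ONCE() under RCU; this is
not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified models of the kernel objects. */
struct fake_user_ns {
	bool caller_capable;
};

struct fake_mnt_ns {
	struct fake_user_ns *user_ns;
};

struct fake_mount {
	struct fake_mnt_ns *mnt_ns;	/* NULL once the mount is detached */
};

bool capable_wrt_mount_sketch(const struct fake_mount *mount)
{
	struct fake_mnt_ns *ns;

	assert(mount != NULL);

	/* Take one snapshot of the pointer and use only that snapshot. */
	ns = mount->mnt_ns;

	/* A detached mount cannot authorize anything. */
	if (!ns)
		return false;

	return ns->user_ns->caller_capable;
}
```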

Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
Cc: stable@vger.kernel.org
Signed-off-by: David Lee <david.lee@trailofbits.com>
Assisted-by: LLM
Link: https://patch.msgid.link/20260701114438.24431-1-david.lee@trailofbits.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

Merge patch series "netfs: Miscellaneous fixes"

David Howells <dhowells@redhat.com> says:

Here are some miscellaneous fixes for netfslib.  I separated them from my
netfs-next branch.  Various Sashiko review comments[1][2][3][4] are addressed:

(1) Fix the decision whether to disallow write-streaming due to fscache
     use.

(2) Fix netfs_create_write_req() to better handle async cache object
     creation.

(3) Fix a double fput in cachefiles_create_tmpfile().

(4) Fix alteration of S_KERNEL_FILE inode flag without holding inode lock.

(5) Fix a potential mathematical underflow in
     iov_iter_extract_xarray_pages() and make it return 0 and free the
     array if no pages could be extracted.

(6) Fix a missing alloc failure check in iov_iter_extract_bvec_pages().

(7) Fix iov_iter_extract_user_pages() so that it doesn't leak the pages
     array if it returns an error or 0 (inasmuch as the leak is really in
     the callers).

(8) Remove an unused variable in kunit_iov_iter.c.

(9) Fix extract_xarray_to_sg() to calculate folio offset correctly.

(10) Fix a kdoc comment.

(11) Replace the netfs_inode::wb_lock mutex with a bit lock so that the
     lock can be passed to the collector so that multiple asynchronous
     writebacks won't interfere with each other.

(12) Fix writeback error handling to go through writeback_iter() so that it
     can clean up its state.

(13) Fix ENOMEM handling in writeback to clean up the current folio if we
     can't allocate a rolling buffer segment.

(14) Fix unbuffered/DIO write retry for filesystems that don't have a
     ->prepare_write() method.

[1] https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
[2] https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
[3] https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com
[4] https://sashiko.dev/#/patchset/20260624115737.2964520-1-dhowells%40redhat.com

* patches from https://patch.msgid.link/20260625140640.3116900-1-dhowells@redhat.com:
  netfs: Fix DIO write retry for filesystems without a ->prepare_write()
  netfs: Fix folio state after ENOMEM whilst under writeback iteration
  netfs: Fix writeback error handling
  netfs: Fix writethrough to use collection offload
  netfs: Replace wb_lock with a bit lock for asynchronicity
  netfs: Fix kdoc warning
  scatterlist: Fix offset in folio calc in extract_xarray_to_sg()
  iov_iter: Remove unused variable in kunit_iov_iter.c
  iov_iter: Fix a memory leak in iov_iter_extract_user_pages()
  iov_iter: Fix missing alloc fail check in iov_iter_extract_bvec_pages()
  iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages()
  cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE
  cachefiles: Fix double fput
  netfs: Fix netfs_create_write_req() to handle async cache object creation
  netfs: Fix decision whether to disallow write-streaming due to fscache use

Link: https://patch.msgid.link/20260625140640.3116900-1-dhowells@redhat.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix DIO write retry for filesystems without a ->prepare_write()

Fix netfs_unbuffered_write() so that it doesn't re-issue a write twice when
the filesystem doesn't have a ->prepare_write(). The resetting of the
iterator and the call to netfs_reissue_write() should just be removed as
almost everything it does is done again when the loop it's in goes back to
the top.

It does, however, still need the IN_PROGRESS flag setting, so that (and the
stat inc) are moved out of the if-statement.

Further, the MADE_PROGRESS flags should be cleared and wreq->transferred
should be updated, so fix those too.

Reported-by: syzbot+3c74b1f0c372e98efc32@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3c74b1f0c372e98efc32
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-16-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: hongao <hongao@uniontech.com>
cc: ChenXiaoSong <chenxiaosong@chenxiaosong.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix folio state after ENOMEM whilst under writeback iteration

Fix the state of the current folio when ENOMEM occurs during writeback
iteration. The folio needs to be redirtied and unlocked before the
terminal writeback_iter() is invoked.

Fixes: 06fa229ceb36 ("netfs: Abstract out a rolling folio buffer implementation")
Link: https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-15-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix writeback error handling

Fix the error handling in the writeback_iter() loop. If an error occurs,
writeback_iter() needs to be called again with *error set to the error so
that it can clean up iteration state. Further, the current folio needs
unlocking and redirtying.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Link: https://sashiko.dev/#/patchset/20260619140646.2633762-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-14-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix writethrough to use collection offload

Fix writethrough write to set NETFS_RREQ_OFFLOAD_COLLECTION on the request
so that collection is processed asynchronously rather than only right at
the end - and also so that asynchronous O_SYNC writes get collected at all.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-13-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Replace wb_lock with a bit lock for asynchronicity

The netfs_inode::wb_lock mutex is used to prevent multiple simultaneous
writebacks from fighting each other (a writeback thread will write multiple
discontiguous regions within the same request). The mutex, however, only
serialises the issuing of subrequests; it doesn't serialise the collection
of results, and, in particular, the updating of file size information and
fscache populatedness data.

Unfortunately, the mutex cannot be held around the entire process as it has
to be unlocked in the same thread in which it is locked - and we don't want
to hold up the allocator whilst we complete the writeback.

Fix this by replacing the mutex with a bit flag and a list of lock waiters
so that the lock can be dropped in the collector thread after collection is
complete.
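
As a rough illustration of why a bit flag helps, here is a user-space
sketch using C11 atomics. Unlike a mutex, nothing ties the unlock to the
thread that took the lock. The names (struct inode_sketch, WB_LOCK_BIT,
wb_trylock(), wb_unlock()) are assumptions, and the list of lock waiters
mentioned above is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags word; bit 0 plays the role of the writeback lock. */
struct inode_sketch {
	atomic_ulong flags;
};

#define WB_LOCK_BIT 0UL

/* Try to take the lock; returns true if this caller now owns it. */
bool wb_trylock(struct inode_sketch *ino)
{
	unsigned long mask = 1UL << WB_LOCK_BIT;
	unsigned long old;

	old = atomic_fetch_or_explicit(&ino->flags, mask,
				       memory_order_acquire);
	return !(old & mask);
}

/* May be called from any thread, e.g. a collector, not just the locker. */
void wb_unlock(struct inode_sketch *ino)
{
	unsigned long mask = 1UL << WB_LOCK_BIT;
	unsigned long old;

	old = atomic_fetch_and_explicit(&ino->flags, ~mask,
					memory_order_release);
	assert(old & mask);	/* unlocking an unlocked bit is a bug */
}
```

A real implementation also has to put contended lockers to sleep and
wake them on unlock, which is what the waiter list is for.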

Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-12-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix kdoc warning

Fix a kdoc warning due to a misnamed parameter in the description.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-11-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

scatterlist: Fix offset in folio calc in extract_xarray_to_sg()

Fix the calculation of the offset in the folio being extracted in
extract_xarray_to_sg().

Note that in the near future, ITER_XARRAY should be removed.

Fixes: f5f82cd18732 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-10-dhowells@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iov_iter: Remove unused variable in kunit_iov_iter.c

Remove the no longer used variable 'b' from iov_kunit_copy_to_bvec(). The
variable is initialised and incremented, but nothing now makes use of the
value.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-9-dhowells@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Ming Lei <ming.lei@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iov_iter: Fix a memory leak in iov_iter_extract_user_pages()

There's a potential memory leak in callers of iov_iter_extract_user_pages()
whereby if a pages array is allocated in the function, it isn't freed before
an error or 0 is returned.

Now, it's not a leak per se in iov_iter_extract_user_pages() as, if an
array is allocated, it's returned through *pages, so it's incumbent on the
caller to free it. However, not all callers do.

Fix this by freeing the table and clearing *pages before returning an error
or 0. Note that iov_iter_extract_pages() and its subfunctions are allowed
to return 0 without returning an array (for instance if the iterator count
is 0).
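
The ownership rule can be sketched in user space like this. The function
extract_pages_sketch() and its 'simulated' parameter (which stands in for
the result of the real extraction work) are hypothetical, and -1 stands
in for -ENOMEM.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * If *pages is NULL on entry, an array is allocated here. On an error or
 * a 0 return, an array allocated here is freed and *pages is cleared, so
 * the caller never has to clean up after a non-positive result. An array
 * passed in by the caller is never freed.
 */
long extract_pages_sketch(void ***pages, unsigned int maxpages,
			  long simulated)
{
	void **allocated = NULL;

	assert(pages != NULL);

	if (!*pages) {
		allocated = calloc(maxpages, sizeof(void *));
		if (!allocated)
			return -1;	/* stands in for -ENOMEM */
		*pages = allocated;
	}

	if (simulated <= 0 && allocated) {
		free(allocated);
		*pages = NULL;
	}
	return simulated;
}
```

A caller that gets a positive count still owns the array and must free
it; a caller that gets an error or 0 has nothing left to free.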

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-8-dhowells@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iov_iter: Fix missing alloc fail check in iov_iter_extract_bvec_pages()

Fix iov_iter_extract_bvec_pages() to check if want_pages_array() fails and,
if so, return -ENOMEM appropriately.

Fixes: e4e535bff2bc ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-7-dhowells@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Ming Lei <ming.lei@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages()

In iov_iter_extract_xarray_pages(), if no pages are extracted because
there's a hole (or something otherwise unextractable) in the xarray, then
the calculation of maxsize at the end can go wrong if the starting offset
is not zero.

Fix this by returning 0 in such a case and freeing the page array if
allocated here rather than being passed in.

Note that in the near future, ITER_XARRAY should be removed.

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Link: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-6-dhowells@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE

Fix cachefiles_bury_object() to lock the inode of the file being buried
whilst it unsets the S_KERNEL_FILE flag.

Fixes: 07a90e97400c ("cachefiles: Implement culling daemon commands")
Closes: https://sashiko.dev/#/patchset/20260616100821.2062304-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-5-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: NeilBrown <neil@brown.name>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

cachefiles: Fix double fput

Fix a double fput() in error handling in cachefiles_create_tmpfile().

Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-4-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix netfs_create_write_req() to handle async cache object creation

netfs_create_write_req() will skip caching if the fscache cookie is
disabled, but this is a problem because async cache object creation might
not have got far enough yet for the cookie to have been enabled - thereby
causing the call to fscache_begin_write_operation() to be skipped.

Fix this by removing the checks on the cookie and delegating this to
fscache_begin_write_operation().

Fixes: 7b589a9b45ae ("netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags")
Closes: https://sashiko.dev/#/patchset/20260624115737.2964520-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-3-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

netfs: Fix decision whether to disallow write-streaming due to fscache use

netfs_perform_write() buffers data by writing it into the pagecache for
later writeback.  If the folio it wants to write to isn't present, it uses
"write streaming" in which it will store partial data in a non-uptodate,
but dirty folio.

However, when fscache is in use, this is a potential problem as writes to
the cache have to be aligned to the cache backend's DIO granularity, and so
netfs_perform_write() attempts to suppress write-streaming in such a case,
requiring the folio content to be fetched first unless the entire folio is
going to be overwritten.  This allows the content to be written to the
cache too.

Unfortunately, the test netfs_perform_write() uses isn't correct because it
doesn't take into account the fact that the object lookup is asynchronous
and farmed off to a work queue, so there's a short window in which the
cache is doing a lookup but the test fails because the answer is undefined.

This can be triggered by the generic/464 xfstest, and causes a warning to
be emitted in cachefiles (in code not yet upstream) because it sees a write
that doesn't have its bounds rounded out to DIO alignment.

Fix this by changing the condition to whether FSCACHE_COOKIE_IS_CACHING is
set on a cookie rather than whether the cookie is marked enabled.  Note
that this is really just a hint as to whether we allow write streaming or
not and no other aspects of the cookie or cache object are accessed.

Also apply the same fix to netfs_write_begin().

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/20260625140640.3116900-2-dhowells@redhat.com
cc: Paulo Alcantara <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

exec: fix off-by-one in binfmt max rewrite depth comment

The loop in exec_binprm() permits depth values 0 through 5, up to 5
successive binfmt rewrites (setting bprm->interpreter) until the 6th
one would fail on depth > 5 and return -ELOOP. The comment claimed 4
levels, which was wrong. Adjusting the code to allow only 4 rewrites
would be breaking userland, so fix the comment and not the code.

Reproducer (a chain of shebanged scripts followed by an ELF binary):

    #!/bin/sh

    tmp=$(mktemp -d)
    echo $tmp
    cd $tmp

    mk () { echo $2 > $1; chmod +x $1; }

    for i in $(seq 4); do
     mk $i "#!$((i + 1))"
    done

    mk 5 '#!/bin/true'
    ./1 &&
    echo '5 binfmt rewrites OK (1 -> 2 -> 3 -> 4 -> 5 -> /bin/true)'

    mk 5 '#!6'
    mk 6 '#!/bin/true'
    ./1 ||
    echo '6 binfmt rewrites KO (1 -> 2 -> 3 -> 4 -> 5 -> 6 -> /bin/true)'
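
The depth accounting can also be modelled in a few lines of C. This is a
simplified sketch of the loop shape described above, not the kernel
source; exec_depth_sketch() and its rewrites_needed parameter (how many
handlers in the chain set an interpreter before the final binary is
reached) are assumptions.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 if the chain resolves, -ELOOP if it needs too many rewrites. */
int exec_depth_sketch(int rewrites_needed)
{
	int depth;

	assert(rewrites_needed >= 0);

	for (depth = 0;; depth++) {
		if (depth > 5)
			return -ELOOP;

		/* No further interpreter requested: the exec succeeds. */
		if (depth == rewrites_needed)
			return 0;

		/* Otherwise one rewrite happened; go around again. */
	}
}
```

In this model, five rewrites finish at depth 5 and succeed, while a sixth
rewrite pushes depth to 6 and fails with -ELOOP, matching the reproducer.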

Signed-off-by: Alan Urmancheev <alan.urman@gmail.com>
Link: https://patch.msgid.link/20260623052322.74711-1-alan.urman@gmail.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

iomap: guard io_size EOF trim against concurrent truncate underflow

Commit 51d20d1dacbe ("iomap: fix zero padding data issue in concurrent
append writes") changed ioend accounting so that io_size tracks only valid
data within EOF.  This trims io_size when a writeback range extends
past end_pos:

    ioend->io_size += map_len;
    if (ioend->io_offset + ioend->io_size > end_pos)
        ioend->io_size = end_pos - ioend->io_offset;

However, if end_pos ends up below ioend->io_offset, the subtraction
becomes negative and is stored in size_t io_size, causing an unsigned
wrap to a huge value.  This can happen when writeback continues past
byte-level EOF up to a block-aligned range, or when a concurrent
truncate shrinks the file after end_pos was sampled in
iomap_writeback_handle_eof().

A wrapped io_size can mislead append detection and corrupt
completion-time size handling, since filesystem end_io paths consume
io_size for decisions such as on-disk EOF updates and unwritten/COW
completion ranges.

Fix this by clamping io_size to zero when EOF has moved to or before
the ioend start offset.  This preserves the original intent of trimming
io_size to valid in-EOF data while avoiding the underflow.
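
One possible shape of the clamp, as a user-space sketch. The struct and
function names are hypothetical, int64_t stands in for loff_t, and the
actual patch may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down ioend: just the two fields discussed above. */
struct ioend_sketch {
	int64_t io_offset;	/* loff_t in the kernel */
	size_t io_size;
};

void ioend_add_sketch(struct ioend_sketch *ioend, size_t map_len,
		      int64_t end_pos)
{
	assert(ioend != NULL);

	ioend->io_size += map_len;

	if (end_pos <= ioend->io_offset) {
		/* EOF is at or before the ioend start: no valid data. */
		ioend->io_size = 0;
	} else if (ioend->io_offset + (int64_t)ioend->io_size > end_pos) {
		/* Trim to the in-EOF part; the difference is positive. */
		ioend->io_size = (size_t)(end_pos - ioend->io_offset);
	}
}
```

For example, with io_offset = 4096 and end_pos = 1000, the unguarded
subtraction would give -3096, which wraps to a huge size_t; the sketch
stores 0 instead.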

Fixes: 51d20d1dacbe ("iomap: fix zero padding data issue in concurrent append writes")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
Link: https://patch.msgid.link/9E38E2659B47DC2A+20260624062622.337469-1-zhangdandan@uniontech.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>