git.ipfire.org Git - thirdparty/linux.git/log

mm/mglru: use folio_mark_accessed to replace folio_set_active

MGLRU gives high priority to folios mapped in page tables.  As a result,
folio_set_active() is invoked for all folios read during page faults.  In
practice, however, readahead can bring in many folios that are never
accessed via page tables.

A previous attempt by Lei Liu proposed introducing a separate LRU for
readahead[1] to make readahead pages easier to reclaim, but that approach
is likely over-engineered.

Before commit 4d5d14a01e2c ("mm/mglru: rework workingset protection"),
folios with PG_active were always placed in the youngest generation,
leading to over-protection and increased refaults.  After that commit,
PG_active folios are placed in the second youngest generation, which is
still too optimistic given the presence of readahead.  In contrast, the
classic active/inactive scheme is more conservative.

This patch switches to using folio_mark_accessed() and
begins prefaulted file folios from the second oldest
generation instead of active generations.
We should also adjust the following accordingly:
- WORKINGSET_ACTIVATE: aligned with setting active for refaulted workingset
  folios;
- lru_gen_folio_seq(): place (pre)faulted file folios into the second
oldest generation;
- promote second-scanned folios to workingset in
folio_check_references(): we now have to depend on
folio_lru_refs() > 1, since we previously relied on PG_referenced
being set during the first scan, but PG_referenced is now set
earlier.

On x86, running a kernel build inside a memcg with a 1GB memory
limit using 20 threads.

w/o patch:
real 1m50.764s
user 25m32.305s
sys 4m0.012s
pswpin: 1333245
pswpout: 4366443
pgpgin: 6962592
pgpgout: 17780712
swpout_zero: 1019603
swpin_zero: 14764
refault_file: 287794
refault_anon: 1347963

w/ patch:
real 1m48.879s
user 25m29.224s
sys 3m37.421s
pswpin: 568480
pswpout: 2322657
pgpgin: 4073416
pgpgout: 9613408
swpout_zero: 593275
swpin_zero: 9118
refault_file: 262505
refault_anon: 577550

active/inactive LRU:

real 1m49.928s
user 25m28.196s
sys 3m40.740s
pswpin: 463452
pswpout: 2309119
pgpgin: 4438856
pgpgout: 9568628
swpout_zero: 743704
swpin_zero: 7244
refault_file: 562555
refault_anon: 470694

Lance and Xueyuan made a huge contribution to this patch through testing.

Link: https://lore.kernel.org/20260526130938.66253-1-baohua@kernel.org
Link: https://lore.kernel.org/linux-mm/20250916072226.220426-1-liulei.rjpt@vivo.com/
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
Tested-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Kairui Song <kasong@tencent.com>
Cc: Qi Zheng <qi.zheng@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: wangzicheng <wangzicheng@honor.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lei Liu <liulei.rjpt@vivo.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kasan/test: only do kmalloc_double_kzfree for generic mode

kmalloc_double_kzfree() would corrupt kernel memory when the just freed
memory were allocated by another thread before the second call to
kfree_sensitive() and the new allocation tag happened to match the old
one.

This could not happen in GENERIC mode as it uses quarantine.

Link: https://lore.kernel.org/20260524031053.381776-1-wsw9603@163.com
Signed-off-by: Wang Wensheng <wsw9603@163.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: trace esz at first setup

DAMON traces effective size quota from the second update, only if a change
has been made by the update.  Tracing only changed updates was an
intentional decision to avoid unnecessary same value tracing.  Always
skipping the first value is just an unintended mistake.

The mistake makes the tracepoint based investigation incomplete, because
the first effective size quota is never traced.  It is not a big issue
when the 'consist' quota tuner is used, because it keeps changing the
quota in the usual setup.

However, when the 'temporal' tuner is used, the quota value is not changed
before the goal achievement status is completely changed.  For example, if
the DAMOS scheme is started with an under-achieved goal, the quota is set
to the maximum value, and kept the same value until the goal is achieved.
Because DAMON skips the first value, the user cannot know what effective
quota the current scheme is using.  Only after the goal is achieved, the
effective quota is changed to zero, and traced.

Unconditionally trace the initial quota value to fix this problem.

Note that the 'temporal' quota tuner was introduced by commit af738a6a00c1
("mm/damon/core: introduce DAMOS_QUOTA_GOAL_TUNER_TEMPORAL"), which was
added to 7.1-rc1.  But even with the 'consist' quota tuner, the tracing is
unintentionally incomplete.  Hence this commit marks the introduction of
the trace event as the broken commit.

Link: https://lore.kernel.org/20260520150311.80925-1-sj@kernel.org
Fixes: a86d695193bf ("mm/damon: add trace event for effective size quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Docs/admin-guide/mm/damon/usage: clarify current_value of quota goals

The sysfs interface for DAMON quota goals includes a `current_value` file.
This file is not updated by the kernel and only serves to receive user
input.

Clarify in the documentation that the kernel does not update
`current_value`, and that reading it only has meaning when `target_metric`
is set to `user_input`.

While at it, fix missing commas in the goal files list.

Link: https://lore.kernel.org/20260521202020.126500-3-maksym.shcherba@lnu.edu.ua
Signed-off-by: Maksym Shcherba <maksym.shcherba@lnu.edu.ua>
Reviewed-by: SeongJae Park <sj@kernel.org>
Assisted-by: Antigravity:Gemini-3.1-Pro
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon: fix missing parens in macro arguments

Patch series "mm/damon: fix macro arguments and clarify quota goals doc",
v2.

This patch (of 2):

The DAMON iterator macros do not wrap their pointer arguments with
parentheses. This can cause build failures when the argument is a complex
expression due to operator precedence issues.

Add missing parentheses around the arguments in the following macros
to prevent potential build failures:
- damon_for_each_region()
- damon_for_each_region_from()
- damon_for_each_region_safe()
- damos_for_each_quota_goal()

Link: https://lore.kernel.org/20260521202020.126500-1-maksym.shcherba@lnu.edu.ua
Link: https://lore.kernel.org/20260521202020.126500-2-maksym.shcherba@lnu.edu.ua
Signed-off-by: Maksym Shcherba <maksym.shcherba@lnu.edu.ua>
Reviewed-by: SeongJae Park <sj@kernel.org>
Assisted-by: Antigravity:Gemini-3.1-Pro
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/damon/sysfs.sh: test pause file existence

sysfs.sh DAMON selftest is not testing the existence of the 'pause' sysfs
file. Add the test.

Link: https://lore.kernel.org/20260522154026.80546-15-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/damon/sysfs.sh: test addr_unit file existence

sysfs.sh DAMON selftest is not testing the existence of addr_unit sysfs
file. Add the test.

Link: https://lore.kernel.org/20260522154026.80546-14-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/damon/sysfs.sh: test monitoring intervals goal dir

sysfs.sh DAMON selftest is not testing monitoring intervals goal
directory. Add the test.

Link: https://lore.kernel.org/20260522154026.80546-13-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/damon/sysfs.py: stop kdamonds before failing

When an assertion is failed, sysfs.py DAMON selftest immediately exits the
test program leaving the DAMON running behind.  Many of the following
tests need to start DAMON on their own.  But because DAMON that was
started by sysfs.py is still running, those start attempts fail, and the
tests are failed or skipped.  Update sysfs.py to stop DAMON before exiting
the test program due to the assertion failure.

Link: https://lore.kernel.org/20260522154026.80546-12-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/tests/core-kunit: add damon_set_regions() test cases

damon_set_regions() is one of the main DAMON kernel API functions that set
up the monitoring target memory region boundaries. Implement unit tests
for verifying its basic functionalities.

Link: https://lore.kernel.org/20260522154026.80546-11-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: remove damon_verify_nr_regions()

When CONFIG_DAMON_DEBUG_SANITY is enabled, damon_verify_nr_regions() is
called for each damon_nr_regions() invocation.  damon_veify_nr_regions()
iterates all regions.  damon_nr_regions() is called for each region in
kdamond_reset_aggregated() and damos_apply_scheme().  Hence it imposes
O(n**2) overhead where n is the number of regions.

Though the verification is enabled only under DAMON_DEBUG_SANITY, which is
not for production use cases, it could be too high overhead.  Meanwhile,
damon_verify_ctx() is doing the damon_nr_regions() test.  Because
damon_verify_ctx() is called for each kdamond_call(), the test coverage
from damon_verify_ctx() could be sufficient.  Remove damon_nr_regions()
verification.

Link: https://lore.kernel.org/20260522154026.80546-10-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: add kdamond_call() debug_sanity check

kdamond_call() is the place where DAMON API callers are allowed to access
the DAMON context's public internal state including the monitoring
results. Hence it is important to ensure it is called with the expected
DAMON context state. Do the check under DAMON_DEBUG_SANITY.

Link: https://lore.kernel.org/20260522154026.80546-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: hide damon_destroy_region()

damon_destroy_region() is being used by only DAMON core, but exposed to
DAMON API callers. Exposing something that is not really being used by
others will only increase the maintenance cost. Hide it.

Link: https://lore.kernel.org/20260522154026.80546-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: hide damon_insert_region()

damon_insert_region() is being used by only DAMON core, but exposed to
DAMON API callers. Exposing something that is not really being used by
others will only increase the maintenance cost. Hide it.

Link: https://lore.kernel.org/20260522154026.80546-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: hide damon_add_region()

damon_add_region() is being used by only DAMON core, but exposed to DAMON
API callers. Exposing something that is not really being used by others
will only increase the maintenance cost. Hide it.

Link: https://lore.kernel.org/20260522154026.80546-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/tests/vaddr-kunit: replace damon_add_region() with damon_set_regions()

DAMON virtual address operation set (vaddr) unit tests is using
damon_add_region() for setup of DAMON monitoring target region boundaries
setup. But, damon_set_regions() is designed for exactly the purpose. All
other DAMON API callers use the function for the purpose. Replace
damon_add_region() usage in the unit tests with damon_set_regions(), for
unifying the use case and reducing the maintenance cost.

Link: https://lore.kernel.org/20260522154026.80546-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

samples/damon/mtier: replace damon_add_region() with damon_set_regions()

mtier DAMON sample module and DAMON virtual address operation set (vaddr)
unit tests are using damon_add_region() for setup of DAMON monitoring
target region boundaries setup.  But, damon_set_regions() is designed for
exactly the purpose.  All other DAMON API callers use the function for the
purpose.  Replace damon_add_region() usage in mtier sample module with
damon_set_regions(), for unifying the use case and reducing the
maintenance cost.

Link: https://lore.kernel.org/20260522154026.80546-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: do not use region out of a loop in damon_set_regions()

damon_set_regions() assumes the DAMON region iterator is referencing the
last region after the region iteration loop is completed. The code is
indeed implemented in the way, but that is not a documented safe behavior.
Hence it is unreliable and difficult to read. Cleanup the code to avoid
the case.

No behavioral change is intended.

Link: https://lore.kernel.org/20260522154026.80546-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/damon/core: safely handle no region case in damon_set_regions()

Patch series "mm/damon: minor improvements for code readability and tests".

Implement minor improvements on code readability and tests for DAMON.

First seven patches are for DAMON code readability and resulting
maintenance.  Patches 1 and 2 make damon_set_regions() safer and easier to
read.  Patches 3 and 4 remove fragmented DAMON API use cases.  Patches 5-7
hides unused core functions that are unnecessarily exposed to API callers.

The following seven patches are for DAMON tests improvement.  Patches 8
and 9 adds and removes DAMON_DEBUG_SANITY verifications to ensure
reasonable test coverage without too high overhead.  Patch 10 adds a new
kunit test for damon_set_regions().  Patch 11 makes sysfs.py selftest more
gracefully finishes under test failures.  Patches 12-13 adds simple
sysfs.sh test cases for the monitoring intervals goal directory, the
addr_unit file and the pause file.

This patch (of 14):

damon_set_regions() calls damon_first_region() regardless of the number of
DAMON regions in a given DAMON target.  damon_first_region() internally
uses list_first_entry(), which clearly documents the list is expected to
be not empty.  Due to the internal implementation of the macro,
damon_set_regions() is safe for now.  But the internal implementation of
the macro can be changed in future.  Refactor the function to explicitly
and safely handle the empty region list case without depending on the
internal implementation.

No behavioral change is intended.

Link: https://lore.kernel.org/20260522154026.80546-1-sj@kernel.org
Link: https://lore.kernel.org/20260522154026.80546-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/vma: eliminate mmap_action->error_hook, introduce error_override

Rather than providing a hook, simplify things by providing the ability to
override mmap action errors. This allows us to more carefully validate
the value provided and thus ensure only a valid error code is specified,
and simplifies the interface.

This way, we eliminate all hooks but mmap_prepare and allow only mmap
actions to be specified (which core mm controls).

This significantly improves robustness and eliminates any unnecessary code
duplication in driver mmap hooks.

We also update the /dev/mem logic (the only user) to use
mmap_action->error_override instead.

Link: https://lore.kernel.org/55d13f7d016b827c459946d46a56105635be111c.1780397980.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/vma: remove mmap_action->success_hook

This hook was introduced to work around code that seemed to absolutely
require access to a VMA pointer upon mmap().

However, providing this hook leaves a backdoor to drivers getting access
to the very thing mmap_prepare eliminates - a pointer to the VMA.

Let's solve this contradiction by removing it. The key intended user was
hugetlb, however it seems that the best course now is to avoid allowing
all drivers the ability to work around mmap_prepare, and find a different
solution there.

Link: https://lore.kernel.org/f79434e6d30af6d92999be6b76e197f1847105fa.1780397980.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

drivers/char/mem: eliminate unnecessary use of success_hook

Patch series "remove mmap_action success, error hooks", v3.

The mmap_action->success_hook was a strange beast added to enable code
which appeared to absolutely require access to a VMA pointer to work
correctly.

Primarily this was for hugetlb, however a different approach will be taken
there, as clearly more work is required to figure out a sensible way of
converting hugetlb to use mmap_prepare.

The other user was the memory char driver, specifically /dev/zero which
has the unusual property of explicitly setting file-backed VMAs anonymous.

Providing the success hook was always foolish, as it allowed drivers a way
to workaround the restriction that they should not access a pointer to a
not-yet-correctly-initialised VMA - which defeats the purpose of the
mmap_prepare work.

We can achieve the same thing in memory char driver without needing the
success hook, so this series removes that, then removes the success hook
altogether.

The error hook is also unnecessary - the motivation for this was for
functions which need to override the error code when performing an mmap
action in order to avoid breaking userspace.

We can achieve this by just providing a field for the error code. Doing
this means we don't have to worry about the hook doing anything odd.

We also add a check to ensure the error code is in fact valid.

Again the memory char driver is the only current user of this, so this
series updates it to use that.

After this change mmap_action has no custom hooks at all, which seems
rather more cromulent than before.

This patch (of 3):

/dev/zero, uniquely, marks memory mapped there as anonymous. This is
currently achieved using the mmap_action->success_hook.

However this hook circumvents the abstraction of VMA initialisation so
it's preferable to do things a different way.

To achieve this, this patch firstly defaults the VMA descriptor's vm_ops
field to the dummy VMA operations, which is what file-backed VMAs default
this field to.

That way, we can detect whether a driver sets this field to NULL in order
to mark it anonymous.

We then introduce vma_desc_set_anonymous() to do this explicitly, and
invoke it in mmap_zero_prepare().

This way, any driver which does not explicitly set desc->vm_ops, retains
the dummy vm_ops as they would previously.

We also update set_vma_user_defined_fields() to make clear that we are
either setting vma->vm_ops to what is provided by the driver (or
defaulting to dummy_vm_ops if not set), or setting the VMA anonymous.

This lays the groundwork for removing the success hook.

Link: https://lore.kernel.org/cover.1780397980.git.ljs@kernel.org
Link: https://lore.kernel.org/010579cca6787cf7bb057ab1f7228978b10601c8.1780397980.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests/mm/split_huge_page_test.c: close fd on write error

When create_pagecache_thp_and_fd() write returns error on
/proc/sys/vm/dropcache, it just "goto err_out_unlink", which left fd still
open.

Use "goto err_out_close" to close the fd.

Link: https://lore.kernel.org/20260520020336.28914-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: "Liam R. Howlett" <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/page_alloc: fix defrag_mode for non-reclaimable allocations

When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
migratetype fallbacks and keep pageblocks clean.  The allocator relies on
reclaim and compaction to free pages of the correct type before allowing
fallback as a last resort.

However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
direct reclaim or compaction.  With defrag_mode=1, these allocations hit
the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.

This causes a large number of SLUB allocation failures for
skbuff_head_cache under network-heavy workloads, despite free memory being
available in other migratetype freelists.

We observed it on a few of the Meta workloads that adopted
defrag_mode=1.

For the service under load there were 85509 SLUB allocation failures
messages in dmesg within 2 hours.  All of them are GFP_ATOMIC
allocations for skbuff_head_cache, despite free pages being available
in other migratetype freelists (~13 GB free).

Since it is networking path from the practical point of view, this
means dropped packets, failed RPC requests, tail latency spikes and
overall service degradation.

Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
reclaim but cannot do direct reclaim themselves (GFP_ATOMIC).  Purely
speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
__GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
fallbacks and should not cause fragmentation.

Link: https://lore.kernel.org/20260520122228.201550-1-d@ilvokhin.com
Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

MAINTAINERS: add more files to PAGE CACHE section

Add include/linux/writeback.h and
include/trace/events/{filemap.h,readahead.h,writeback.h}.

Link: https://lore.kernel.org/20260520-page-cache-maintainers-v1-1-f93438d2186d@columbia.edu
Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, wireless and Bluetooth.

  Current release - fix to a fix:

   - Bluetooth: MGMT: fix backward compatibility with bluetoothd
     which adds stray bytes to MGMT_OP_ADD_EXT_ADV_DATA

  Previous releases - regressions:

   - af_unix: fix inq_len update inaccuracy on partial read

   - eth: fec: fix pinctrl default state restore order on resume

   - wifi: iwlwifi:
       - mvm: don't support the reset handshake for old firmwares
       - pcie: simplify the resume flow if fast resume is not used,
         work around NIC access failures

  Previous releases - always broken:

   - Bluetooth: L2CAP: reject BR/EDR signaling packets over MTUsig

   - sctp: fix a couple of bugs in COOKIE_ECHO processing

   - sched: fix pedit partial COW leading to page cache corruption

   - wifi: nl80211: reject oversized EMA RNR lists

   - netfilter:
       - conntrack_irc: fix possible out-of-bounds read
       - bridge: make ebt_snat ARP rewrite writable

   - appletalk: zero-initialize aarp_entry to prevent heap info leak

   - ipv4: restrict IPOPT_SSRR and IPOPT_LSRR options

   - mptcp: fix number of bugs reported by AI scans and discovered
     during NVMe over MPTCP testing"

* tag 'net-7.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  Reapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"
  udp: clear skb->dev before running a sockmap verdict
  sctp: purge outqueue on stale COOKIE-ECHO handling
  bonding: annotate data-races arcound churn variables
  net/802/mrp: fix vector attribute parsing in mrp_pdu_parse_vecattr
  rtase: Avoid sleeping in get_stats64()
  ieee802154: 6lowpan: only accept IPv6 packets in lowpan_xmit()
  ipv6: mcast: Fix use-after-free when processing MLD queries
  selftests: net: add vxlan vnifilter notification test
  vxlan: vnifilter: fix spurious notification on VNI update
  vxlan: vnifilter: send notification on VNI add
  rtase: Reset TX subqueue when clearing TX ring
  octeontx2-af: npc: Fix CPT channel mask in npc_install_flow
  dt-bindings: ethernet: eswin: fix hsp-sp-csr backward compatibility
  sctp: validate cached peer INIT chunk length in COOKIE_ECHO processing
  net/sched: fix pedit partial COW leading to page cache corruption
  vsock/vmci: fix sk_ack_backlog leak on failed handshake
  net: bonding: fix NULL pointer dereference in bond_do_ioctl()
  geneve: fix length used in GRO hint UDP checksum adjustment
  net: ethernet: mtk_eth_soc: Fix use-after-free in metadata dst teardown
  ...

Merge tag 'drm-xe-fixes-2026-06-04' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Revert removing support for unpublished NVL-S GuC (Daniele)
- Suspend fixes related to multi-queue (Niranjana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aiHPGiPrAyHgwBZl@intel.com

Merge branch 'net-ethtool-make-sure-__ethtool_get_link_ksettings-is-ops-locked'

Jakub Kicinski says:

====================
net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked

This is prep for the series which will make most of the ethtool ops
run without rtnl_lock. The AI bots surfaced a number of callers of
__ethtool_get_link_ksettings() which need fixing, so I decided to
send that as a smaller prep-series. Each driver changed separately
for ease of review.

Full series unlocking ethtool ops AKA v1::
https://lore.kernel.org/20260528231637.251822-1-kuba@kernel.org
====================

Link: https://patch.msgid.link/20260603012840.2254293-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked

All drivers which may call *_get_link_ksettings() on ops-locked
devices from paths already holding the ops lock are ready now.
Make __ethtool_get_link_ksettings() take the ops lock, and assert
that it's held in netif_get_link_ksettings().

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-12-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

scsi: fcoe: don't recurse on the netdev's ops lock

fcoe_link_speed_update() calls __ethtool_get_link_ksettings() on the
lport's netdev, which will soon take the dev's ops lock. Some notifier
callers already arrive with this lock held. Switch to
netif_get_link_ksettings() and adjust the explicit call sites to take
the netdev lock explicitly.

Within fcoe_device_notification() try to only query the link speed
from notifiers which announce link state change (UP / CHANGE),
DOWN / GOING_DOWN notifiers are slightly sketchy when it comes
to ops locking right now, and the code already special-cases
those by maintaining the local link_possible variable.

Also take the lock in bnx2fc_net_config(), even though I think
that bnx2fc call sites are largely irrelevant since it's not
an ops-locked driver.

Link: https://patch.msgid.link/20260603012840.2254293-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

leds: trigger: netdev: don't recurse on the netdev ops lock

get_device_state() calls __ethtool_get_link_ksettings() on the trigger's
netdev, which will soon take the dev's ops lock. Three of its callers
already hold that lock and one doesn't, so the function would either
deadlock or run unprotected depending on the path.

Make get_device_state() expect the dev's ops lock held and switch to
netif_get_link_ksettings():

  * netdev_trig_notify() NETDEV_UP / NETDEV_CHANGE / NETDEV_CHANGENAME
    arrive with the dev's ops lock held (per netdevices.rst).
  * set_device_name() does not hold the lock, take it explicitly.

Due to lock ordering we need to reshuffle the code in set_device_name()
a little bit. We need to find the device earlier on, so that we can
lock it before we take trigger_data->lock.

Link: https://patch.msgid.link/20260603012840.2254293-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: sched: don't recurse on the netdev ops lock in qdiscs

cbs_set_port_rate() and taprio_set_picos_per_byte() are reached from
two paths and both already hold the device's ops lock:

*_change(), via tc_modify_qdisc() which calls netdev_lock_ops(dev)
before dispatching to the qdisc ops.

*_dev_notifier() on NETDEV_UP / NETDEV_CHANGE, where caller
holds the ops lock across the notifier chain.

Switch to netif_get_link_ksettings() to avoid deadlock once
__ethtool_get_link_ksettings() starts taking the netdev lock.

Link: https://patch.msgid.link/20260603012840.2254293-9-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: don't recurse on the port's netdev ops lock

port_cost() calls __ethtool_get_link_ksettings() on the port device,
which will soon take the port's ops lock. br_port_carrier_check()
is reached via the NETDEV_CHANGE notifier from linkwatch, which
already holds the port's ops lock, so the call would deadlock.

Make port_cost() expect the port's ops lock held and switch to
netif_get_link_ksettings(). The only other caller is new_nbp(),
make sure it takes the lock explicitly.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260603012840.2254293-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: team: don't recurse on the port's netdev ops lock

__team_port_change_send() calls __ethtool_get_link_ksettings() on
the port, which will soon take the port's ops lock. The notifier
caller already holds it while the slave-add/del callers do not,
so the function would either deadlock or run unprotected depending
on the path.

Make __team_port_change_send() expect the port's ops lock held and
switch to netif_get_link_ksettings(). team_device_event()'s NETDEV_UP /
NETDEV_CHANGE already arrive with the port's ops lock held.
team_port_add() now take it explicitly.

Note that NETDEV_DOWN and team_port_del() will pass false as @linkup
so they will not execute netif_get_link_ksettings(). This is fortunate
as NETDEV_DOWN has somewhat mixed locking right now.

Link: https://patch.msgid.link/20260603012840.2254293-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bonding: don't recurse on the slave's netdev ops lock

bond_update_speed_duplex() calls __ethtool_get_link_ksettings() on
the slave, which will soon take the slave's ops lock. One of its
callers already holds it and the other three don't, so the function
would either deadlock or run unprotected depending on the path.

Make the helper expect the slave's ops lock held and switch to
netif_get_link_ksettings(). Wrap the three call sites that don't
already hold it:

  * bond_enslave() (rtnl held; core drops the lower's ops lock
    around ->ndo_add_slave).
  * bond_miimon_commit() (rtnl_trylock'd from the mii workqueue).
  * bond_ethtool_get_link_ksettings() (rtnl held via ethtool layer,
    bond device itself is not ops locked).

The call site which does already hold the ops lock is
bond_slave_netdev_event() via NETDEV_UP / NETDEV_CHANGE notifiers,
so it stays as-is.

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260603012840.2254293-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: add netif_get_link_ksettings() for correct ops-locked use

__ethtool_get_link_ksettings() is exported and called from sysfs
and many drivers. It invokes ethtool_ops->get_link_ksettings
so by our own docs it should be holding netdev lock for ops locked
devices. Looks like commit 2bcf4772e45a ("net: ethtool:
try to protect all callback with netdev instance lock")
missed adding the ops lock here.

There's a number of callers we need to fix up so let's add the
netif_get_link_ksettings() helper first, without any actual
locking changes (this commit is a nop).

Not treating this as a fix because I don't think any driver cares
at this point, but if we want to remove the rtnl_lock protection
this will become critical.

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: document NETDEV_CHANGENAME as ops locked

NETDEV_CHANGENAME is only emitted from netif_change_name().
netif_change_name() has two callers both of which hold netdev_lock_ops()
around the call site:
- dev_change_name()
- do_setlink()

Document NETDEV_CHANGENAME as always ops locked.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: cmis_cdb: hold instance lock for ops locked devices

FW module flashing was written so that the flashing happens
without holding rtnl_lock. This allows flashing multiple modules
at once. Current drivers can handle that well, but we should
let drivers depend on the netdev instance lock. Instance lock
is per netdev, and so is the module so we won't break parallel
updates.

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: rename netdev_ops_assert_locked()

Jakub suggests renaming the existing assert to match
the netdev_lock_ops_compat() semantics.

We want netdev_assert_locked_ops() to mean - if the driver
is ops locked - check that it's holding the device lock.

The existing helper check for either ops lock or rtnl_lock,
which is the locking behavior of netdev_lock_ops_compat().

The reason for naming divergence is likely that
netdev_ops_assert_locked() predated the _compat() helpers.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260603012840.2254293-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:

- Fix CFI violation in probestub function

   The probestub is a function to allow tprobes to hook to a tracepoint
   to gain access to its parameters.

   The function itself is only referenced by the tracepoint structure
   which lives in the __tracepoint section. objtool explicitly ignores
   that section and when processing functions in the kernel, if it
   detects one that has no references it will seal it to have its ENDBR
   stripped on boot up.

   This means the probstub function will have its ENDBR stripped and if
   a tprobe is attached to it with IBT enabled, it will go *boom*.

* tag 'trace-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix CFI violation in probestub being called by tprobes

rust: sync: add #[must_use] to GlobalGuard and GlobalLock::try_lock

Guard is marked #[must_use] since dropping it releases the lock. GlobalGuard
wraps Guard with identical semantics but was missing the annotation, so
discarding it would silently compile without warning.

Similarly, GlobalLock::try_lock was missing #[must_use]. Option<T> does not
propagate #[must_use] from T, so the attribute needs to be on the function
directly - same reason Lock::try_lock has it.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260502160057.3402896-1-ashutoshdesai993@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

drm/amd/display: Consult MCCS FreeSync cap only if requested & supported

When the do_mccs parameter is false, we don't call
dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is
unlikely to be true.

Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs")
Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)

drm/amdkfd: Unwind debug trap enable on copy_to_user failure

If kfd_dbg_trap_enable() fails while copying runtime_info to userspace,
it had already activated the trap, set debug_trap_enabled, taken an extra
process reference, and opened the debug event file. Return -EFAULT without
unwinding that state, leaving inconsistent trap state and a refcount
imbalance that could break later DISABLE/ENABLE.

On copy_to_user failure, deactivate the trap and undo the rest of the
enable setup before returning.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)

drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS

The kfd_wait_on_events ioctl passes a user-supplied num_events parameter
directly to alloc_event_waiters() which calls kcalloc() without validation.
This allows unprivileged users with /dev/kfd access to trigger large kernel
memory allocations, potentially causing memory exhaustion and denial of
service via the OOM killer.

Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT
(4096), which is the maximum number of events a single process can create.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)

drm/amdgpu: restart the CS if some parts of the VM are still invalidated

Make sure that we only submit work with full up to date VM page tables.

Backport to 7.1 and older.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213)
Cc: stable@vger.kernel.org

drm/amdgpu/userq: Fix reading timeline points in wait ioctl

Use correct u64 type.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)

drm/amdkfd: fix SMI event cross-process information leak

kfd_smi_ev_enabled() skips the suser privilege check when pid=0.
PROCESS_START, PROCESS_END, and VMFAULT events are emitted with
pid=0 while carrying another process's PID and command name, so any
/dev/kfd user in the render group can monitor all GPU workloads.

Pass the target process PID into kfd_smi_event_add() for these events
so the existing per-client filter restricts delivery to the owning
process or CAP_SYS_ADMIN subscribers.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92a8dba246d371fe268280e5fd74b0955688e6df)

of: reserved_mem: avoid post-init UAF when alloc_reserved_mem_array() fails

The global pointer 'reserved_mem' continues to reference the
reserved_mem_array which lives in __initdata if
alloc_reserved_mem_array() fails. of_reserved_mem_lookup() is
exported for post-init use, that would dereference freed memory
and trigger a use-after-free.

So reset reserved_mem_count to 0 when alloc_reserved_mem_array()
fails.

Fixes: 00c9a452a235 ("of: reserved_mem: Add code to dynamically allocate reserved_mem array")
Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
Link: https://patch.msgid.link/20260604015332.3669384-1-chenwandun1@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

drm/amdkfd: always resume_all after suspend_all

Need to restore any good queues even if the suspend_all
failed for some. Always run remove_queue as that will
schedule a GPU reset is removing the queue fails.

v2: move resume_all after remove

Fixes: eb067d65c33e ("drm/amdkfd: Update BadOpcode Interrupt handling with MES")
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini

priv_reg / priv_inst / bad_op and (on v11+) userq EOP IRQs are
acquired in late_init but released in hw_fini.  This split forced
gfx_v9_0_hw_fini() to defensively guard each put with
amdgpu_irq_enabled() because hw_fini runs on paths that may not
reach late_init.

amdgpu_ip_block_hw_fini() only runs after hw_init returns success,
and suspend / resume cycle the refs through the same path, so
hw_init / hw_fini pair without any extra tracking.  Move the gets
there and drop the guards.

While here, fix the pre-existing partial-failure leak in
set_userq_eop_interrupts() (gfx11 / 12_0 / 12_1).  amdgpu_irq_get()
increments the refcount before calling .set, so a failure partway
through the loop leaves earlier successful gets stranded.  Track
the loop position and roll back on the enable path.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- Enable IOMMUFD and VFIO cdev such that PCI pass-through to
   QEMU/KVM can optionally utilize native IOMMUFD

- With HAVE_ARCH_BUG_FORMAT enabled the BUG infrastructure might
   misinterpret flags or fault. Fix this by moving the "format"
   field emission into __BUG_ENTRY()

- The generic version of _THIS_IP_ is known to be brittle and may
   break with current and future GCC and Clang optimizations.  Fix
   it by overriding _THIS_IP_

* tag 's390-7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Implement _THIS_IP_ using inline asm
  s390/bug: Always emit format word in __BUG_ENTRY
  s390/configs: Enable IOMMUFD and VFIO cdev in defconfigs

drm/amd/display: Consult MCCS FreeSync cap only if requested & supported

When the do_mccs parameter is false, we don't call
dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is
unlikely to be true.

Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs")
Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use strscpy in profile mode parsing

Use strscpy to copy the buffer which makes it explicit that a valid NULL
terminated string gets copied. Also, make it explicit that the source
buffer can be copied safely to the temporary buffer by checking against
its size.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype length

Malformed ACPI CRAT tables can advertise a zero or undersized subtype
length. The parser then fails to advance the cursor and loops forever
while the remaining image still looks large enough for a generic header.

Validate sub_type_hdr->length on each iteration before parsing or
advancing. Return -EINVAL and warn when length is zero or smaller than
the generic subtype header.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix sysfs topology prop length on buffer truncation

sysfs_show_gen_prop() accumulated snprintf()'s return value into the
offset. snprintf() reports bytes that would have been written, not
bytes actually written, so a truncated sysfs show could over-report
its length. Use sysfs_emit_at(), which returns only the bytes written.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages

Since commit c08972f55594 ("drm/amdgpu: fix amdgpu_hmm_range_get_pages")
moved mmu_interval_read_begin() out of the per-chunk loop, the
captured notifier_seq is no longer refreshed across retries. As a
result, the existing -EBUSY retry path can never make progress:

  hmm_range_fault() returns -EBUSY only when
  mmu_interval_check_retry(notifier, notifier_seq) reports that the
  sequence is stale. Once the sequence has advanced, the stored seq
  will never match again, so every subsequent call within the same
  invocation returns -EBUSY immediately.

The "goto retry" therefore degenerates into a busy spin that simply
burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before
finally bailing out with -EAGAIN. This is pure latency with no chance
of recovery, and it actively hurts the KFD userptr stack: the caller
ends up blocked for a second while holding mmap_lock, only to return
-EAGAIN to the restore worker (or to userspace) which would have
re-driven the operation immediately anyway.

Drop the retry/timeout entirely and let -EBUSY propagate straight to
out_free_pfns, where it is already translated to -EAGAIN. Recovery is
handled at a higher level: the KFD restore_userptr_worker reschedules
itself, and the userptr ioctl path returns -EAGAIN to userspace.

No functional regression: the previous behaviour on -EBUSY was already
to fail with -EAGAIN after a 1s stall; we just skip the stall.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: bound OD parameter parsing to stack array size

Reject inputs once parameter_size reaches the array limit, and pass
ARRAY_SIZE(parameter) into parse_input_od_command_lines() for defense in
depth.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZE

Stop appending OD sections in amdgpu_get_pp_od_clk_voltage()
once the sysfs page is full, instead of checking every sysfs_emit_at()
in SMU helpers. This is purely defensive hardening.

v2: Drop the prior series that checked sysfs_emit_at()
return values in every SMU *_emit_clk_levels() helper and
smu_cmn_print_*().(Kevin)

v3: Update description, remove all clamping

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Unwind debug trap enable on copy_to_user failure

If kfd_dbg_trap_enable() fails while copying runtime_info to userspace,
it had already activated the trap, set debug_trap_enabled, taken an extra
process reference, and opened the debug event file. Return -EFAULT without
unwinding that state, leaving inconsistent trap state and a refcount
imbalance that could break later DISABLE/ENABLE.

On copy_to_user failure, deactivate the trap and undo the rest of the
enable setup before returning.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: validate the mes firmware version for gfx12.1

MES firmware should report the same version whether read from
the register or from the firmware ucode binary. This is not
always the case, so add a log when they mismatch.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: validate the mes firmware version for gfx12

MES firmware should report the same version whether read from
the register or from the firmware ucode binary. This is not
always the case, so add a log when they mismatch.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: compare MES firmware version ucode for gfx11

MES firmware should report the same version whether read from
the register or from the firmware ucode binary. This is not
always the case, so add a log when they mismatch.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS

The kfd_wait_on_events ioctl passes a user-supplied num_events parameter
directly to alloc_event_waiters() which calls kcalloc() without validation.
This allows unprivileged users with /dev/kfd access to trigger large kernel
memory allocations, potentially causing memory exhaustion and denial of
service via the OOM killer.

Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT
(4096), which is the maximum number of events a single process can create.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: restart the CS if some parts of the VM are still invalidated

Make sure that we only submit work with full up to date VM page tables.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: use unsigned types for local pipe and REG_GET counters

Two small type fixes that match how the values are actually consumed:

- decide_zstate_support() iterates from 0 to pipe_count, which is
  unsigned. Make the loop index unsigned int.

- hpo_enc401_read_state() reads HDMI_PIXEL_ENCODING and
  HDMI_DEEP_COLOR_DEPTH via REG_GET_2(), which internally casts the
  output pointer to (uint32_t *). Passing the address of an int is a
  strict-aliasing wart even when the sizes match. Declare the locals
  as uint32_t.

No behavioural change since the values are only compared against small
non-negative constants.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: widen dc_hdmi_frl_flags.force_frl_rate to unsigned int

dc_hdmi_frl_flags.force_frl_rate mirrors dc_debug_options.force_frl_rate,
which was just widened to unsigned int. Match the type here too so the
assignment in link_hdmi_frl.c does not narrow from unsigned to signed.

All call sites in link_hdmi_frl.c only compare the value against 0, 0xF,
or an hdmi_frl_link_rate enum whose values are non-negative, so the
change is behaviour-preserving and does not introduce sign-compare
warnings.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: Fix reading timeline points in wait ioctl

Use correct u64 type.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn5.0.0: enable secure submission on unified ring for VCN 5.3.0

Enable secure submission support on the unified ring for VCN IP version
5.3.0 by setting `secure_submission_supported = true` in
vcn_v5_0_0_unified_ring_vm_funcs.

Secure IB submission is supported on VCN 5.3.0 hardware/firmware,
allowing protected decode workloads to bypass the common IB gate.
Without this, secure playback submissions can be blocked and fail.

Other VCN 5.x variants using the same vcn_v5_0_0_ip_block
(e.g. IP_VERSION(5, 0, 0)) do not support secure submission
on the unified ring and therefore continue using non-secure paths.

This change only advertises existing hardware/firmware capability;
non-secure decode paths remain unaffected.

Signed-off-by: Jeevana Muthyala <Jeevana.Muthyala2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: deprecate guilty handling

The guilty handling tried to establish a second way of signaling problems with
the GPU back to userspace. This caused quite a bunch of issue we had to work
around, especially lifetime issues with the drm_sched_entity.

Just drop the handling altogether and use the dma_fence based approach instead.

v2: fix reversed condition in entity check (Alex)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add lockdep annotations for lock ordering validation

Add lockdep annotations to teach lockdep the correct lock hierarchy
and catch ordering violations during development. This follows the
pattern established by dma-resv in drivers/dma-buf/dma-resv.c.

Lock ordering hierarchy (outermost to innermost):

1. userq_sch_mutex   - Global userq scheduler (enforce_isolation)
2. userq_mutex       - Per-context userq (held across queue create/destroy)
3. notifier_lock     - MMU notifier synchronization
4. vram_lock         - VRAM memory allocator
5. reset_domain->sem - GPU reset synchronization
6. reset_lock        - Reset control mutex
7. srbm_mutex        - SRBM register access
8. grbm_idx_mutex    - GRBM index register access
9. mmio_idx_lock     - MMIO index access (spinlock)

The implementation provides:
- Lock ordering training at module init (amdgpu_lockdep_init)
- Lock class association for real driver locks (amdgpu_lockdep_set_class)

Dummy locks are associated with the same class keys as real driver locks
via lockdep_set_class(), ensuring lockdep connects the training ordering
with actual runtime locks.

Testing:
  Build the kernel with CONFIG_PROVE_LOCKING=y (enables CONFIG_LOCKDEP):
    scripts/config --enable PROVE_LOCKING
    scripts/config --enable DEBUG_LOCKDEP
    make -j$(nproc)

  On boot, dmesg should show:
    AMDGPU: Lockdep annotations initialized (9 lock levels)

  The companion IGT test (tests/amdgpu/amd_lockdep) exercises lock-heavy
  GPU code paths concurrently to trigger lockdep warnings on violations:
    sudo ./build/tests/amdgpu/amd_lockdep
    sudo dmesg | grep -A 50 "circular locking dependency"

  IGT subtests:
    concurrent-reset-and-submit  - reset_sem vs submission locks
    concurrent-mmap-and-evict    - mmap_lock vs vram_lock
    concurrent-userptr-and-reset - notifier_lock vs reset_sem
    stress-all-paths             - all of the above simultaneously

  A clean dmesg (no "circular locking dependency" or "possible recursive
  locking detected" messages) confirms no lock ordering violations.

  For CI integration, the test should be run on kernels compiled with
  CONFIG_LOCKDEP=y; dmesg is scanned post-run for lockdep splats.

v2: (Christian)
- Move notifier_lock and vram_lock before reset locks in hierarchy.
  HMM invalidation holds notifier_lock and can wait for GPU reset
  completion, so notifier_lock must be outer to reset_domain->sem.
- Associate dummy locks with lock class keys via lockdep_set_class()
  so lockdep connects training with real driver locks.
- Update commit message to list all 9 lock levels.

Requires CONFIG_PROVE_LOCKING=y to activate.

Cc: Christian Konig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix SMI event cross-process information leak

kfd_smi_ev_enabled() skips the suser privilege check when pid=0.
PROCESS_START, PROCESS_END, and VMFAULT events are emitted with
pid=0 while carrying another process's PID and command name, so any
/dev/kfd user in the render group can monitor all GPU workloads.

Pass the target process PID into kfd_smi_event_add() for these events
so the existing per-client filter restricts delivery to the owning
process or CAP_SYS_ADMIN subscribers.

Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add DCN42B to dml21_translation_helper

Needed for DML to function with DCN42B.

Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DCN42B version detection

In resource_parse_asic_id, the check for GC_11_0_4 was unbounded, which
caused it to override the detection of DCN42B.

Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix user-triggerable BUG()/BUG_ON() calls

Replace BUG()/BUG_ON() with error logs and safe returns in several
places where they can be triggered by invalid userspace input,
preventing DoS via kernel panic.

Signed-off-by: Ce Sun <cesun102@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

selftests/bpf: ignore call depth accounting for retbleed in verifier tests

When running the selftests on a retbleed-affected platform (eg:
Skylake), with call depth accounting enabled
(CONFIG_CALL_DEPTH_TRACKING=y) _and_ with retbleed=stuff, some verifier
selftests fail to validate the jited instructions. For example:

  MATCHED    SUBSTR: ' endbr64'
  MATCHED    SUBSTR: ' nopl (%rax,%rax)'
  MATCHED    SUBSTR: ' xorq %rax, %rax'
  MATCHED    SUBSTR: ' pushq %rbp'
  MATCHED    SUBSTR: ' movq %rsp, %rbp'
  MATCHED    SUBSTR: ' endbr64'
  MATCHED    SUBSTR: ' cmpq $0x21, %rax'
  MATCHED    SUBSTR: ' ja L0'
  MATCHED    SUBSTR: ' pushq %rax'
  MATCHED    SUBSTR: ' movq %rsp, %rax'
  MATCHED    SUBSTR: ' jmp L1'
  MATCHED    SUBSTR: 'L0: pushq %rax'
  MATCHED    SUBSTR: 'L1: pushq %rax'
  MATCHED    SUBSTR: ' movq -0x10(%rbp), %rax'
  WRONG LINE  REGEX: ' callq 0x{{.*}}'

Those affected selftests allways fail on some call instruction: this
failure is due to the JIT compiler emitting call depth accounting for
retbleed mitigation (see x86_call_depth_emit_accounting calls in
bpf_jit_comp.c), resulting in an additional instruction being inserted
in front of every call instruction, similar to this one:

  sarq    $0x5, %gs:-0x39882741(%rip)

Fix those selftests by allowing them to ignore this possibly present
call depth accounting instruction.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260528-fix_tests_for_retbleed_stuff-v1-1-c2022a1f3bee@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

dt-bindings: soc: imx: Add fsl,aipi-bus and fsl,emi-bus

Add the fsl,aipi-bus and fsl,emi-bus compatible strings for i.MX1 and
i.MX2 variants.

These compatibles are only intended for existing legacy chips (more than 15
years old) and will not be used for new device trees.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>

bpf: Take mmap_lock in zap_pages()

zap_vma_range() requires the owning mm's mmap_lock to be held.

Taking mmap_read_lock under arena->lock would AB-BA against
arena_vm_close() and arena_map_mmap(), both of which run with
mmap_write_lock held and then acquire arena->lock. Instead drop
arena->lock, mmget_not_zero() the vma's mm, take mmap_read_lock, and
re-resolve the vma via find_vma() since it may have been unmapped or
replaced while waiting.

Track processed vmls with a per-call generation in vml->zap_gen and
serialize zap_pages() callers with a new arena->zap_mutex so
concurrent callers on different uaddr ranges do not mark each other's
vmls processed before the zap is done.

Reported-by: David Hildenbrand <david@kernel.org>
Fixes: 317460317a02 ("bpf: Introduce bpf_arena.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260528222014.38980-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: clean up btf_scan_decl_tags()

Refactor the newly introduced btf_scan_decl_tags() to improve
readability and maintainability. The current implementation uses a
manual if-else chain and a magic number offset to strip the "arg:"
prefix from declaration tags.

Replace the if-else logic with a table-driven approach using a static
const array. This separates the tag data from the scanning logic, making
the helper more extensible for future tags. Additionally, replace the
magic number '4' with a sizeof-based calculation on the prefix string to
ensure the offset remains synchronized with the search key.

Finally, optimize the loop by moving the is_global check to the top of
the block. This allows the verifier to fail-fast on static subprograms
without performing unnecessary BTF string and type lookups.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260603201822.770596-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Test signed loader error paths

The positive path for signed BPF loaders is covered today by the
signed lskels (fentry_test, fexit_test, atomics).

But the runtime metadata check the generated loader performs (libbpf
gen_loader's emit_signature_match), the map content hash it relies
on, the load-time signature, and the immutability invariants of its
metadata map are not yet covered.

Thus, add a new, extensive test suite which drives libbpf's gen_loader
(bpf_object__gen_loader, gen_hash=true), the same machinery which
bpftool uses for signed light skeletons, and exercise corner cases
so that we can assert this in BPF CI:

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t signed_loader
  [...]
  [    1.840842] clocksource: Switched to clocksource tsc
  #405/1   signed_loader/metadata_check_shape:OK
  #405/2   signed_loader/metadata_match:OK
  #405/3   signed_loader/metadata_sha_mismatch:OK
  #405/4   signed_loader/metadata_not_exclusive:OK
  #405/5   signed_loader/metadata_hash_not_computed:OK
  #405/6   signed_loader/signature_enforced:OK
  #405/7   signed_loader/signature_too_large:OK
  #405/8   signed_loader/signature_bad_keyring:OK
  #405/9   signed_loader/metadata_ctx_max_entries_ignored:OK
  #405/10  signed_loader/metadata_ctx_initial_value_ignored:OK
  #405/11  signed_loader/signature_authenticates_insns:OK
  #405/12  signed_loader/hash_requires_frozen:OK
  #405/13  signed_loader/no_update_after_freeze:OK
  #405/14  signed_loader/freeze_writable_mmap:OK
  #405/15  signed_loader/no_writable_mmap_frozen:OK
  #405/16  signed_loader/map_hash_matches_libbpf:OK
  #405/17  signed_loader/map_hash_multi_element:OK
  #405/18  signed_loader/map_hash_bad_size:OK
  #405/19  signed_loader/map_hash_unsupported_type:OK
  #405     signed_loader:OK
  Summary: 1/19 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260603211658.471212-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Cover exclusive map create-time validation

map_excl exercises exclusive-map binding (allowed/denied), map-in-map
and map iterator rejection. It does not cover the create-time validation
of excl_prog_hash: the kernel only accepts a SHA-256-sized hash and
requires the pointer and size to be consistent.

Add map_excl_create_validation to check the rejected combinations:

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t map_excl
  [...]
  [    1.780305] clocksource: Switched to clocksource tsc
  #215/1   map_excl/map_excl_allowed:OK
  #215/2   map_excl/map_excl_denied:OK
  #215/3   map_excl/map_excl_no_map_in_map:OK
  #215/4   map_excl/map_excl_no_map_iter:OK
  #215/5   map_excl/map_excl_create_validation:OK
  #215     map_excl:OK
  Summary: 1/5 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260603211658.471212-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

rust: module_param: add missing newline to pr_warn_once

Add a trailing newline ('\n') to the pr_warn_once! call in set_param to
ensure the kernel ring buffer flushes the message correctly and
prevents log line smearing.

Signed-off-by: Kenny Glowner <SisyphusCode0311@gmail.com>
Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://github.com/Rust-for-Linux/linux/issues/1139
[Sami: Updated the commit message as we use pr_warn_once now.]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

module: decompress: check return value of module_extend_max_pages()

module_extend_max_pages() calls kvrealloc() internally and returns
-ENOMEM on allocation failure. The return value is never checked.

If the initial allocation fails, info->pages remains NULL and
info->max_pages remains 0. Subsequent calls to module_get_next_page()
will attempt to dynamically grow the array by calling
module_extend_max_pages(info, 0) since info->used_pages is 0. This
results in kvrealloc(NULL, 0) returning ZERO_SIZE_PTR, which is treated
as a success, leading to a dereference of ZERO_SIZE_PTR and a kernel
oops.

Fix: add the missing error check after module_extend_max_pages() and
return immediately on failure. This matches the pattern used by every
other kvrealloc() caller in the module loading path.

Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrii Kuchmenko <capyenglishlite@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
[Sami: Corrected the analysis in the commit message.]
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>

net/mlx5: convert miss_list allocation to kvmalloc_array()

dr_icm_buddy_init_ste_cache() allocates the per-buddy miss_list using
the open-coded kvmalloc(n * sizeof(*p), ...) form. The neighbouring
allocations in the same function already use the kvcalloc()/
kvzalloc_objs() forms; switch this last one to kvmalloc_array() for
consistency and for the size_mul overflow check that kvmalloc_array()
performs.

The semantics are unchanged: kvmalloc_array() returns a non-zeroed
buffer, just like the previous kvmalloc() call. Existing callers of
buddy->miss_list initialise each list_head before use.

Signed-off-by: William Theesfeld <william@theesfeld.net>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260601193758.626537-1-william@theesfeld.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MIPS: VDSO: Only map the data pages when the vDSO is used

A future change will make it possible to disable the time-related vDSO.
In that case there is no point in calling into the datastore.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://patch.msgid.link/20260521-vdso-mips-kconfig-v1-4-2f79dcd6c78f@linutronix.de

MIPS: Introduce Kconfig MIPS_GENERIC_GETTIMEOFDAY

The logic to enable the generic vDSO Kconfig symbols is about to become
more complex.

Introduce a new helper symbol to keep the configuration readable.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Link: https://patch.msgid.link/20260521-vdso-mips-kconfig-v1-3-2f79dcd6c78f@linutronix.de

vdso/datastore: Always provide symbol declarations

Allow callers to easily reference these symbols in code that is built
even when the generic datastore is disabled.

As there are no good default no-op variants of these symbols, do not
provide stubs but require users to have their own fallback handling
using IS_ENABLED(CONFIG_HAVE_GENERIC_VDSO).

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260521-vdso-mips-kconfig-v1-2-2f79dcd6c78f@linutronix.de

MAINTAINERS: Add include/linux/vdso_datastore.h to vDSO block

This file is part of the generic vDSO subsystem.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260521-vdso-mips-kconfig-v1-1-2f79dcd6c78f@linutronix.de

vdso/gettimeofday: Rename __arch_get_vdso_u_timens_data()

Originally this function was supposed to work the same way as
__arch_get_vdso_u_time_data() and be overridden on some architectures.
However the actually used implementation, which just adds PAGE_SIZE, does
not need this override mechanism.

Adjust the name to reflect the true nature of the function.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260519-vdso-arch_get_vdso_u_timens_data-v1-1-43f0d62716e8@linutronix.de

Reapply "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"

This reverts commit 850d9248d2eac662f869c766a598c877690c74e5.
This reapplies commit 325eb217e41f ("bnxt_en: bring back rtnl_lock()
in the bnxt_open() path").

Breno reports a lockdep warning in bnxt. During FW reset the driver
may end up calling netif_set_real_num_tx_queues() (if queue count
changes), so calls to bnxt_open() still require rtnl_lock.

  net/sched/sch_generic.c:1416 suspicious rcu_dereference_protected() usage!

   dev_qdisc_change_real_num_tx+0x54/0xe0
   netif_set_real_num_tx_queues+0x4ed/0xa80
   __bnxt_open_nic+0x9cb/0x3490
   bnxt_open+0x1cb/0x370
   bnxt_fw_reset_task+0x80d/0x1e80
   process_scheduled_works+0x9c1/0x13b0

The reverted commit was just an optimization / experiment
so let's go back to taking the lock.

Reported-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/ah726OtFX-Qw3U-R@gmail.com
Fixes: 850d9248d2ea ("Revert "bnxt_en: bring back rtnl_lock() in the bnxt_open() path"")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260603195845.2574426-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: clear skb->dev before running a sockmap verdict

On the UDP receive path skb->dev is repurposed as dev_scratch (the
truesize/state cache set by udp_set_dev_scratch()), through the
union { struct net_device *dev; unsigned long dev_scratch; } in sk_buff.

When a UDP socket is in a sockmap, sk_data_ready is
sk_psock_verdict_data_ready(), which calls udp_read_skb() -> recv_actor()
(sk_psock_verdict_recv) to run the attached SK_SKB verdict program in softirq.
If that program calls a socket-lookup helper (bpf_sk_lookup_tcp/udp,
bpf_skc_lookup_tcp), bpf_skc_lookup() does:

if (skb->dev)
caller_net = dev_net(skb->dev);

skb->dev still holds the dev_scratch value (a non-NULL integer), so dev_net()
dereferences it as a struct net_device * and the kernel takes a general
protection fault on a non-canonical address in softirq:

  Oops: general protection fault, probably for non-canonical address 0x1010000800004a0
  CPU: 1 UID: 0 PID: 1406 Comm: syz.2.19 Not tainted 7.1.0-rc6 #1 PREEMPT(full)
  RIP: 0010:bpf_skc_lookup net/core/filter.c:7033 [inline]
  RIP: 0010:bpf_sk_lookup+0x45/0x160 net/core/filter.c:7047
  Call Trace:
   <IRQ>
   bpf_prog_4675cb904b7071f8+0x12e/0x14e
   bpf_prog_run_pin_on_cpu+0xc6/0x1f0
   sk_psock_verdict_recv+0x1ba/0x350
   udp_read_skb+0x31a/0x370
   sk_psock_verdict_data_ready+0x2e3/0x600
   __udp_enqueue_schedule_skb+0x4c8/0x650
   udpv6_queue_rcv_one_skb+0x3ec/0x740
   udp6_unicast_rcv_skb+0x11d/0x140
   ip6_protocol_deliver_rcu+0x61e/0x950
   ip6_input_finish+0xa9/0x150
   NF_HOOK+0x286/0x2f0
   ip6_input+0x117/0x220
   NF_HOOK+0x286/0x2f0
   __netif_receive_skb+0x85/0x200
   process_backlog+0x374/0x9a0
   __napi_poll+0x4f/0x1c0
   net_rx_action+0x3b0/0x770
   handle_softirqs+0x15a/0x460
   do_softirq+0x57/0x80
   </IRQ>

The rmem charge that dev_scratch accounted for is released by skb_recv_udp() on
dequeue, just above, so the scratch is dead by the time recv_actor() runs. Clear
skb->dev so bpf_skc_lookup() falls back to sock_net(skb->sk), which
skb_set_owner_sk_safe() set just above.

Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()")
Cc: stable@vger.kernel.org
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260603162737.697215-1-rhkrqnwk98@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: purge outqueue on stale COOKIE-ECHO handling

sctp_stream_update() is only invoked when the association is moved into
COOKIE_WAIT during association setup/reconfiguration. In this path, the
outbound stream scheduler state (stream->out_curr) is expected to be
clean, since no user data should have been transmitted yet unless the
state machine has already partially progressed.

However, a corner case exists in sctp_sf_do_5_2_6_stale(): when a
Stale Cookie ERROR is received, the association is rolled back from
COOKIE_ECHOED to COOKIE_WAIT. In this scenario, user data may already
have been queued and even bundled with the COOKIE-ECHO chunk.

During the rollback, sctp_stream_update() frees the old stream table
and installs a new one, but it does not invalidate stream->out_curr.
As a result, out_curr may still point to a freed sctp_stream_out
entry from the previous stream state.

Later, SCTP scheduler dequeue paths (FCFS, RR, PRIO, etc.) rely on
stream->out_curr->ext, which can lead to use-after-free once the old
stream state has been released via sctp_stream_free().

This results in crashes such as (reported by Yuqi):

  BUG: KASAN: slab-use-after-free in sctp_sched_fcfs_dequeue+0x13a/0x140
  Read of size 8 at addr ff1100004d4d3208 by task mini_poc/9312
  CPU: 1 UID: 1001 PID: 9312 Comm: mini_poc Not tainted
     7.1.0-rc1-00305-gbd3a4795d574 #5 PREEMPT(full)
   sctp_sched_fcfs_dequeue+0x13a/0x140
   sctp_outq_flush+0x1603/0x33e0
   sctp_do_sm+0x31c9/0x5d30
   sctp_assoc_bh_rcv+0x392/0x6f0
   sctp_inq_push+0x1db/0x270
   sctp_rcv+0x138d/0x3c10

Fix this by fully purging the association outqueue when handling the
Stale Cookie case. This ensures all pending transmit and retransmit
state is dropped, and any scheduler cached pointers are invalidated,
making it safe to rebuild stream state during COOKIE_WAIT restart.

Updating only stream->out_curr would be insufficient, since queued
and retransmittable data would still reference the old stream state and
trigger later use-after-free in dequeue paths.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Reported-by: Yuqi Xu <xuyq21@lenovo.com>
Reported-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/94318159b9052907a6cbb7256aee8b5f8dfbfccb.1780510304.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ASoC: Intel: catpt: Code cleanup

Cezary Rojewski <cezary.rojewski@intel.com> says:

All of the changes found here are cleanups and from functional
perspective, have no impact - either unused code is being removed or
existing code is altered to use helpers/macros to improve readability.

Collateral of recent fixes [1]. There is one more patchset with similar
goal following this one. Before the team managed to actually fix the
problem, a number of changes were added to make the code easier to
understand for people who are not the author (me).

[1]: https://lore.kernel.org/linux-sound/20260528083444.1439233-1-cezary.rojewski@intel.com/

Link: https://patch.msgid.link/20260603085827.1964796-1-cezary.rojewski@intel.com

ASoC: Intel: catpt: Cleanup components_kcontrols[]

Fix alignment and drop redundant comments.
While at it, declare the mute-boolean explicitly.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-8-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Drop manipulation of the obsolete direction flag

Setting up direction for struct dma_slave_config is obsolete, see the
description of the struct. The transfer performed by the catpt-driver
is also always DMA_MEM_TO_MEM not DMA_MEM_TO_DEV with preparation step
being dmaengine_prep_dma_memcpy().

DW's ->device_prep_dma_memcpy() always fixes the direction to
DMA_MEM_TO_MEM even if its user fails to do so, see
drivers/dma/dw/core.c. While the change impacts number of checks done
by ->device_config() - p/m buswidth checks are skipped - fields being
fixed up in those i.e.: .dst_addr_width and .src_addr_width, do not take
part in DMA_MEM_TO_MEM transfer.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-7-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Remove unused WAVES controls

Support for the WAVES module was never officially published. The
kcontrols present in the existing code were added to retain 1:1 UAPI
with catpt-driver's predecessor, the haswell-driver despite the lack of
users for the functionality. Several years have passed since the
successful transition to the catpt-driver and removal of its predecessor
and there is no reason to keep the unused code.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-6-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Simplify catpt_stream_find()

Code line reduction and more transparent variable naming.
No functional changes.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-5-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Simplify the RAM-navigation code

Add catpt_iram_addr() to the catpt helpers family and replace all the
open arithmetics with them. Makes it easier to understand the code.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-4-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Replace RAM-helpers with resource_xxx()

For catpt_sram_init(), the exact same functionality has been provided to
ioport.h with commit 9fb6fef0fb49 ("resource: Add resource set range and
size helpers") in recent years.

As for catpt_dram/iram_size(), leave it for the driver initialization
only. Have all other manipulations be done using resource_xxx() API
which are more familiar to kernel developers.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-3-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: Intel: catpt: Utilize lock-guard helper

The lock-guard helps simplify the driver's code.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260603085827.1964796-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>

regulator: bq257xx: drop confusing configuration of_node

The driver reuses the OF node of the parent multi-function device but
still sets the of_node field of the regulator configuration to any prior
OF node.

Since the MFD child device does not have an OF node set until probe is
called, this field is set to NULL on first probe and to the reused OF
node if the driver is later rebound.

As the device_set_of_node_from_dev() helper drops a reference to any
prior OF node before taking a reference to the new one this can
apparently also confuse LLMs like Sashiko which flags it as a potential
use-after-free (which it is not).

Drop the confusing and redundant configuration of_node assignment.

Link: https://sashiko.dev/#/patchset/20260408073055.5183-1-johan%40kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260604115912.2734074-1-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>