git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
12 days ago  crypto: ccp - Replace snprintf("%s") with strscpy
Thorsten Blum [Tue, 24 Mar 2026 11:30:07 +0000 (12:30 +0100)] 
crypto: ccp - Replace snprintf("%s") with strscpy

Replace snprintf("%s") with the faster and more direct strscpy().

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  crypto: hifn_795x - Replace snprintf("%s") with strscpy
Thorsten Blum [Tue, 24 Mar 2026 11:27:05 +0000 (12:27 +0100)] 
crypto: hifn_795x - Replace snprintf("%s") with strscpy

Replace snprintf("%s", ...) with the faster and more direct strscpy().
Check if the return value is less than 0 to detect string truncation.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  crypto: qat - disable 420xx AE cluster when lead engine is fused off
Ahsan Atta [Tue, 24 Mar 2026 11:12:34 +0000 (11:12 +0000)] 
crypto: qat - disable 420xx AE cluster when lead engine is fused off

The get_ae_mask() function only disables individual engines based on
the fuse register, but engines are organized in clusters of 4. If the
lead engine of a cluster is fused off, the entire cluster must be
disabled.

Replace the single bitmask inversion with explicit test_bit() checks
on the lead engine of each group, disabling the full ADF_AE_GROUP
when the lead bit is set.

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Fixes: fcf60f4bcf54 ("crypto: qat - add support for 420xx devices")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  crypto: qat - disable 4xxx AE cluster when lead engine is fused off
Ahsan Atta [Tue, 24 Mar 2026 11:11:12 +0000 (11:11 +0000)] 
crypto: qat - disable 4xxx AE cluster when lead engine is fused off

The get_ae_mask() function only disables individual engines based on
the fuse register, but engines are organized in clusters of 4. If the
lead engine of a cluster is fused off, the entire cluster must be
disabled.

Replace the single bitmask inversion with explicit test_bit() checks
on the lead engine of each group, disabling the full ADF_AE_GROUP
when the lead bit is set.

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Fixes: 8c8268166e834 ("crypto: qat - add qat_4xxx driver")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  crypto: testmgr - Add test vectors for authenc(hmac(md5),cbc(aes))
Aleksander Jan Bajkowski [Tue, 3 Mar 2026 18:48:44 +0000 (19:48 +0100)] 
crypto: testmgr - Add test vectors for authenc(hmac(md5),cbc(aes))

Test vectors were generated starting from existing CBC(AES) test vectors
(RFC3602, NIST SP800-38A) and adding HMAC(MD5) computed with a Python
script. Then, the results were double-checked on Mediatek MT7981 (safexcel)
and NXP P2020 (talitos). Both platforms pass self-tests.

Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  crypto: af_alg - limit RX SG extraction by receive buffer budget
Douya Le [Thu, 2 Apr 2026 15:34:55 +0000 (23:34 +0800)] 
crypto: af_alg - limit RX SG extraction by receive buffer budget

Make af_alg_get_rsgl() limit each RX scatterlist extraction to the
remaining receive buffer budget.

af_alg_get_rsgl() currently uses af_alg_readable() only as a gate
before extracting data into the RX scatterlist. Limit each extraction
to the remaining af_alg_rcvbuf(sk) budget so that receive-side
accounting matches the amount of data attached to the request.

If skcipher cannot obtain enough RX space for at least one chunk while
more data remains to be processed, reject the recvmsg call instead of
rounding the request length down to zero.

Fixes: e870456d8e7c8d57c059ea479b5aadbb55ff4c3a ("crypto: algif_skcipher - overhaul memory management")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Douya Le <ldy3087146292@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12 days ago  Merge tag 'v7.0-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 3 Apr 2026 00:29:48 +0000 (17:29 -0700)] 
Merge tag 'v7.0-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Add missing async markers to tegra

 - Fix long hmac key DMA handling in caam

 - Fix spurious ENOSPC errors in deflate

 - Fix SG chaining in af_alg

 - Do not use in-place process in algif_aead

 - Fix out-of-place destination overflow in authencesn

* tag 'v7.0-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: authencesn - Do not place hiseq at end of dst for out-of-place decryption
  crypto: algif_aead - Revert to operating out-of-place
  crypto: af-alg - fix NULL pointer dereference in scatterwalk
  crypto: deflate - fix spurious -ENOSPC
  crypto: caam - fix overflow on long hmac keys
  crypto: caam - fix DMA corruption on long hmac keys
  crypto: tegra - Add missing CRYPTO_ALG_ASYNC

13 days ago  lib/crc: arm64: Simplify intrinsics implementation
Ard Biesheuvel [Mon, 30 Mar 2026 14:46:35 +0000 (16:46 +0200)] 
lib/crc: arm64: Simplify intrinsics implementation

NEON intrinsics are useful because they remove the need for manual
register allocation, and the resulting code can be re-compiled and
optimized for different micro-architectures, and shared between arm64
and 32-bit ARM.

However, the strong typing of the vector variables can lead to
incomprehensible gibberish, as is the case with the new CRC64
implementation. To address this, let's repaint all variables as
uint64x2_t to minimize the number of vreinterpretq_xxx() calls, and to
be able to rely on the ^ operator for exclusive OR operations. This
makes the code much more concise and readable.

While at it, wrap the calls to vmull_p64() et al in order to have a more
consistent calling convention, and encapsulate any remaining
vreinterpret() calls that are still needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-11-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
13 days ago  lib/crc: arm64: Use existing macros for kernel-mode FPU cflags
Ard Biesheuvel [Mon, 30 Mar 2026 14:46:33 +0000 (16:46 +0200)] 
lib/crc: arm64: Use existing macros for kernel-mode FPU cflags

Use the existing CC_FPU_CFLAGS and CC_NO_FPU_CFLAGS to pass the
appropriate compiler command line options for building kernel mode NEON
intrinsics code. This is tidier, and will make it easier to reuse the
code for 32-bit ARM.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-9-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
13 days ago  lib/crc: arm64: Drop unnecessary chunking logic from crc64
Ard Biesheuvel [Mon, 30 Mar 2026 14:46:32 +0000 (16:46 +0200)] 
lib/crc: arm64: Drop unnecessary chunking logic from crc64

On arm64, kernel mode NEON executes with preemption enabled, so there is
no need to chunk the input by hand.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260330144630.33026-8-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
13 days ago  lib/crc: arm64: Assume a little-endian kernel
Eric Biggers [Wed, 1 Apr 2026 00:44:31 +0000 (17:44 -0700)] 
lib/crc: arm64: Assume a little-endian kernel

Since support for big-endian arm64 kernels was removed, the CPU_LE()
macro now unconditionally emits the code it is passed, and the CPU_BE()
macro now unconditionally discards the code it is passed.

Simplify the assembly code in lib/crc/arm64/ accordingly.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20260401004431.151432-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
13 days ago  Merge branch 'task-local-data-bug-fixes-and-improvement'
Alexei Starovoitov [Thu, 2 Apr 2026 22:11:08 +0000 (15:11 -0700)] 
Merge branch 'task-local-data-bug-fixes-and-improvement'

Amery Hung says:

====================
Task local data bug fixes and improvement

This patchset fixes three task local data bugs, improves the
memory allocation code, and drops the unnecessary TLD_READ_ONCE. Please
find the details in each patch's commit message.

One thing worth mentioning is that Patch 3 allows us to re-enable the
task local data selftests, as the library now always calls aligned_alloc()
with size matching alignment under the default configuration.

v1 -> v2
 - Fix potential memory leak
 - Drop TLD_READ_ONCE()
Link: https://lore.kernel.org/bpf/20260326052437.590158-1-ameryhung@gmail.com/
====================

Link: https://patch.msgid.link/20260331213555.1993883-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  selftests/bpf: Improve task local data documentation and fix potential memory leak
Amery Hung [Tue, 31 Mar 2026 21:35:55 +0000 (14:35 -0700)] 
selftests/bpf: Improve task local data documentation and fix potential memory leak

If TLD_FREE_DATA_ON_THREAD_EXIT is not enabled in the translation unit
that calls __tld_create_key() first, another translation unit that
enables it will not get the auto cleanup feature, as the pthread key is
only created once, when allocating metadata. Fix it by always trying to
create the pthread key when __tld_create_key() is called.

Also improve the documentation:
- Discourage users from using different options in different translation
  units
- Specify calling tld_free() before thread exit as undefined behavior

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  selftests/bpf: Remove TLD_READ_ONCE() in the user space header
Amery Hung [Tue, 31 Mar 2026 21:35:54 +0000 (14:35 -0700)] 
selftests/bpf: Remove TLD_READ_ONCE() in the user space header

TLD_READ_ONCE() is redundant as the only reference passed to it is
defined as _Atomic. The load is guaranteed to be atomic by the C11
standard (6.2.6.1). Drop the macro.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  selftests/bpf: Make sure TLD_DEFINE_KEY runs first
Amery Hung [Tue, 31 Mar 2026 21:35:53 +0000 (14:35 -0700)] 
selftests/bpf: Make sure TLD_DEFINE_KEY runs first

Without specifying a constructor priority for the hidden constructor
function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data =
false) may run after tld_get_data() is called from other constructors.
Threads calling tld_get_data() before __tld_create_key(..., dyn_data
= false) will not allocate enough memory for all TLDs, later resulting
in OOB access. Therefore, set the priority to the lowest value available
to users. Note that lower means higher priority, and 0-100 is reserved
for the compiler.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  selftests/bpf: Simplify task_local_data memory allocation
Amery Hung [Tue, 31 Mar 2026 21:35:52 +0000 (14:35 -0700)] 
selftests/bpf: Simplify task_local_data memory allocation

Simplify data allocation by always using aligned_alloc(), passing
size_pot (the size rounded up to the closest power of two) as the
alignment.

Currently, aligned_alloc(page_size, size) is only intended to be used
with memory allocators that can fulfill the request without rounding
size up to page_size, to conserve memory. This is enabled by defining
TLD_DATA_USE_ALIGNED_ALLOC. The page_size alignment is due to a
limitation of UPTR, where only a single page can be pinned to the
kernel. Otherwise, malloc(size * 2) is used to allocate memory for data.

However, we don't need to call aligned_alloc(page_size, size) to get
a contiguous block of size bytes within a page; aligned_alloc(size_pot,
...) will also do the trick. Therefore, just use aligned_alloc(size_pot,
...) universally.

As for the size argument, create a new option,
TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size.
This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing
memory allocators with a low-overhead aligned_alloc() to not waste
memory. To enable this, users need to make sure it is not undefined
behavior for the memory allocator when size is not an integral multiple
of the alignment.

Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC
used to always waste size bytes of memory due to malloc(size * 2).
Now the worst case is size - 1 bytes, and the best case is 0 when the
size is already a power of two.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  selftests/bpf: Fix task_local_data data allocation size
Amery Hung [Tue, 31 Mar 2026 21:35:51 +0000 (14:35 -0700)] 
selftests/bpf: Fix task_local_data data allocation size

Currently, when allocating memory for data, the size of
tld_data_u->start is not taken into account. This may cause OOB access.
Fix it by adding the size of the non-flexible-array part of tld_data_u.

Besides, explicitly align tld_data_u->data to 8 bytes in case some
fields are added before data in the future. Such fields could break the
assumption that every data field is 8-byte aligned, and
sizeof(tld_data_u) would no longer be equal to
offsetof(struct tld_data_u, data), which we use interchangeably.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago  genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()
Michael Kelley [Thu, 2 Apr 2026 20:23:59 +0000 (13:23 -0700)] 
genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq()

handle_percpu_devid_irq() is a version of handle_percpu_irq() with the
addition of a pointer to a per-CPU devid.

However, handle_percpu_irq() invokes add_interrupt_randomness(), while
handle_percpu_devid_irq() currently does not.

Add the missing add_interrupt_randomness(), as it is needed when per-CPU
interrupts with devids are used in VMs for interrupts from the hypervisor.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260402202400.1707-2-mhklkml@zohomail.com
13 days ago  arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
Mark Brown [Fri, 6 Mar 2026 17:00:53 +0000 (17:00 +0000)] 
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06

Update the definition of SMIDR_EL1 in the sysreg definition to reflect
the information in DDI0601 2025-06. This includes somewhat more generic
ways of describing the sharing of SMCUs, more information on supported
priorities, and additional resolution for describing affinity groups.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
13 days ago  Merge branch 'libbpf-clarify-raw-address-single-kprobe-attach-behavior'
Andrii Nakryiko [Thu, 2 Apr 2026 20:23:19 +0000 (13:23 -0700)] 
Merge branch 'libbpf-clarify-raw-address-single-kprobe-attach-behavior'

Hoyeon Lee says:

====================
libbpf: clarify raw-address single kprobe attach behavior

Today libbpf documents single-kprobe attach through func_name, with an
optional offset. For the PMU-based path, func_name = NULL with an
absolute address in offset already works as well, but that is not
described in the API.

This patchset clarifies this behavior. The first commit fixes kprobe
and uprobe attach error handling to use direct error codes. The next adds
kprobe API comments for the raw-address form and rejects it explicitly
for legacy tracefs/debugfs kprobes. The last adds PERF and LINK selftests
for the raw-address form, and checks that LEGACY rejects it.
---
Changes in v7:
- Change selftest line wrapping and assertions

Changes in v6:
- Split the kprobe/uprobe direct error-code fix into a separate patch

Changes in v5:
- Add kprobe API docs, use -EOPNOTSUPP, and switch selftests to LIBBPF_OPTS

Changes in v4:
- Inline raw-address error formatting and remove the probe_target buffer

Changes in v3:
- Drop bpf_kprobe_opts.addr and reuse offset when func_name is NULL
- Make legacy tracefs/debugfs kprobes reject the raw-address form
- Update selftests to cover PERF/LINK raw-address attach and LEGACY reject

Changes in v2:
- Fix line wrapping and indentation
====================

Link: https://patch.msgid.link/20260401143116.185049-1-hoyeon.lee@suse.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
13 days ago  selftests/bpf: Add test for raw-address single kprobe attach
Hoyeon Lee [Wed, 1 Apr 2026 14:29:31 +0000 (23:29 +0900)] 
selftests/bpf: Add test for raw-address single kprobe attach

Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.

This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com
13 days ago  libbpf: Clarify raw-address single kprobe attach behavior
Hoyeon Lee [Wed, 1 Apr 2026 14:29:30 +0000 (23:29 +0900)] 
libbpf: Clarify raw-address single kprobe attach behavior

bpf_program__attach_kprobe_opts() documents single-kprobe attach
through func_name, with an optional offset. For the PMU-based path,
func_name = NULL with an absolute address in offset already works as
well, but that is not described in the API.

This commit clarifies this existing non-legacy behavior. For PMU-based
attach, callers can use func_name = NULL with an absolute address in
offset as the raw-address form. For legacy tracefs/debugfs kprobes,
reject this form explicitly.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-3-hoyeon.lee@suse.com
13 days ago  libbpf: Use direct error codes for kprobe/uprobe attach
Hoyeon Lee [Wed, 1 Apr 2026 14:29:29 +0000 (23:29 +0900)] 
libbpf: Use direct error codes for kprobe/uprobe attach

The perf_event_open_probe() and perf_event_{k,u}probe_open_legacy()
helpers return negative error codes directly on failure. This commit
changes bpf_program__attach_{k,u}probe_opts() to use those return
values directly instead of re-reading a possibly changed errno.

Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-2-hoyeon.lee@suse.com
13 days ago  accel: ethosu: Add hardware dependency hint
Jean Delvare [Wed, 1 Apr 2026 10:23:23 +0000 (12:23 +0200)] 
accel: ethosu: Add hardware dependency hint

The Ethos-U NPU is only available on ARM systems, so add a hardware
dependency hint to prevent this driver from being needlessly
included in kernels built for other architectures.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: https://patch.msgid.link/20260401122323.6127a77c@endymion
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
13 days ago  libbpf: Fix BTF handling in bpf_program__clone()
Mykyta Yatsenko [Wed, 1 Apr 2026 15:16:40 +0000 (16:16 +0100)] 
libbpf: Fix BTF handling in bpf_program__clone()

Align bpf_program__clone() with bpf_object_load_prog() by gating
BTF func/line info on FEAT_BTF_FUNC kernel support, and resolve
caller-provided prog_btf_fd before checking obj->btf so that callers
with their own BTF can use clone() even when the object has no BTF
loaded.

While at it, treat func_info and line_info fields as atomic groups
to prevent mismatches between pointer and count from different sources.

Move bpf_program__clone() to libbpf 1.8.

Fixes: 970bd2dced35 ("libbpf: Introduce bpf_program__clone()")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260401151640.356419-1-mykyta.yatsenko5@gmail.com
13 days ago  arm64: mm: Remove pmd_sect() and pud_sect()
Ryan Roberts [Mon, 30 Mar 2026 16:17:04 +0000 (17:17 +0100)] 
arm64: mm: Remove pmd_sect() and pud_sect()

The semantics of pXd_leaf() are very similar to those of pXd_sect(). The
only difference is that pXd_sect() only considers an entry a section if
PTE_VALID is set, whereas pXd_leaf() permits both "valid" and
"present-invalid" types.

Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().

There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
13 days ago  arm64: mm: Handle invalid large leaf mappings correctly
Ryan Roberts [Mon, 30 Mar 2026 16:17:03 +0000 (17:17 +0100)] 
arm64: mm: Handle invalid large leaf mappings correctly

It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.

It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:

[   15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[   15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[   15.513762] Mem abort info:
[   15.527245]   ESR = 0x0000000096000046
[   15.548553]   EC = 0x25: DABT (current EL), IL = 32 bits
[   15.572146]   SET = 0, FnV = 0
[   15.592141]   EA = 0, S1PTW = 0
[   15.612694]   FSC = 0x06: level 2 translation fault
[   15.640644] Data abort info:
[   15.661983]   ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[   15.694875]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[   15.723740]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[   15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[   15.855046] Internal error: Oops: 0000000096000046 [#1]  SMP
[   15.886394] Modules linked in:
[   15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[   15.935258] Hardware name: linux,dummy-virt (DT)
[   15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[   16.006163] lr : swiotlb_bounce+0xf4/0x158
[   16.024145] sp : ffff80008000b8f0
[   16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[   16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[   16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[   16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[   16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[   16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[   16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[   16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[   16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[   16.349071] Call trace:
[   16.360143]  __pi_memcpy_generic+0x128/0x22c (P)
[   16.380310]  swiotlb_tbl_map_single+0x154/0x2b4
[   16.400282]  swiotlb_map+0x5c/0x228
[   16.415984]  dma_map_phys+0x244/0x2b8
[   16.432199]  dma_map_page_attrs+0x44/0x58
[   16.449782]  virtqueue_map_page_attrs+0x38/0x44
[   16.469596]  virtqueue_map_single_attrs+0xc0/0x130
[   16.490509]  virtnet_rq_alloc.isra.0+0xa4/0x1fc
[   16.510355]  try_fill_recv+0x2a4/0x584
[   16.526989]  virtnet_open+0xd4/0x238
[   16.542775]  __dev_open+0x110/0x24c
[   16.558280]  __dev_change_flags+0x194/0x20c
[   16.576879]  netif_change_flags+0x24/0x6c
[   16.594489]  dev_change_flags+0x48/0x7c
[   16.611462]  ip_auto_config+0x258/0x1114
[   16.628727]  do_one_initcall+0x80/0x1c8
[   16.645590]  kernel_init_freeable+0x208/0x2f0
[   16.664917]  kernel_init+0x24/0x1e0
[   16.680295]  ret_from_fork+0x10/0x20
[   16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[   16.723106] ---[ end trace 0000000000000000 ]---
[   16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[   16.818966] PHYS_OFFSET: 0xfff1000080000000
[   16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[   16.862904] Memory Limit: none
[   16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.

The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.

But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result, ptdump wasn't even able to show the
invalid large leaf mappings; it looked like they were valid, which made
this super fun to debug. The ptdump code is core-mm and pmd_table() is
arm64-specific, so we can't use the same trick to solve that.

But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.

Additionally, from inspection, kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().

The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.

Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.

Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
13 days ago  arm64: mm: Fix rodata=full block mapping support for realm guests
Ryan Roberts [Mon, 30 Mar 2026 16:17:02 +0000 (17:17 +0100)] 
arm64: mm: Fix rodata=full block mapping support for realm guests

Commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.

However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.

The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.

I've fixed this problem by reducing the black-out window in which we
refuse to split; there are now two windows. The first runs from T0 until
the page allocator is initialized; splitting allocates memory from the
page allocator, so it must be in use. The second runs from the start of
onlining the secondary cpus until the system caps are finalized (this is
a very small window).

All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.

I've added warnings and return an error if any attempt is made to split
in the black-out windows.

Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.

Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
13 days agoeventpoll: defer struct eventpoll free to RCU grace period
Nicholas Carlini [Tue, 31 Mar 2026 13:25:32 +0000 (15:25 +0200)] 
eventpoll: defer struct eventpoll free to RCU grace period

In certain situations, ep_free() in eventpoll.c will kfree() the epi->ep
eventpoll struct while it is still being used by another concurrent
thread. Defer the kfree() to an RCU callback to prevent the UAF.
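The deferral pattern can be modelled in userspace; call_rcu() is stubbed here with an explicit grace-period step, and the struct layout is a simplification of the real one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace model of deferring a free to an RCU callback.
 * call_rcu() is stubbed: the callback runs only when the simulated
 * grace period elapses, not at the ep_free() call site. */
struct rcu_head { void (*func)(struct rcu_head *); };

struct eventpoll {
	int placeholder;        /* stand-in for the real state */
	struct rcu_head rcu;
};

static struct rcu_head *pending_cb;
static bool freed;

static void ep_rcu_free(struct rcu_head *head)
{
	/* container_of(head, struct eventpoll, rcu) in kernel terms */
	struct eventpoll *ep =
		(struct eventpoll *)((char *)head - offsetof(struct eventpoll, rcu));
	free(ep);
	freed = true;
}

static void call_rcu_stub(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	head->func = func;
	pending_cb = head;      /* defer: nothing is freed yet */
}

static void rcu_grace_period_stub(void)
{
	if (pending_cb) {
		pending_cb->func(pending_cb);
		pending_cb = NULL;
	}
}

static void ep_free_sketch(struct eventpoll *ep)
{
	/* Instead of freeing ep here, defer to the RCU callback so a
	 * concurrent reader can still safely dereference ep. */
	call_rcu_stub(&ep->rcu, ep_rcu_free);
}
```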

Fixes: f2e467a48287 ("eventpoll: Fix semi-unbounded recursion")
Signed-off-by: Nicholas Carlini <nicholas@carlini.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
13 days agoaccel/ivpu: Trigger recovery on TDR with OS scheduling
Karol Wachowski [Thu, 2 Apr 2026 12:55:26 +0000 (14:55 +0200)] 
accel/ivpu: Trigger recovery on TDR with OS scheduling

With OS scheduling mode the driver cannot determine which context
caused the timeout, so context abort cannot be used. Instead of
queuing context_abort_work, directly trigger full device recovery
when a job timeout (TDR) occurs in OS scheduling mode.

Fixes: ade00a6c903f ("accel/ivpu: Perform engine reset instead of device recovery on TDR")
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20260402125526.845210-1-karol.wachowski@linux.intel.com
13 days agosched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU
Changwoo Min [Thu, 2 Apr 2026 02:31:50 +0000 (11:31 +0900)] 
sched_ext: Fix is_bpf_migration_disabled() false negative on non-PREEMPT_RCU

Since commit 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for
trampoline.c"), the BPF prolog (__bpf_prog_enter) calls migrate_disable()
only when CONFIG_PREEMPT_RCU is enabled, via rcu_read_lock_dont_migrate().
Without CONFIG_PREEMPT_RCU, the prolog never touches migration_disabled,
so migration_disabled == 1 always means the task is truly
migration-disabled regardless of whether it is the current task.

The old unconditional p == current check was a false negative in this
case, potentially allowing a migration-disabled task to be dispatched to
a remote CPU and triggering scx_error in task_can_run_on_remote_rq().

Only apply the p == current disambiguation when CONFIG_PREEMPT_RCU is
enabled, where the ambiguity with the BPF prolog still exists.
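A simplified model of the resulting check, with the task struct and the CONFIG_PREEMPT_RCU toggle reduced to illustrative stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model; in the kernel, migration_disabled lives in
 * struct task_struct and "current" is the per-cpu running task. */
struct task_sketch { int migration_disabled; };

/* Define to model CONFIG_PREEMPT_RCU, where the BPF prolog may have
 * taken one migrate_disable() on the current task. Left undefined
 * here to exercise the non-PREEMPT_RCU path the fix addresses. */
/* #define MODEL_PREEMPT_RCU 1 */

static bool is_bpf_migration_disabled_sketch(const struct task_sketch *p,
					     const struct task_sketch *curr)
{
#ifdef MODEL_PREEMPT_RCU
	/* With PREEMPT_RCU the prolog accounts for one level on current,
	 * so only a count above 1 is unambiguously a "real" disable. */
	if (p == curr)
		return p->migration_disabled > 1;
#else
	(void)curr; /* no ambiguity: the prolog never touches the count */
#endif
	return p->migration_disabled > 0;
}
```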

Fixes: 8e4f0b1ebcf2 ("bpf: use rcu_read_lock_dont_migrate() for trampoline.c")
Cc: stable@vger.kernel.org # v6.18+
Link: https://lore.kernel.org/lkml/20250821090609.42508-8-dongml2@chinatelecom.cn/
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
13 days agodrm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations
Ionut Nechita [Mon, 23 Mar 2026 21:13:43 +0000 (23:13 +0200)] 
drm/amd/display: Wire up dcn10_dio_construct() for all pre-DCN401 generations

Description:
 - Commit b82f0759346617b2 ("drm/amd/display: Migrate DIO registers access
   from hwseq to dio component") moved DIO_MEM_PWR_CTRL register access
   behind the new dio abstraction layer but only created the dio object for
   DCN 4.01. On all other generations (DCN 10/20/21/201/30/301/302/303/
   31/314/315/316/32/321/35/351/36), the dio pointer is NULL, causing the
   register write to be silently skipped.

   This results in AFMT HDMI memory not being powered on during init_hw,
   which can cause HDMI audio failures and display issues on affected
   hardware including Renoir/Cezanne (DCN 2.1) APUs that use dcn10_init_hw.

   Call dcn10_dio_construct() in each older DCN generation's resource.c
   to create the dio object, following the same pattern as DCN 4.01. This
   ensures the dio pointer is non-NULL and the mem_pwr_ctrl callback works
   through the dio abstraction for all DCN generations.

Fixes: b82f07593466 ("drm/amd/display: Migrate DIO registers access from hwseq to dio component.")
Reviewed-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ionut Nechita <ionut_n2001@yahoo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agosched_ext: Fix missing warning in scx_set_task_state() default case
Samuele Mariotti [Thu, 2 Apr 2026 17:00:25 +0000 (19:00 +0200)] 
sched_ext: Fix missing warning in scx_set_task_state() default case

In scx_set_task_state(), the default case was setting the
warn flag, but then returning immediately. This is problematic
because the only purpose of the warn flag is to trigger
WARN_ONCE, but the early return prevented it from ever firing,
leaving invalid task states undetected and untraced.

To fix this, a WARN_ONCE call is now added directly in the
default case.

The fix addresses two aspects:

 - Guarantees the invalid task states are properly logged
   and traced.

 - Provides a distinct warning message
   ("sched_ext: Invalid task state") specifically for
   states outside the defined scx_task_state enum values,
   making it easier to distinguish from other transition
   warnings.

This ensures proper detection and reporting of invalid states.
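The fixed control flow can be sketched as follows; WARN_ONCE() is stubbed with a counter, and the state names mirror (but may not exactly match) the scx_task_state enum:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ONCE(); counts firings for illustration. */
static int warn_count;
#define WARN_ONCE_STUB(cond, msg) \
	do { if (cond) { warn_count++; fprintf(stderr, "%s\n", msg); } } while (0)

enum scx_task_state_sketch {
	SCX_TASK_NONE,
	SCX_TASK_INIT,
	SCX_TASK_READY,
	SCX_TASK_ENABLED,
};

/* Sketch of the fixed flow: the default case now warns directly
 * instead of setting a flag and returning before the warning fires. */
static bool scx_set_task_state_sketch(int *cur, int state)
{
	switch (state) {
	case SCX_TASK_NONE:
	case SCX_TASK_INIT:
	case SCX_TASK_READY:
	case SCX_TASK_ENABLED:
		*cur = state;
		return true;
	default:
		WARN_ONCE_STUB(true, "sched_ext: Invalid task state");
		return false;
	}
}
```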

Signed-off-by: Samuele Mariotti <smariotti@disroot.org>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
13 days agoMerge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 2 Apr 2026 19:03:15 +0000 (12:03 -0700)] 
Merge tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd

Pull smb server fix from Steve French:

 - Fix out of bound write

* tag 'v7.0-rc6-ksmbd-server-fix' of git://git.samba.org/ksmbd:
  ksmbd: fix OOB write in QUERY_INFO for compound requests

13 days agoata: libata-transport: remove static variable ata_scsi_transport_template
Heiner Kallweit [Thu, 2 Apr 2026 13:32:13 +0000 (15:32 +0200)] 
ata: libata-transport: remove static variable ata_scsi_transport_template

Simplify the code by making the embedded struct scsi_transport_template
public, instead of using the separate variable ata_scsi_transport_template.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoata: libata-transport: split struct ata_internal
Heiner Kallweit [Thu, 2 Apr 2026 13:31:22 +0000 (15:31 +0200)] 
ata: libata-transport: split struct ata_internal

There's no need for an umbrella struct, so remove it. It's also a
prerequisite for making the embedded struct scsi_transport_template
public.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoata: libata-transport: use static struct ata_transport_internal to simplify match...
Heiner Kallweit [Thu, 2 Apr 2026 13:30:48 +0000 (15:30 +0200)] 
ata: libata-transport: use static struct ata_transport_internal to simplify match functions

Both matching functions can make use of the static struct
ata_transport_internal. This eliminates the dependency on the static
variable ata_scsi_transport_template, and allows removing the
to_ata_internal() helper. A small drawback is that forward declarations
of both functions are needed.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoata: libata-transport: inline ata_attach|release_transport
Heiner Kallweit [Thu, 2 Apr 2026 13:30:05 +0000 (15:30 +0200)] 
ata: libata-transport: inline ata_attach|release_transport

Both functions are helpers which are used only once. So remove them and
merge their code into libata_transport_init() and libata_transport_exit()
respectively.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoata: libata-transport: instantiate struct ata_internal statically
Heiner Kallweit [Thu, 2 Apr 2026 13:29:21 +0000 (15:29 +0200)] 
ata: libata-transport: instantiate struct ata_internal statically

Struct ata_internal is only instantiated once, in module init code.
So we can also instantiate it statically, which allows simplifying
the code.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoMerge branch 'net-stmmac-tso-fixes-cleanups'
Jakub Kicinski [Thu, 2 Apr 2026 18:28:23 +0000 (11:28 -0700)] 
Merge branch 'net-stmmac-tso-fixes-cleanups'

Russell King says:

====================
net: stmmac: TSO fixes/cleanups

This is a more refined version of the previous patch series fixing
and cleaning up the TSO code.

I'm not sure whether "TSO" or "GSO" should be used to describe this
feature - although it primarily handles TCP, dwmac4 appears to also
be able to handle UDP.

In essence, this series adds a .ndo_features_check() method to handle
whether TSO/GSO can be used for a particular skbuff - checking which
queue the skbuff is destined for and whether that has TBS available
which precludes TSO being enabled on that channel.

I'm also adding a check that the header is smaller than 1024 bytes,
as documented in those sources which have TSO support - this is due
to the hardware buffering the header in "TSO memory" which I guess
is limited to 1KiB. I expect this test never to trigger, but if
the headers ever exceed that size, the hardware will likely fail.
While IPv4 headers are unlikely to be anywhere near this limit, nothing
in the protocol prevents IPv6 headers of up to 64KiB.

As we now have a .ndo_features_check() method, I'm moving the VLAN
insertion for TSO packets into core code by unpublishing the VLAN
insertion features when we use TSO. Another move is for checksumming,
which is required for TSO, but stmmac's requirements for offloading
checksums are more strict - and this seems to be a bug in the TSO
path.

I've changed the hardware initialisation to always enable TSO support
on the channels even if the user requests TSO/GSO to be disabled -
this fixes another issue as pointed out by Jakub in a previous review.

I'm moving the setup of the GSO features, cleaning those up, and
adding a warning if platform glue requests this to be enabled but the
hardware has no support. Hopefully this will never trigger if everyone
got the STMMAC_FLAG_TSO_EN flag correct. Also adding a check for TxPBL
value.

Finally, moving the "TSO supported" message to the new
stmmac_set_gso_features() function to keep all this TSO stuff together.
====================

Link: https://patch.msgid.link/aczHVF04LIGq_lYO@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: move "TSO supported" message to stmmac_set_gso_features()
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:20 +0000 (08:22 +0100)] 
net: stmmac: move "TSO supported" message to stmmac_set_gso_features()

Move the "TSO supported" message to stmmac_set_gso_features() so that
we group all probe-time TSO stuff in one place.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu8-0000000Eau5-3Zne@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: check txpbl for TSO
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:15 +0000 (08:22 +0100)] 
net: stmmac: check txpbl for TSO

Documentation states that TxPBL must be >= 4 to allow TSO support, but
the driver doesn't check this. TxPBL comes from the platform glue code
or DT. Add a check with a warning if platform glue code attempts to
enable TSO support with TxPBL too low.
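A sketch of the intended check, with the platform-data fields reduced to illustrative stand-ins for struct plat_stmmacenet_data / struct stmmac_dma_cfg:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative subset of the stmmac platform data. */
struct dma_cfg_sketch { int txpbl; };
struct plat_sketch { bool tso_en; struct dma_cfg_sketch dma; };

/* Documentation requires TxPBL >= 4 for TSO; warn and refuse otherwise.
 * A txpbl of 0 (unset/default) is left alone in this sketch. */
static bool tso_allowed(const struct plat_sketch *plat)
{
	if (!plat->tso_en)
		return false;
	if (plat->dma.txpbl && plat->dma.txpbl < 4) {
		fprintf(stderr, "stmmac: TSO requested but TxPBL < 4, disabling TSO\n");
		return false;
	}
	return true;
}
```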

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pu3-0000000Eatz-39ts@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: add warning when TSO is requested but unsupported
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:10 +0000 (08:22 +0100)] 
net: stmmac: add warning when TSO is requested but unsupported

Add a warning message if TSO is requested by the platform glue code but
the core wasn't configured for TSO.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pty-0000000Eatt-2TjZ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: make stmmac_set_gso_features() more readable
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:05 +0000 (08:22 +0100)] 
net: stmmac: make stmmac_set_gso_features() more readable

Make stmmac_set_gso_features() more readable by adding some whitespace
and getting rid of the indentation.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptt-0000000Eatn-1ziK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: split out gso features setup
Russell King (Oracle) [Wed, 1 Apr 2026 07:22:00 +0000 (08:22 +0100)] 
net: stmmac: split out gso features setup

Move the GSO features setup into a separate function, co-located with
other GSO/TSO support.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pto-0000000Eath-1VDH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: simplify GSO/TSO test in stmmac_xmit()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:55 +0000 (08:21 +0100)] 
net: stmmac: simplify GSO/TSO test in stmmac_xmit()

The test in stmmac_xmit() to see whether we should pass the skbuff to
stmmac_tso_xmit() is more complex than it needs to be. This test can
be simplified by storing the mask of GSO types that we will pass, and
setting it according to the enabled features.

Note that "tso" is a misnomer since commit b776620651a1 ("net:
stmmac: Implement UDP Segmentation Offload"). Also note that this
commit controls both via the TSO feature. We preserve this behaviour
in this commit.

Also, this commit unconditionally accessed skb_shinfo(skb)->gso_type
for all frames, even when skb_is_gso() was false. This access is
eliminated.
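The mask-based test can be sketched like this; the flag values are hypothetical stand-ins for the real SKB_GSO_* bits:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the real SKB_GSO_* bits
 * are defined in include/linux/skbuff.h. */
#define GSO_TCPV4  (1u << 0)
#define GSO_TCPV6  (1u << 1)
#define GSO_UDP_L4 (1u << 2)

struct priv_sketch { unsigned int tso_gso_types; };

/* Set once when features change, so the hot path is a single mask test. */
static void set_tso_gso_types(struct priv_sketch *priv, bool tso_on, bool udp_gso_hw)
{
	priv->tso_gso_types = 0;
	if (tso_on) {
		priv->tso_gso_types = GSO_TCPV4 | GSO_TCPV6;
		if (udp_gso_hw)
			priv->tso_gso_types |= GSO_UDP_L4;
	}
}

/* gso_type is consulted only for GSO skbs, as the commit notes. */
static bool use_tso_path(const struct priv_sketch *priv, bool is_gso,
			 unsigned int gso_type)
{
	return is_gso && (gso_type & priv->tso_gso_types);
}
```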

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptj-0000000Eatb-11zK@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: move check for hardware checksum supported
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:50 +0000 (08:21 +0100)] 
net: stmmac: move check for hardware checksum supported

Add a check in .ndo_features_check() to indicate whether hardware
checksum can be performed on the skbuff. Where hardware checksum is
not supported - either because the channel does not support Tx COE
or the skb isn't suitable (stmmac uses a tighter test than
can_checksum_protocol()) - we also need to disable TSO, which will be
done by harmonize_features() in net/core/dev.c.

This fixes a bug where a channel which has COE disabled may still
receive TSO skbuffs.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pte-0000000EatU-0ILt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: move TSO VLAN tag insertion to core code
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:44 +0000 (08:21 +0100)] 
net: stmmac: move TSO VLAN tag insertion to core code

stmmac_tso_xmit() checks whether the skbuff is trying to offload
vlan tag insertion to hardware, which from the comment in the code
appears to be buggy when the TSO feature is used.

Rather than stmmac_tso_xmit() inserting the VLAN tag, handle this
in stmmac_features_check() which will then use core net code to
handle this. See net/core/dev.c::validate_xmit_skb()

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptY-0000000EatO-42Qv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: add GSO MSS checks
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:39 +0000 (08:21 +0100)] 
net: stmmac: add GSO MSS checks

Add GSO MSS checks to stmmac_features_check().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptT-0000000EatI-3feh@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: add TSO check for header length
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:34 +0000 (08:21 +0100)] 
net: stmmac: add TSO check for header length

According to the STM32MP151 documentation which covers dwmac v4.2, the
hardware TSO feature can handle header lengths up to a maximum of 1023
bytes.

Add a .ndo_features_check() method implementation to check the header
length meets these requirements, otherwise fall back to software GSO.
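A minimal sketch of the header-length test, assuming the 1023-byte limit from the STM32MP151 documentation; the helper names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Documented dwmac v4.2 limit: headers are buffered in internal
 * "TSO memory", at most 1023 bytes. */
#define TSO_MAX_HDR_LEN 1023u

/* skb_transport_offset(skb) + tcp_hdrlen(skb) in kernel terms. */
static unsigned int tso_header_size_sketch(unsigned int mac_len,
					   unsigned int net_len,
					   unsigned int l4_hdr_len)
{
	return mac_len + net_len + l4_hdr_len;
}

/* Returning false stands in for clearing the GSO feature bits in
 * .ndo_features_check() so the core falls back to software GSO. */
static bool tso_header_fits(unsigned int hdr_len)
{
	return hdr_len <= TSO_MAX_HDR_LEN;
}
```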

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptO-0000000EatC-39il@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: add stmmac_tso_header_size()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:29 +0000 (08:21 +0100)] 
net: stmmac: add stmmac_tso_header_size()

We will need to compute the size of the protocol headers in two places,
so move this into a separate function.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptJ-0000000Eat5-2ZlA@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: fix TSO support when some channels have TBS available
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:24 +0000 (08:21 +0100)] 
net: stmmac: fix TSO support when some channels have TBS available

According to the STM32MP25xx manual, which is dwmac v5.3, TBS (time
based scheduling) is not permitted for channels which have hardware
TSO enabled. Intel's commit 5e6038b88a57 ("net: stmmac: fix TSO and
TBS feature enabling during driver open") concurs with this, but it
is incomplete.

This commit avoids enabling TSO support on the channels which have
TBS available, which, as far as the hardware is concerned, means we
do not set the TSE bit in the DMA channel's transmit control register.

However, the net device's features apply to all queues (channels), which
means these channels may still be handed TSO skbs to transmit, and the
driver will pass them to stmmac_tso_xmit(). This will generate the
descriptors for TSO, even though the channel has the TSE bit clear.

Fix this by checking whether the queue (channel) has TBS available,
and if it does, fall back to software GSO support.
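The per-queue decision can be sketched as follows, with illustrative field names:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the per-queue decision: queues with TBS available never
 * get the TSE bit set, so a GSO skb aimed at such a queue must fall
 * back to software segmentation. */
struct txq_sketch { bool tbs_avail; };

static bool queue_can_hw_tso(const struct txq_sketch *q, bool dev_tso_en)
{
	return dev_tso_en && !q->tbs_avail;
}
```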

Fixes: 5e6038b88a57 ("net: stmmac: fix TSO and TBS feature enabling during driver open")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7ptE-0000000Easz-28tv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: fix .ndo_fix_features()
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:19 +0000 (08:21 +0100)] 
net: stmmac: fix .ndo_fix_features()

netdev features documentation requires that .ndo_fix_features() is
stateless: it shouldn't modify driver state. Yet, stmmac_fix_features()
does exactly that, changing whether GSO frames are processed by the
driver.

Move this code to stmmac_set_features() instead, which is the correct
place for it. We don't need to check whether TSO is supported; this
is already handled via the setup of netdev->hw_features, and we are
guaranteed that if netdev->hw_features indicates that a feature is
not supported, .ndo_set_features() won't be called with it set.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt9-0000000East-1YAO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agonet: stmmac: fix channel TSO enable on resume
Russell King (Oracle) [Wed, 1 Apr 2026 07:21:14 +0000 (08:21 +0100)] 
net: stmmac: fix channel TSO enable on resume

Rather than configuring the channels depending on whether GSO/TSO is
currently enabled by the user, always enable if the hardware has TSO
support and the platform wants TSO to be enabled.

This avoids the channel TSO enable bit being disabled after a resume
when the user has disabled TSO features. This will cause problems when
the user re-enables TSO.

This bug goes back to commit f748be531d70 ("stmmac: support new GMAC4").

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1w7pt4-0000000Easn-14WL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agoata: libata-eh: Do not retry reset if the device is gone
Igor Pylypiv [Thu, 2 Apr 2026 16:07:05 +0000 (09:07 -0700)] 
ata: libata-eh: Do not retry reset if the device is gone

If a device is hot-unplugged or otherwise disappears during error handling,
ata_eh_reset() may fail with -ENODEV. Currently, the error handler will
continue to retry the reset operation up to max_tries times.

Prevent unnecessary reset retries by exiting the loop early when
ata_do_reset() returns -ENODEV.
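The early exit can be sketched as a plain retry loop; the reset stub and names are illustrative:

```c
#include <assert.h>
#include <errno.h>

/* Returns the scripted result for each attempt; stands in for
 * ata_do_reset(). */
static int do_reset_stub(const int *results, int n, int attempt)
{
	return attempt < n ? results[attempt] : 0;
}

static int eh_reset_sketch(const int *results, int n, int max_tries)
{
	int rc = 0;

	for (int attempt = 0; attempt < max_tries; attempt++) {
		rc = do_reset_stub(results, n, attempt);
		if (rc == 0)
			return 0;
		if (rc == -ENODEV)
			return rc;   /* device vanished: don't keep retrying */
	}
	return rc;
}
```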

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
13 days agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 2 Apr 2026 17:57:09 +0000 (10:57 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v2")
  323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days agoPCI: dwc: Fix type mismatch for kstrtou32_from_user() return value
Hans Zhang [Wed, 1 Apr 2026 02:30:48 +0000 (10:30 +0800)] 
PCI: dwc: Fix type mismatch for kstrtou32_from_user() return value

kstrtou32_from_user() returns int, but the return value was stored in
a u32 variable 'val', risking sign loss. Use a dedicated int variable
to correctly handle the return code.
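The sign-loss bug can be demonstrated in userspace with a stand-in for kstrtou32_from_user() (the parser here is a toy; only the return-type handling matters):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model of kstrtou32_from_user(): returns 0 or a negative errno. */
static int parse_stub(const char *s, uint32_t *out)
{
	if (!s || *s < '0' || *s > '9')
		return -EINVAL;
	*out = (uint32_t)(*s - '0');
	return 0;
}

/* BUG: storing the int return in a u32 makes "ret < 0" always false. */
static int error_detected_with_u32(const char *s)
{
	uint32_t val;
	uint32_t ret = parse_stub(s, &val);  /* sign lost here */
	return ret < 0;                      /* unsigned: never true */
}

/* FIX: keep the return code in a dedicated int variable. */
static int error_detected_with_int(const char *s)
{
	uint32_t val;
	int ret = parse_stub(s, &val);
	return ret < 0;
}
```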

Fixes: 4fbfa17f9a07 ("PCI: dwc: Add debugfs based Silicon Debug support for DWC")
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260401023048.4182452-1-18255117159@163.com
13 days agoMerge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 2 Apr 2026 17:31:30 +0000 (10:31 -0700)] 
Merge tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more fix for a potential extent tree corruption due to an
  unexpected error value.

  When the search for an extent item failed, it under some circumstances
  was reported as a success to the caller"

* tag 'for-7.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix incorrect return value after changing leaf in lookup_extent_data_ref()

13 days agothermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp
Thorsten Blum [Thu, 2 Apr 2026 16:56:18 +0000 (18:56 +0200)] 
thermal/drivers/brcmstb_thermal: Use max to simplify brcmstb_get_temp

Use max() to simplify brcmstb_get_temp() and improve its readability.
Since avs_tmon_code_to_temp() returns an int, change the data type of
the local variable 't' from long to int.  No functional changes.
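A userspace sketch of the change; the conversion formula is hypothetical, and only the max()-based clamp mirrors the patch:

```c
#include <assert.h>

/* Userspace stand-in for the kernel's type-checked max() macro. */
#define max_stub(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical conversion: the real avs_tmon_code_to_temp() maps a raw
 * TMON code to millidegrees; this linear form is illustrative only. */
static int avs_tmon_code_to_temp_stub(unsigned int code)
{
	return 410000 - (int)code * 487;
}

/* Before: long t; ...; if (t < 0) t = 0;
 * After: clamp with max(), and 't' becomes an int. */
static int brcmstb_get_temp_sketch(unsigned int code)
{
	return max_stub(avs_tmon_code_to_temp_stub(code), 0);
}
```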

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260402165616.895305-3-thorsten.blum@linux.dev
13 days agoselftests/bpf: Add more precision tracking tests for atomics
Daniel Borkmann [Tue, 31 Mar 2026 22:20:20 +0000 (00:20 +0200)] 
selftests/bpf: Add more precision tracking tests for atomics

Add verifier precision tracking tests for BPF atomic fetch operations.
Validate that backtrack_insn correctly propagates precision from the
fetch dst_reg to the stack slot for {fetch_add,xchg,cmpxchg} atomics.
For the first two, src_reg receives the old memory value; for cmpxchg,
r0 does. The fetched register is used for pointer arithmetic to trigger
backtracking. Also add coverage for fetch_{or,and,xor} flavors which
exercises the bitwise atomic fetch variants going through the same
insn->imm & BPF_FETCH check but with different imm values.

Add dual-precision regression tests for fetch_add and cmpxchg where
both the fetched value and a reread of the same stack slot are tracked
for precision. After the atomic operation, the stack slot is STACK_MISC,
so the ldx does not set INSN_F_STACK_ACCESS. These tests verify that
stack precision propagates solely through the atomic fetch's load side.

Add map-based tests for fetch_add and cmpxchg which validate that non-
stack atomic fetch completes precision tracking without falling back
to mark_all_scalars_precise. Lastly, add 32-bit variants for {fetch_add,
cmpxchg} on map values to cover the second valid atomic operand size.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_precision
  [...]
  + /etc/rcS.d/S50-startup
  ./test_progs -t verifier_precision
  [    1.697105] bpf_testmod: loading out-of-tree module taints kernel.
  [    1.700220] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  [    1.777043] tsc: Refined TSC clocksource calibration: 3407.986 MHz
  [    1.777619] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc6d7268, max_idle_ns: 440795260133 ns
  [    1.778658] clocksource: Switched to clocksource tsc
  #633/1   verifier_precision/bpf_neg:OK
  #633/2   verifier_precision/bpf_end_to_le:OK
  #633/3   verifier_precision/bpf_end_to_be:OK
  #633/4   verifier_precision/bpf_end_bswap:OK
  #633/5   verifier_precision/bpf_load_acquire:OK
  #633/6   verifier_precision/bpf_store_release:OK
  #633/7   verifier_precision/state_loop_first_last_equal:OK
  #633/8   verifier_precision/bpf_cond_op_r10:OK
  #633/9   verifier_precision/bpf_cond_op_not_r10:OK
  #633/10  verifier_precision/bpf_atomic_fetch_add_precision:OK
  #633/11  verifier_precision/bpf_atomic_xchg_precision:OK
  #633/12  verifier_precision/bpf_atomic_fetch_or_precision:OK
  #633/13  verifier_precision/bpf_atomic_fetch_and_precision:OK
  #633/14  verifier_precision/bpf_atomic_fetch_xor_precision:OK
  #633/15  verifier_precision/bpf_atomic_cmpxchg_precision:OK
  #633/16  verifier_precision/bpf_atomic_fetch_add_dual_precision:OK
  #633/17  verifier_precision/bpf_atomic_cmpxchg_dual_precision:OK
  #633/18  verifier_precision/bpf_atomic_fetch_add_map_precision:OK
  #633/19  verifier_precision/bpf_atomic_cmpxchg_map_precision:OK
  #633/20  verifier_precision/bpf_atomic_fetch_add_32bit_precision:OK
  #633/21  verifier_precision/bpf_atomic_cmpxchg_32bit_precision:OK
  #633/22  verifier_precision/bpf_neg_2:OK
  #633/23  verifier_precision/bpf_neg_3:OK
  #633/24  verifier_precision/bpf_neg_4:OK
  #633/25  verifier_precision/bpf_neg_5:OK
  #633     verifier_precision:OK
  Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days agobpf: Fix incorrect pruning due to atomic fetch precision tracking
Daniel Borkmann [Tue, 31 Mar 2026 22:20:19 +0000 (00:20 +0200)] 
bpf: Fix incorrect pruning due to atomic fetch precision tracking

When backtrack_insn encounters a BPF_STX instruction with BPF_ATOMIC
and BPF_FETCH, the src register (or r0 for BPF_CMPXCHG) also acts as
a destination, thus receiving the old value from the memory location.

The current backtracking logic does not account for this. It treats
atomic fetch operations the same as regular stores where the src
register is only an input. This leads backtrack_insn to fail to
propagate precision to the stack location, which is then not marked
as precise!

Later, the verifier's path pruning can incorrectly consider two states
equivalent when they differ in terms of stack state. Meaning, two
branches can be treated as equivalent and thus get pruned when they
should not be seen as such.

Fix it as follows: Extend the BPF_LDX handling in backtrack_insn to
also cover atomic fetch operations via is_atomic_fetch_insn() helper.
When the fetch dst register is being tracked for precision, clear it,
and propagate precision over to the stack slot. For non-stack memory,
the precision walk stops at the atomic instruction, same as regular
BPF_LDX. This covers all fetch variants.
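The helper can be sketched against the real BPF opcode encodings (values from include/uapi/linux/bpf.h and bpf_common.h); the insn struct is simplified:

```c
#include <assert.h>
#include <stdbool.h>

/* Real encodings from include/uapi/linux/bpf(_common).h. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_MODE(code)  ((code) & 0xe0)
#define BPF_STX     0x03
#define BPF_ATOMIC  0xc0
#define BPF_ADD     0x00
#define BPF_FETCH   0x01
#define BPF_XCHG    (0xe0 | BPF_FETCH)
#define BPF_CMPXCHG (0xf0 | BPF_FETCH)

/* Simplified insn; the real struct bpf_insn carries regs/off too. */
struct bpf_insn_sketch { unsigned char code; int imm; };

/* All fetch variants (fetch_{add,and,or,xor}, xchg, cmpxchg) carry
 * BPF_FETCH in imm, so one bit test covers them. */
static bool is_atomic_fetch_insn_sketch(const struct bpf_insn_sketch *insn)
{
	return BPF_CLASS(insn->code) == BPF_STX &&
	       BPF_MODE(insn->code) == BPF_ATOMIC &&
	       (insn->imm & BPF_FETCH);
}
```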

Before:

  0: (b7) r1 = 8                        ; R1=8
  1: (7b) *(u64 *)(r10 -8) = r1         ; R1=8 R10=fp0 fp-8=8
  2: (b7) r2 = 0                        ; R2=0
  3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)          ; R2=8 R10=fp0 fp-8=mmmmmmmm
  4: (bf) r3 = r10                      ; R3=fp0 R10=fp0
  5: (0f) r3 += r2
  mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r2 stack= before 4: (bf) r3 = r10
  mark_precise: frame0: regs=r2 stack= before 3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)
  mark_precise: frame0: regs=r2 stack= before 2: (b7) r2 = 0
  6: R2=8 R3=fp8
  6: (b7) r0 = 0                        ; R0=0
  7: (95) exit

After:

  0: (b7) r1 = 8                        ; R1=8
  1: (7b) *(u64 *)(r10 -8) = r1         ; R1=8 R10=fp0 fp-8=8
  2: (b7) r2 = 0                        ; R2=0
  3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)          ; R2=8 R10=fp0 fp-8=mmmmmmmm
  4: (bf) r3 = r10                      ; R3=fp0 R10=fp0
  5: (0f) r3 += r2
  mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r2 stack= before 4: (bf) r3 = r10
  mark_precise: frame0: regs=r2 stack= before 3: (db) r2 = atomic64_fetch_add((u64 *)(r10 -8), r2)
  mark_precise: frame0: regs= stack=-8 before 2: (b7) r2 = 0
  mark_precise: frame0: regs= stack=-8 before 1: (7b) *(u64 *)(r10 -8) = r1
  mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 8
  6: R2=8 R3=fp8
  6: (b7) r0 = 0                        ; R0=0
  7: (95) exit

Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Fixes: 5ca419f2864a ("bpf: Add BPF_FETCH field / create atomic_fetch_add instruction")
Reported-by: STAR Labs SG <info@starlabs.sg>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days agoMerge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 2 Apr 2026 16:57:06 +0000 (09:57 -0700)] 
Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "With fixes from wireless, bluetooth and netfilter included we're back
  to each PR carrying 30%+ more fixes than in previous era.

  The good news is that so far none of the "extra" fixes are themselves
  causing real regressions. Not sure how much comfort that is.

  Current release - fix to a fix:

   - netdevsim: fix build if SKB_EXTENSIONS=n

   - eth: stmmac: skip VLAN restore when VLAN hash ops are missing

  Previous releases - regressions:

   - wifi: iwlwifi: mvm: don't send a 6E related command when
     not supported

  Previous releases - always broken:

   - some info leak fixes

   - add missing clearing of skb->cb[] on ICMP paths from tunnels

   - ipv6:
      - flowlabel: defer exclusive option free until RCU teardown
      - avoid overflows in ip6_datagram_send_ctl()

   - mpls: add seqcount to protect platform_labels from OOB access

   - bridge: improve safety of parsing ND options

   - bluetooth: fix leaks, overflows and races in hci_sync

   - netfilter: add more input validation, some to address bugs directly,
     some to prevent exploits from cooking up broken configurations

   - wifi:
      - ath: avoid poor performance due to stopping the wrong
        aggregation session
      - virt_wifi: remove SET_NETDEV_DEV to avoid use-after-free

   - eth:
      - fec: fix the PTP periodic output sysfs interface
      - enetc: safely reinitialize TX BD ring when it has unsent frames"

* tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
  ipv6: avoid overflows in ip6_datagram_send_ctl()
  net: hsr: fix VLAN add unwind on slave errors
  net: hsr: serialize seq_blocks merge across nodes
  vsock: initialize child_ns_mode_locked in vsock_net_init()
  selftests/tc-testing: add tests for cls_fw and cls_flow on shared blocks
  net/sched: cls_flow: fix NULL pointer dereference on shared blocks
  net/sched: cls_fw: fix NULL pointer dereference on shared blocks
  net/x25: Fix overflow when accumulating packets
  net/x25: Fix potential double free of skb
  bnxt_en: Restore default stat ctxs for ULP when resource is available
  bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()
  bnxt_en: Refactor some basic ring setup and adjustment logic
  net/mlx5: Fix switchdev mode rollback in case of failure
  net/mlx5: Avoid "No data available" when FW version queries fail
  net/mlx5: lag: Check for LAG device before creating debugfs
  net: macb: properly unregister fixed rate clocks
  net: macb: fix clk handling on PCI glue driver removal
  virtio_net: clamp rss_max_key_size to NETDEV_RSS_KEY_LEN
  net/sched: sch_netem: fix out-of-bounds access in packet corruption
  ...

13 days ago
Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 2 Apr 2026 16:53:16 +0000 (09:53 -0700)] 
Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

 - IOMMU-PT related compile breakage in the AMD driver

 - Fix IOTLB flushing when the unmapped region is larger than requested
   due to page sizes

 - Fix IOTLB flush behavior with empty gathers

* tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
  iommupt: Fix short gather if the unmap goes into a large mapping
  iommu: Do not call drivers for empty gathers

13 days ago
bpf: Reject sleepable kprobe_multi programs at attach time
Varun R Mallya [Wed, 1 Apr 2026 19:11:25 +0000 (00:41 +0530)] 
bpf: Reject sleepable kprobe_multi programs at attach time

kprobe.multi programs run in atomic/RCU context and cannot sleep.
However, bpf_kprobe_multi_link_attach() did not validate whether the
program being attached had the sleepable flag set, allowing sleepable
helpers such as bpf_copy_from_user() to be invoked from a non-sleepable
context.

This causes a "sleeping function called from invalid context" splat:

  BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo
  preempt_count: 1, expected: 0
  RCU nest depth: 2, expected: 0

Fix this by rejecting sleepable programs early in
bpf_kprobe_multi_link_attach(), before any further processing.

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260401191126.440683-1-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
bpf: reject direct access to nullable PTR_TO_BUF pointers
Qi Tang [Thu, 2 Apr 2026 09:29:22 +0000 (17:29 +0800)] 
bpf: reject direct access to nullable PTR_TO_BUF pointers

check_mem_access() matches PTR_TO_BUF via base_type() which strips
PTR_MAYBE_NULL, allowing direct dereference without a null check.

Map iterator ctx->key and ctx->value are PTR_TO_BUF | PTR_MAYBE_NULL.
On stop callbacks these are NULL, causing a kernel NULL dereference.

Add a type_may_be_null() guard to the PTR_TO_BUF branch, matching the
existing PTR_TO_BTF_ID pattern.

Fixes: 20b2aff4bc15 ("bpf: Introduce MEM_RDONLY flag")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260402092923.38357-2-tpluszz77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 2 Apr 2026 16:41:21 +0000 (09:41 -0700)] 
Merge tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "People have been so busy with hunting and we're still getting more
  changes than wished for, but it doesn't look too scary; almost all
  changes are device-specific small fixes.

  I guess it's rather a casual bump, and no more Easter eggs are left
  for 7.0 (hopefully)...

   - Fixes for the recent regression on ctxfi driver

   - Fix missing INIT_LIST_HEAD() for ASoC card_aux_list

   - Usual HD- and USB-audio, and ASoC AMD quirk updates

   - ASoC fixes for AMD and Intel"

* tag 'sound-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ASoC: amd: ps: Fix missing leading zeros in subsystem_device SSID log
  ALSA: usb-audio: Exclude Scarlett 2i2 1st Gen (8016) from SKIP_IFACE_SETUP
  ALSA: hda/realtek: add quirk for Acer Swift SFG14-73
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
  ASoC: Intel: boards: fix unmet dependency on PINCTRL
  ASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_params
  ALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initialization
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10
  ALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxx
  ASoC: ep93xx: Fix unchecked clk_prepare_enable() and add rollback on failure
  ASoC: soc-core: call missing INIT_LIST_HEAD() for card_aux_list
  ALSA: hda/realtek: Add quirk for Samsung Book2 Pro 360 (NP950QED)
  ASoC: amd: yc: Add DMI entry for HP Laptop 15-fc0xxx
  ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 16X OLED M7601RM
  ALSA: hda/realtek: Add quirk for ASUS ROG Strix SCAR 15
  ALSA: usb-audio: Exclude Scarlett Solo 1st Gen from SKIP_IFACE_SETUP
  ALSA: caiaq: fix stack out-of-bounds read in init_card
  ALSA: ctxfi: Check the error for index mapping
  ALSA: ctxfi: Fix missing SPDIFI1 index handling
  ALSA: hda/realtek: add quirk for HP Victus 15-fb0xxx
  ...

13 days ago
Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy...
Linus Torvalds [Thu, 2 Apr 2026 16:34:22 +0000 (09:34 -0700)] 
Merge tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay

Pull auxdisplay fixes from Andy Shevchenko:

 - Fix NULL dereference in linedisp_release()

 - Fix ht16k33 DT bindings to avoid warnings

 - Handle errors in I²C transfers in lcd2s driver

* tag 'auxdisplay-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
  auxdisplay: line-display: fix NULL dereference in linedisp_release
  auxdisplay: lcd2s: add error handling for i2c transfers
  dt-bindings: auxdisplay: ht16k33: Use unevaluatedProperties to fix common property warning

13 days ago
Merge branch 'bpf-migrate-bpf_task_work-and-file-dynptr-to-kmalloc_nolock'
Alexei Starovoitov [Thu, 2 Apr 2026 16:31:42 +0000 (09:31 -0700)] 
Merge branch 'bpf-migrate-bpf_task_work-and-file-dynptr-to-kmalloc_nolock'

Mykyta Yatsenko says:

====================
bpf: Migrate bpf_task_work and file dynptr to kmalloc_nolock

Now that kmalloc can be used from NMI context via kmalloc_nolock(),
migrate BPF internal allocations away from bpf_mem_alloc to use the
standard slab allocator.

Use kfree_rcu() for deferred freeing, which waits for a regular RCU
grace period before the memory is reclaimed. Sleepable BPF programs
hold rcu_read_lock_trace but not regular rcu_read_lock, so patch 1
adds explicit rcu_read_lock/unlock around the pointer-to-refcount
window to prevent kfree_rcu from freeing memory while a sleepable
program is still between reading the pointer and acquiring a
reference.

Patch 1 migrates bpf_task_work_ctx from bpf_mem_alloc/bpf_mem_free to
kmalloc_nolock/kfree_rcu.

Patch 2 migrates bpf_dynptr_file_impl from bpf_mem_alloc/bpf_mem_free
to kmalloc_nolock/kfree.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v2:
- Switch to scoped_guard in patch 1 (Kumar)
- Remove rcu gp wait in patch 2 (Kumar)
- Defer to irq_work when irqs disabled in patch 1
- use bpf_map_kmalloc_nolock() for bpf_task_work
- use kmalloc_nolock() for file dynptr
- Link to v1: https://lore.kernel.org/all/20260325-kmalloc_special-v1-0-269666afb1ea@meta.com/
====================

Link: https://patch.msgid.link/20260330-kmalloc_special-v2-0-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
bpf: Migrate dynptr file to kmalloc_nolock
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:57 +0000 (15:27 -0700)] 
bpf: Migrate dynptr file to kmalloc_nolock

Replace bpf_mem_alloc/bpf_mem_free with kmalloc_nolock/kfree_nolock for
bpf_dynptr_file_impl, continuing the migration away from bpf_mem_alloc
now that kmalloc can be used from NMI context.

freader_cleanup() runs before kfree_nolock() while the dynptr still
holds exclusive access, so plain kfree_nolock() is safe — no concurrent
readers can access the object.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-2-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
bpf: Migrate bpf_task_work to kmalloc_nolock
Mykyta Yatsenko [Mon, 30 Mar 2026 22:27:56 +0000 (15:27 -0700)] 
bpf: Migrate bpf_task_work to kmalloc_nolock

Replace bpf_mem_alloc/bpf_mem_free with
kmalloc_nolock/kfree_rcu for bpf_task_work_ctx.

Replace guard(rcu_tasks_trace)() with guard(rcu)() in
bpf_task_work_irq(). The function only accesses ctx struct members
(not map values), so tasks trace protection is not needed - regular
RCU is sufficient since ctx is freed via kfree_rcu. The guard in
bpf_task_work_callback() remains as tasks trace since it accesses map
values from process context.

Sleepable BPF programs hold rcu_read_lock_trace but not
regular rcu_read_lock. Since kfree_rcu
waits for a regular RCU grace period, the ctx memory can be freed
while a sleepable program is still running. Add scoped_guard(rcu)
around the pointer read and refcount tryget in
bpf_task_work_acquire_ctx to close this race window.

Since kfree_rcu uses call_rcu internally which is not safe from
NMI context, defer destruction via irq_work when irqs are disabled.

For the lost-cmpxchg path the ctx was never published, so
kfree_nolock is safe.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260330-kmalloc_special-v2-1-c90403f92ff0@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
Gautham R. Shenoy [Thu, 2 Apr 2026 10:26:11 +0000 (15:56 +0530)] 
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer

Mario Limonciello has led amd-pstate maintenance in recent years and
has done excellent work. The amd-pstate driver is in good hands with
him. I am stepping down as co-maintainer as I move on to other things.

Add K Prateek Nayak as a reviewer. He has been actively contributing
to the driver including preferred-core and ITMT improvements, and has
been helping review amd-pstate patches for a while now.

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://lore.kernel.org/r/20260402102611.16519-1-gautham.shenoy@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
K Prateek Nayak [Mon, 16 Mar 2026 08:18:49 +0000 (08:18 +0000)] 
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()

cpufreq_cpu_get() can sleep on PREEMPT_RT in the presence of concurrent
writer(s); however, amd-pstate depends on fetching the cpudata via the
policy's driver data which necessitates grabbing the reference.

Since schedutil governor can call "cpufreq_driver->update_perf()"
during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled,
fetching the policy object using the cpufreq_cpu_get() helper in the
scheduler fast-path leads to "BUG: scheduling while atomic" on
PREEMPT_RT [1].

Pass the cached cpufreq policy object in sg_policy to the update_perf()
instead of just the CPU. The CPU can be inferred using "policy->cpu".

The lifetime of cpufreq_policy object outlasts that of the governor and
the cpufreq driver (allocated when the CPU is onlined and only reclaimed
when the CPU is offlined / the CPU device is removed) which makes it
safe to be referenced throughout the governor's lifetime.

Closes: https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1]

Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Gary Guo <gary@garyguo.net> # Rust
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260316081849.19368-3-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
K Prateek Nayak [Mon, 16 Mar 2026 08:18:48 +0000 (08:18 +0000)] 
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()

All callers of amd_pstate_update() already have a reference to the
cpufreq_policy object.

Pass the entire policy object and grab the cpudata using
"policy->driver_data" instead of passing the cpudata and unnecessarily
grabbing another read-side reference to the cpufreq policy object when
it is already available in the caller.

No functional changes intended.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20260316081849.19368-2-kprateek.nayak@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate-ut: Add a unit test for raw EPP
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:11 +0000 (15:38 -0500)] 
cpufreq/amd-pstate-ut: Add a unit test for raw EPP

Ensure that all supported raw EPP values work properly.

Export the driver helpers used by the test module so the test can drive
raw EPP writes and temporarily disable dynamic EPP while it runs.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
Merge branch 'bpf-fix-abuse-of-kprobe_write_ctx-via-freplace'
Alexei Starovoitov [Thu, 2 Apr 2026 16:29:49 +0000 (09:29 -0700)] 
Merge branch 'bpf-fix-abuse-of-kprobe_write_ctx-via-freplace'

Leon Hwang says:

====================
bpf: Fix abuse of kprobe_write_ctx via freplace

The potential issue of kprobe_write_ctx+freplace was mentioned in
"bpf: Disallow !kprobe_write_ctx progs tail-calling kprobe_write_ctx progs" [1].

It is a real issue: the test in patch #2 verifies that kprobe_write_ctx=false
kprobe progs can be abused to modify struct pt_regs via kprobe_write_ctx=true
freplace progs.

When struct pt_regs is modified, bpf_prog_test_run_opts() gets -EFAULT instead
of 0.

test_freplace_kprobe_write_ctx:FAIL:bpf_prog_test_run_opts unexpected error: -14 (errno 14)

We will disallow attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.

Links:
[1] https://lore.kernel.org/bpf/CAP01T74w4KVMn9bEwpQXrk+bqcUxzb6VW1SQ_QvNy0A4EY-9Jg@mail.gmail.com/

Changes:
v2 -> v3:
* Add comment to the rejection of kprobe_write_ctx (per Jiri).
* Use libbpf_get_error() instead of errno in test (per Jiri).
* Collect Acked-by tags from Jiri and Song, thanks.
v2: https://lore.kernel.org/bpf/20260326141718.17731-1-leon.hwang@linux.dev/

v1 -> v2:
* Drop patch #1 in v1, as it wasn't an issue (per Toke).
* Check kprobe_write_ctx value at attach time instead of at load time, to
  prevent attaching kprobe_write_ctx=true freplace progs on
  kprobe_write_ctx=false kprobe progs (per Gemini/sashiko).
* Move kprobe_write_ctx test code to attach_probe.c and kprobe_write_ctx.c.
v1: https://lore.kernel.org/bpf/20260324150444.68166-1-leon.hwang@linux.dev/
====================

Link: https://patch.msgid.link/20260331145353.87606-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse
Leon Hwang [Tue, 31 Mar 2026 14:53:53 +0000 (22:53 +0800)] 
selftests/bpf: Add test to verify the fix of kprobe_write_ctx abuse

Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.

Without the fix, the issue is verified:

kprobe_write_ctx=true freplace prog is allowed to attach to
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 will be set as 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.

With the fix, the issue is rejected at attach time.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
bpf: Fix abuse of kprobe_write_ctx via freplace
Leon Hwang [Tue, 31 Mar 2026 14:53:52 +0000 (22:53 +0800)] 
bpf: Fix abuse of kprobe_write_ctx via freplace

uprobe programs are allowed to modify struct pt_regs.

Since the actual program type of uprobe is KPROBE, it can be abused to
modify struct pt_regs via kprobe+freplace when the kprobe attaches to
kernel functions.

For example,

SEC("?kprobe")
int kprobe(struct pt_regs *regs)
{
	return 0;
}

SEC("?freplace")
int freplace_kprobe(struct pt_regs *regs)
{
	regs->di = 0;
	return 0;
}

freplace_kprobe prog will attach to kprobe prog.
kprobe prog will attach to a kernel function.

Without this patch, when the kernel function runs, its first arg will
always be set to 0 via the freplace_kprobe prog.

To fix the abuse of kprobe_write_ctx=true via kprobe+freplace, disallow
attaching freplace programs on kprobe programs with different
kprobe_write_ctx values.

Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers")
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 days ago
cpufreq/amd-pstate: Add support for raw EPP writes
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:10 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add support for raw EPP writes

The energy performance preference field of the CPPC request MSR
supports values from 0 to 255, but the strings only offer 4 values.

The other values are useful for tuning the performance of some
workloads.

Add support for writing the raw energy performance preference value
to the sysfs file.  If the last value written was an integer then
an integer will be returned.  If the last value written was a string
then a string will be returned.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate: Add support for platform profile class
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:09 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add support for platform profile class

The platform profile core allows multiple drivers and devices to
register platform profile support.

When the legacy platform profile interface is used all drivers will
adjust the platform profile as well.

Add support for registering every CPU with the platform profile handler
when dynamic EPP is enabled.

The end result will be that changing the platform profile will modify
EPP accordingly.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate: add kernel command line to override dynamic epp
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:08 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: add kernel command line to override dynamic epp

Add `amd_dynamic_epp=enable` and `amd_dynamic_epp=disable` to override
the kernel configuration option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`
locally.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate: Add dynamic energy performance preference
Mario Limonciello (AMD) [Sun, 29 Mar 2026 20:38:07 +0000 (15:38 -0500)] 
cpufreq/amd-pstate: Add dynamic energy performance preference

Dynamic energy performance preference changes the EPP profile based on
whether the machine is running on AC or DC power.

A notification chain from the power supply core is used to adjust EPP
values on plug in or plug out events.

When enabled, the driver exposes a sysfs toggle for dynamic EPP and blocks
manual writes to energy_performance_preference while it "owns" the EPP
updates.

For non-server systems:
    * the default EPP for AC mode is `performance`.
    * the default EPP for DC mode is `balance_performance`.

For server systems dynamic EPP is mostly a no-op.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
Documentation: amd-pstate: fix dead links in the reference section
Ninad Naik [Mon, 30 Mar 2026 19:08:55 +0000 (00:38 +0530)] 
Documentation: amd-pstate: fix dead links in the reference section

The links for AMD64 Architecture Programmer's Manual and PPR for AMD
Family 19h Model 51h, Revision A1 Processors redirect to a generic page.
Update the links to the working ones.

Signed-off-by: Ninad Naik <ninadnaik07@gmail.com>
Link: https://lore.kernel.org/r/20260330190855.1115304-1-ninadnaik07@gmail.com
Acked-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
cpufreq/amd-pstate: Cache the max frequency in cpudata
Mario Limonciello (AMD) [Thu, 26 Mar 2026 19:36:20 +0000 (14:36 -0500)] 
cpufreq/amd-pstate: Cache the max frequency in cpudata

The value of maximum frequency is fixed and never changes. Doing
calculations every time based off of perf is unnecessary.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20260326193620.649441-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:56 +0000 (17:17 +0530)] 
Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}

Add documentation for the sysfs files
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_freq
and
/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_count.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:55 +0000 (17:17 +0530)] 
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file

Add the missing amd_pstate_prefcore_ranking filenames in the sysfs
listing example leading to the descriptions of these
parameters. Clarify when the file will be visible.

Fixes: 15a2b764ea7c ("amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:54 +0000 (17:17 +0530)] 
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file

Add the missing amd_pstate_hw_prefcore filenames in the sysfs listing
example leading to the descriptions of these parameters. Clarify when
the file will be visible.

Fixes: b96b82d1af7f ("cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
amd-pstate-ut: Add a testcase to validate the visibility of driver attributes
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:53 +0000 (17:17 +0530)] 
amd-pstate-ut: Add a testcase to validate the visibility of driver attributes

amd-pstate driver has per-attribute visibility functions to
dynamically control which sysfs freq_attrs are exposed based on the
platform capabilities and the current amd_pstate mode. However, there
is no test coverage to validate that the driver's live attribute list
matches the expected visibility for each mode.

Add amd_pstate_ut_check_freq_attrs() to the amd-pstate unit test
module. For each enabled mode (passive, active, guided), the test
independently derives the expected visibility of each attribute:
  - Core attributes (max_freq, lowest_nonlinear_freq, highest_perf)
    are always expected.
  - Prefcore attributes (prefcore_ranking, hw_prefcore) are expected
    only when cpudata->hw_prefcore indicates platform support.
  - EPP attributes (energy_performance_preference,
    energy_performance_available_preferences) are expected only in
    active mode.
  - Floor frequency attributes (floor_freq, floor_count) are expected
    only when X86_FEATURE_CPPC_PERF_PRIO is present.

Compare these independent expectations against the live driver's attr
array, catching bugs such as attributes leaking into wrong modes or
visibility functions checking incorrect conditions.

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
amd-pstate-ut: Add module parameter to select testcases
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:52 +0000 (17:17 +0530)] 
amd-pstate-ut: Add module parameter to select testcases

Currently when amd-pstate-ut test module is loaded, it runs all the
tests from amd_pstate_ut_cases[] array.

Add a module parameter named "test_list" that accepts a
comma-delimited list of test names, allowing users to run a
selected subset of tests. When the parameter is omitted or empty, all
tests are run as before.

Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:51 +0000 (17:17 +0530)] 
amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2()

Introduce a new tracepoint trace_amd_pstate_cppc_req2() to track
updates to MSR_AMD_CPPC_REQ2.

Invoke this while changing the Floor Perf.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
amd-pstate: Add sysfs support for floor_freq and floor_count
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:50 +0000 (17:17 +0530)] 
amd-pstate: Add sysfs support for floor_freq and floor_count

When Floor Performance feature is supported by the platform, expose
two sysfs files:

   * amd_pstate_floor_freq to allow userspace to request the floor
     frequency for each CPU.

   * amd_pstate_floor_count which advertises the number of distinct
     levels of floor frequencies supported on this platform.

Reset the floor_perf to bios_floor_perf in the suspend, offline, and
exit paths, and restore the value to the cached user-requested
floor_freq on the resume and online paths, mirroring how bios_min_perf
is handled for MSR_AMD_CPPC_REQ.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:49 +0000 (17:17 +0530)] 
amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF

Some future AMD processors have feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in AMD Publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.

The number of distinct floor performance levels supported on the
platform will be advertised through the bits 32:39 of the
MSR_AMD_CPPC_CAP1. Bits 0:7 of a new MSR MSR_AMD_CPPC_REQ2
(0xc00102b5) will be used to specify the desired floor performance
level for that CPU.

Add support for the aforementioned MSR_AMD_CPPC_REQ2, and macros for
parsing and updating the relevant bits from MSR_AMD_CPPC_CAP1 and
MSR_AMD_CPPC_REQ2.

On boot if the default value of the MSR_AMD_CPPC_REQ2[7:0] (Floor
Perf) is lower than CPPC.lowest_perf, and thus invalid, initialize it
to MSR_AMD_CPPC_CAP1.nominal_perf which is a sane default value.

Save the boot-time floor_perf during amd_pstate_init_floor_perf(). In
a subsequent patch it will be restored in the suspend, offline, and
exit paths, mirroring how bios_min_perf is handled for
MSR_AMD_CPPC_REQ.

Link: https://docs.amd.com/v/u/en-US/69206_1.10_AMD64_CPPC_PUB
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago
x86/cpufeatures: Add AMD CPPC Performance Priority feature.
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:48 +0000 (17:17 +0530)] 
x86/cpufeatures: Add AMD CPPC Performance Priority feature.

Some future AMD processors have feature named "CPPC Performance
Priority" which lets userspace specify different floor performance
levels for different CPUs. The platform firmware takes these different
floor performance levels into consideration while throttling the CPUs
under power/thermal constraints. The presence of this feature is
indicated by bit 16 of the EDX register for CPUID leaf
0x80000007. More details can be found in AMD Publication titled "AMD64
Collaborative Processor Performance Control (CPPC) Performance
Priority" Revision 1.10.

Define a new feature bit named X86_FEATURE_CPPC_PERF_PRIO to map to
CPUID 0x80000007.EDX[16].

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago  amd-pstate: Make certain freq_attrs conditionally visible
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:47 +0000 (17:17 +0530)] 
amd-pstate: Make certain freq_attrs conditionally visible

Certain amd_pstate freq_attrs such as amd_pstate_hw_prefcore and
amd_pstate_prefcore_ranking are enabled even when preferred core is
not supported on the platform.

Similarly, there are freq_attrs common to the amd-pstate and
amd-pstate-epp drivers (e.g. amd_pstate_max_freq,
amd_pstate_lowest_nonlinear_freq, etc.) which are duplicated in two
different freq_attr structs.

Unify all the attributes in a single place and associate each of them
with a visibility function that determines whether the attribute
should be visible based on the underlying platform support and the
current amd_pstate mode.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago  amd-pstate: Update cppc_req_cached in fast_switch case
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:46 +0000 (17:17 +0530)] 
amd-pstate: Update cppc_req_cached in fast_switch case

The function msr_update_perf() does not cache the new value that is
written to MSR_AMD_CPPC_REQ into the variable cpudata->cppc_req_cached
when the update is happening from the fast path.

Fix that by caching the value every time MSR_AMD_CPPC_REQ gets
updated.

This issue was discovered by Claude Opus 4.6 with the aid of Chris
Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).

Assisted-by: Claude:claude-opus-4.6 review-prompts/linux
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Fixes: fff395796917 ("cpufreq/amd-pstate: Always write EPP value when updating perf")
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago  amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()
Gautham R. Shenoy [Thu, 26 Mar 2026 11:47:45 +0000 (17:17 +0530)] 
amd-pstate: Fix memory leak in amd_pstate_epp_cpu_init()

On failure to set the epp, the function amd_pstate_epp_cpu_init()
returns with an error code without freeing the cpudata object that was
allocated at the beginning of the function.

Ensure that the cpudata object is freed before returning from the
function.

This memory leak was discovered by Claude Opus 4.6 with the aid of
Chris Mason's AI review-prompts
(https://github.com/masoncl/review-prompts/tree/main/kernel).

Assisted-by: Claude:claude-opus-4.6 review-prompts/linux
Fixes: f9a378ff6443 ("cpufreq/amd-pstate: Set different default EPP policy for Epyc and Ryzen")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
13 days ago  kbuild: vdso_install: drop build ID architecture allow-list
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:22 +0000 (19:50 +0200)] 
kbuild: vdso_install: drop build ID architecture allow-list

Many architectures that do generate build IDs, for example arm64,
riscv, loongarch, and mips, are missing from this list.

Now that errors from readelf and binaries without any build ID are
handled gracefully, the allow-list is not necessary anymore, drop it.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-4-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
13 days ago  kbuild: vdso_install: gracefully handle images without build ID
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:21 +0000 (19:50 +0200)] 
kbuild: vdso_install: gracefully handle images without build ID

If the vDSO does not contain a build ID, skip the symlink step.
This will allow the removal of the explicit list of architectures.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-3-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
13 days ago  kbuild: vdso_install: hide readelf warnings
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:20 +0000 (19:50 +0200)] 
kbuild: vdso_install: hide readelf warnings

If 'readelf -n' encounters a note it does not recognize, it emits a
warning. This happens, for example, when inspecting a compat vDSO
for which the main kernel toolchain was not used. However, the
relevant build ID note is always readable, so the warnings are
pointless.

Hide the warnings to make it possible to extract build IDs for more
architectures in the future.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-2-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
13 days ago  kbuild: vdso_install: split out the readelf invocation
Thomas Weißschuh [Tue, 31 Mar 2026 17:50:19 +0000 (19:50 +0200)] 
kbuild: vdso_install: split out the readelf invocation

Split up the logic as some upcoming changes to the readelf invocation
would create a very long line otherwise.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20260331-kbuild-vdso-install-v2-1-606d0dc6beca@weissschuh.net
Signed-off-by: Nicolas Schier <nsc@kernel.org>
13 days ago  eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64
Dimitri Daskalakis [Wed, 1 Apr 2026 16:28:48 +0000 (09:28 -0700)] 
eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64

On systems with 64K pages, RX queues will be wedged if users set the
descriptor count to the current minimum (16). Fbnic fragments large
pages into 4K chunks, and scales down the ring size accordingly. With
64K pages and 16 descriptors, the ring size mask is 0 and will never
be filled.

32 descriptors is another special case that wedges the RX rings.
Internally, the rings track pages for the head/tail pointers, not
page fragments. So with 32 descriptors, there is only 1 usable page,
as one ring slot is kept empty to disambiguate between an empty and
a full ring. As a result, the head pointer never advances and the HW
stalls after consuming 16 page fragments.

Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free")
Signed-off-by: Dimitri Daskalakis <daskald@meta.com>
Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>