Heiko Carstens [Mon, 8 Jun 2026 14:19:08 +0000 (16:19 +0200)]
s390/process: Fix kernel thread function pointer type
In case of a kernel thread __ret_from_fork() calls the specified function
indirectly. Fix the kernel thread function pointer, since kernel threads
return an int instead of void.
Fixes: 56e62a737028 ("s390: convert to generic entry") Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Kean Ren [Thu, 11 Jun 2026 02:37:57 +0000 (10:37 +0800)]
ASoC: SDCA: fix NULL pointer dereference in sdca_dev_unregister_functions
sdca_dev_unregister_functions() iterates over all SDCA function
descriptors and calls sdca_dev_unregister() on each func_dev without
checking for NULL. When a function registration has failed partway
through, or the device cleanup races with probe deferral, func_dev
entries may be NULL, leading to a kernel oops:
This was observed on a Lenovo ThinkPad X1 Carbon G14 (Panther Lake)
with the SOF audio driver probe failing due to missing Panther Lake
firmware, causing the subsequent cleanup of SoundWire devices to
trigger the crash.
Fix this with three changes:
1) Add a NULL guard in sdca_dev_unregister() so that callers do not
need to pre-validate the pointer (defense in depth).
2) In sdca_dev_unregister_functions(), skip NULL func_dev entries
and clear func_dev to NULL after unregistration, making the
function idempotent and safe against double-invocation.
3) In sdca_dev_register_functions(), roll back all previously
registered functions when a later one fails, so the function
array is never left in a partially-populated state.
Fixes: 4496d1c65bad ("ASoC: SDCA: add function devices") Signed-off-by: Kean Ren <rh_king@163.com> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20260611023757.1553960-1-rh_king@163.com Signed-off-by: Mark Brown <broonie@kernel.org>
Florian Fainelli [Mon, 25 May 2026 16:38:17 +0000 (17:38 +0100)]
ARM: 9476/1: mm: fix kexec and hibernation with CONFIG_CPU_TTBR0_PAN
Commit 7af5b901e847 ("ARM: 9358/2: Implement PAN for LPAE by TTBR0
page table walks disablement") implemented PAN for LPAE kernels by
setting TTBCR.EPD0 on every kernel entry, disabling TTBR0 page-table
walks while running in kernel mode. The commit correctly updated
cpu_suspend() in arch/arm/kernel/suspend.c, but missed two other code
paths that switch the CPU to the identity mapping before jumping to
low-PA (TTBR0-range) physical addresses:
1. setup_mm_for_reboot() in arch/arm/mm/idmap.c, used by the kexec
reboot path. With TTBCR.EPD0 still set, the subsequent branch to
the identity-mapped cpu_v7_reset causes a PrefetchAbort because the
TTBR0 page-table walk needed to resolve the identity-mapped address
is disabled. This manifests as a hard hang or "bad PC value" panic
on LPAE kernels booted on CPUs that strictly enforce EPD0 for
instruction fetch (e.g. Cortex-A53 in AArch32 mode) while the same
image may accidentally work on Cortex-A15 due to microarchitectural
differences in EPD0 enforcement.
2. arch_restore_image() in arch/arm/kernel/hibernate.c, which calls
cpu_switch_mm(idmap_pgd, &init_mm) directly without going through
setup_mm_for_reboot(), leaving TTBCR.EPD0 set while the identity
mapping is active.
Fix both sites by calling uaccess_save_and_enable() before switching
to the identity mapping, mirroring what the original commit did for
cpu_suspend().
Assisted-by: Cursor:claude-sonnet-4.6 Fixes: 7af5b901e847 ("ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement") Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Karl Mehltretter [Sun, 24 May 2026 05:52:35 +0000 (06:52 +0100)]
ARM: 9475/1: entry: use byte load for KASAN VMAP stack shadow
Commit 44e9a3bb76e5 ("ARM: 9430/1: entry: Do a dummy read from
VMAP shadow") added a dummy read from the KASAN VMAP stack shadow in
__switch_to(). The read uses ldr, but the KASAN shadow address is
byte-granular and is not guaranteed to be word aligned.
ARMv5 faults unaligned word loads. With CONFIG_KASAN_VMALLOC and
CONFIG_VMAP_STACK enabled, ARM926/VersatilePB crashes in __switch_to()
with an alignment exception before reaching init.
Use ldrb for the dummy shadow access. The code only needs to fault in the
shadow mapping if the stack shadow is missing, so a byte load is sufficient
and matches the granularity of KASAN shadow memory.
Fixes: 44e9a3bb76e5 ("ARM: 9430/1: entry: Do a dummy read from VMAP shadow") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Karl Mehltretter [Sun, 24 May 2026 05:52:36 +0000 (06:52 +0100)]
ARM: 9474/1: io: avoid KASAN instrumentation of raw halfword I/O
For CPUs before ARMv6, __raw_readw() and __raw_writew() are implemented
as C volatile halfword accesses so the compiler can generate an access
sequence that is safe for those machines. With KASAN enabled, those C
accesses are instrumented as normal memory accesses.
That is not valid for MMIO. On ARM926/VersatilePB with KASAN enabled,
PL011 probing traps in __asan_store2() while registering the UART, because
the instrumented writew() tries to check KASAN shadow for an MMIO address.
Keep the existing volatile halfword access, but move the ARMv5 definitions
into __no_kasan_or_inline functions so raw MMIO halfword accesses are not
instrumented by KASAN. The ARMv6-and-newer inline assembly path is
unchanged.
Fixes: 421015713b30 ("ARM: 9017/2: Enable KASan for ARM") Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Karl Mehltretter <kmehltretter@gmail.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Vadim Fedorenko [Wed, 10 Jun 2026 22:28:43 +0000 (22:28 +0000)]
spi: xilinx: let transfers timeout in case of no IRQ
In case of failed HW the driver may not see an interrupt and will stuck
in waiting forever. We can avoid such situation by timing out of
transfers if the interrupt is not seen in a reasonable time.
This problem can be found on unload of ptp_ocp driver for TimeCard which
uses Xilinx SPI AXI and SPI-NOR flash memory. During tear-down process
spi-nor drivers send soft reset command which is not triggering an
interrupt stalling the unload process completely.
Rodrigo Vivi [Wed, 10 Jun 2026 15:25:49 +0000 (11:25 -0400)]
drm/xe: fix job timeout recovery for unstarted jobs and kernel queues
A job that GuC never scheduled (never started) indicates a GuC
scheduling failure; previously such jobs were silently errored out
instead of triggering a GT reset to recover. Trigger a GT reset and
resubmit them, but only when the queue was not already killed or banned:
an unstarted job on an already banned queue is the ban working as
intended and must neither clear the ban nor kick off a reset, otherwise
a banned userspace queue could be resurrected and spam GT resets.
Kernel queues are always recovered this way and wedge the device once
recovery attempts are exhausted, since kernel work must not silently
fail. A started job that times out on a userspace VM bind queue stays
banned rather than being reset and retried.
The queue is banned early in the timeout handler to signal the G2H
scheduling-done handler so it wakes the disable-scheduling waiter;
without it the waiter sleeps the full 5s timeout. When a reset is
warranted the ban is cleared before rearming so that
guc_exec_queue_start() can resubmit jobs after the GT reset - a
still-banned queue would block resubmission and cause an infinite TDR
loop. The already-banned case is gated out before this point via
skip_timeout_check, so it is unaffected.
v2: (Himal) Do it for any queue type, not just kernel/migration
v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already
killed/banned queues on unstarted-job timeout
- Update commit message
- (Matt) Add Fixes tag
Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Assisted-by: GitHub-Copilot:claude-sonnet-4.6 Assisted-by: GitHub-Copilot:claude-opus-4.8 Tested-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Wentao Liang [Wed, 10 Jun 2026 17:27:05 +0000 (10:27 -0700)]
drm/xe: fix refcount leak in xe_range_fence_insert()
xe_range_fence_insert() acquires a reference on fence via
dma_fence_get() and stores it in rfence->fence. It then calls
dma_fence_add_callback() and handles two cases: when the callback
is successfully registered (err == 0) the fence is transferred to
the tree for later cleanup; when the fence is already signaled
(err == -ENOENT) it manually drops the extra reference with
dma_fence_put(fence).
However, dma_fence_add_callback() can fail with other errors
(e.g. -EINVAL) and in that case the code falls through to the free:
label without releasing the acquired reference, leaking it.
Fix the leak by adding an else branch that calls dma_fence_put()
before jumping to free: for any error other than -ENOENT.
Merge branches 'acpi-processor', 'acpi-cppc' and 'acpi-pci'
Merge an ACPI processor driver update, two ACPI CPPC library updates
and ACPI PCI/CXL support updates for 7.2-rc1:
- Add cpuidle driver check in acpi_processor_register_idle_driver() to
avoid evaluating _CST unnecessarily (Tony W Wang-oc)
- Suppress UBSAN warning caused by field misuse during PCC-based
register access in the ACPI CPPC library (Jeremy Linton)
- Add support for CPPC v4 to the ACPI CPPC library (Sumit Gupta)
- Update the ACPI device enumeration code to honor _DEP for ACPI0016
PCI/CXL host bridges and make the ACPI PCI root driver clear _DEP
dependencies for PCI roots that have become operational (Chen Pei)
* acpi-processor:
ACPI: processor: Add cpuidle driver check in acpi_processor_register_idle_driver()
* acpi-cppc:
ACPI: CPPC: Suppress UBSAN warning caused by field misuse
ACPI: CPPC: Add support for CPPC v4
- Clean up lid handling in the ACPI button driver and
acpi_button_probe(), reorganize installing and removing event
handlers in that driver and switch it over to using devres-based
resource management during probe (Rafael Wysocki)
* acpi-button:
ACPI: button: Switch over to devres-based resource management
ACPI: button: Reorganize installing and removing event handlers
ACPI: button: Use string literals for generating netlink messages
ACPI: button: Clean up adding and removing lid procfs interface
ACPI: button: Merge two switch () statements in acpi_button_probe()
ACPI: button: Drop redundant variable from acpi_button_probe()
ACPI: button: Rework device verification during probe
ACPI: button: Use local pointer to platform device dev field in probe
ACPI: button: Eliminate redundant conditional statement
ACPI: button: Change return type of two functions to void
ACPI: button: Eliminate ternary operator from acpi_lid_evaluate_state()
ACPI: button: Use bool for representing boolean values
ACPI: button: Improve warning message regarding lid state
ACPI: button: Pass ACPI handle to acpi_lid_evaluate_state()
ACPI: button: Fix lid_device value leak past driver removal
Merge updates of core ACPI device drivers for 7.2-rc1:
- Fix multiple issues related to probe, removal and missing NVDIMM
device notifications in the ACPI NFIT driver (Rafael Wysocki)
- Add support for devres-based management of ACPI notify handlers to
the ACPI core (Rafael Wysocki)
- Switch multiple core ACPI device drivers (including the ACPI PAD,
ACPI video bus, ACPI HED, ACPI thermal zone, ACPI AC, ACPI battery,
and ACPI NFIT drivers) over to using devres-based resource management
during probe (Rafael Wysocki)
- Replace mutex_lock/unlock() with guard()/scoped_guard() in the ACPI
PMIC driver (Maxwell Doose)
- Fix message kref handling in the dead device path of the ACPI IPMI
address space handler (Yuho Choi)
- Use sysfs_emit() in idlecpus_show() in the ACPI processor aggregator
device (PAD) driver (Yury Norov)
- Clean up device_id_scheme initialization in the ACPI video bus driver
(Jean-Ralph Aviles)
* acpi-driver: (26 commits)
ACPI: IPMI: Fix message kref handling on dead device
ACPI: NFIT: core: Fix possible deadlock and missing notifications
ACPI: NFIT: core: Eliminate redundant local variable
ACPI: NFIT: core: Fix acpi_nfit_init() error cleanup
ACPI: NFIT: core: Fix possible NULL pointer dereference
ACPI: bus: Clean up devm_acpi_install_notify_handler()
ACPI: PAD: Use sysfs_emit() in idlecpus_show()
ACPI: video: Do not initialise device_id_scheme directly
ACPI: video: Switch over to devres-based resource management
ACPI: video: Use devm for video->entry and backlight cleanup
ACPI: video: Use devm action for freeing video devices
ACPI: video: Use devm action for video bus object cleanup
ACPI: video: Rearrange probe and remove code
ACPI: video: Reduce the number of auxiliary device dereferences
ACPI: PAD: Switch over to devres-based resource management
ACPI: PAD: Fix teardown ordering in acpi_pad_remove()
ACPI: PAD: Pass struct device pointer to acpi_pad_notify()
ACPI: PAD: Rearrange acpi_pad_notify()
ACPI: thermal: Switch over to devres-based resource management
ACPI: HED: Switch over to devres-based resource management
...
Merge ACPICA updates for 7.2-rc1 including the following changes:
- Add support for the Legacy Virtual Register (LVR) field in I2C serial
bus resource descriptors to ACPICA (Akhil R)
- Fix multiple issues related to bounds checks, input validation,
use-after-free, and integer overflow checks in the AML interpreter
in ACPICA (ikaros)
- Update the copyright year to 2026 in ACPICA files and make minor
changes related to ACPI 6.6 support (Pawel Chmielewski)
- Remove spurious precision from format used to dump parse trees in
ACPICA (David Laight)
- Add modern standby DSM GUIDs to ACPICA header files (Daniel Schaefer)
- Update D3hot/cold device power states definitions in ACPICA header
files (Aymeric Wibo)
- Fix NULL pointer dereference in acpi_ns_custom_package() (Weiming
Shi)
- Update ACPICA version to 20260408 (Saket Dumbre)
* acpica: (27 commits)
ACPICA: add boundary checks in two places
ACPICA: Add package limit checks in parser functions
ACPICA: Update version to 20260408
ACPICA: Update the copyright year to 2026
ACPICA: Remove spurious precision from format used to dump parse trees
ACPICA: Enhance OEM ID and Table ID validation in acpi_ex_load_table_op()
ACPICA: Fix NULL pointer dereference in acpi_ns_custom_package()
ACPICA: Enhance buffer validation in acpi_ut_walk_aml_resources()
ACPICA: Add validation for node in acpi_ns_build_normalized_path()
ACPICA: validate handler object type in two places
ACPICA: Improve argument parsing in acpi_ps_get_next_simple_arg()
ACPICA: Fix integer overflow in acpi_ex_opcode_3A_1T_1R() (mid_op)
ACPICA: Prevent adding invalid references
ACPICA: add boundary checks in acpi_ps_get_next_field()
ACPICA: validate byte_count in acpi_ps_get_next_package_length()
ACPICA: Fix use-after-free in acpi_ds_terminate_control_method()
ACPICA: fix I2C LVR item count in the conversion table
ACPICA: Mention the LVR bits
ACPICA: Change LVR to 8 bit value
ACPICA: Fetch LVR I2C resource descriptor
...
Dmitry Ilvokhin [Thu, 4 Jun 2026 07:15:07 +0000 (07:15 +0000)]
locking: Add contended_release tracepoint to sleepable locks
Add the contended_release trace event. This tracepoint fires on the
holder side when a contended lock is released, complementing the
existing contention_begin/contention_end tracepoints which fire on the
waiter side.
This enables correlating lock hold time under contention with waiter
events by lock address.
Add trace_contended_release()/trace_call__contended_release() calls to
the slowpath unlock paths of sleepable locks: mutex, rtmutex, semaphore,
rwsem, percpu-rwsem, and RT-specific rwbase locks.
Where possible, trace_contended_release() fires before the lock is
released and before the waiter is woken. For some lock types, the
tracepoint fires after the release but before the wake. Making the
placement consistent across all lock types is not worth the added
complexity.
For reader/writer locks, the tracepoint fires for every reader releasing
while a writer is waiting, not only for the last reader.
Dmitry Ilvokhin [Thu, 4 Jun 2026 07:15:06 +0000 (07:15 +0000)]
locking/percpu-rwsem: Extract __percpu_up_read()
Move the percpu_up_read() slowpath out of the inline function into a new
__percpu_up_read() to avoid binary size increase from adding a
tracepoint to an inlined function.
Peter Zijlstra [Wed, 10 Jun 2026 21:20:09 +0000 (23:20 +0200)]
futex: Optimize futex hash bucket access patterns
Breno reported significant c2c HITM in a futex hash heavy workload.
It turns out that the hash bucket to private hash table reverse pointer
(futex_hash_bucket::priv) was to blame. Notably when the hash buckets are
heavily contended, the: 'fph = bh->priv;' load in futex_hash() will typically
miss and consequently become quite expensive.
Since this load in particular is quite superfluous, removing it is fairly
straight forward. However, removing it does not in fact achieve anything much.
The pain moves to the next user, notably: futex_hash_put().
Therefore rework the whole private hash refcounting to avoid needing this back
pointer (and removing it). Instead of passing around 'struct futex_hash_bucket
*hb', pass around a new structure that contains it and the related 'struct
futex_private_hash *fph' pointer in tandem.
Funnily this turns out to remove more code than it adds and significantly
improves futex hash performance (as measured by 'perf bench futex hash'):
i=56: pick_task(rq_56)
pick_task_fair(rq_56)
cfs_rq->nr_queued == 0
goto idle
sched_balance_newidle(rq_56)
raw_spin_rq_unlock(rq_56)
// core-wide lock released
newidle_balance() pulls
task A: rq_57 -> rq_56
// task_rq(A) == rq_56 now
raw_spin_rq_lock(rq_56)
// core-wide lock re-acquired
return > 0
goto again
pick_task_fair(rq_56)
-> picks task A
rq_56->core_pick = task A
// first loop done
// rq_57->core_pick is still task A (set before lock release)
// but task_rq(A) == rq_56 now
next = rq_57->core_pick // = task A
put_prev_set_next_task(rq_57, prev, task A)
__set_next_task_fair(rq_57, task A)
hrtick_start_fair(rq_57, task A)
WARN_ON_ONCE(task_rq(task A) != rq_57)
// task_rq(A) == rq_56
IOW: by allowing pick_task_fair() to do newidle_balance and not returning
RETRY_TASK, it can end up selecting the same task on two CPUs. Restore the
previous state by never doing newidle when core scheduling is enabled.
Potin Lai [Thu, 11 Jun 2026 05:46:18 +0000 (13:46 +0800)]
hwmon: (pmbus/lm25066) Fix PMBus coefficients for LM5064/5066/5066i
Swap the high setting and low setting coefficients in the lm25066_coeff
table for LM5064, LM5066, and LM5066i. The coefficients were previously
mapped incorrectly, resulting in inverted current and power scaling.
Additionally, dynamically assign the exponent (R) registers inside the
probe's LM25066_DEV_SETUP_CL check. This ensures that the proper
exponent is applied (e.g., for LM25056, high setting power exponent
is -4, but low setting power exponent is -3).
Kiran Kumar K [Mon, 8 Jun 2026 09:54:55 +0000 (15:24 +0530)]
octeontx2-af: fix IP fragment flag corruption on custom KPU profile load
npc_cn20k_apply_custom_kpu() overwrites KPU profile entries with custom
firmware values and then calls npc_cn20k_update_action_entries_n_flags()
over all entries. Since the same function already ran during default
profile initialisation, entries not overridden by the custom firmware
get their flags translated twice, corrupting the CN20K-specific values.
Fix this by extracting the per-entry translation into a helper
npc_cn20k_translate_action_flags() and calling it as each custom entry
is loaded, removing the redundant batch call at the end.
Paolo Abeni [Thu, 11 Jun 2026 10:29:59 +0000 (12:29 +0200)]
Merge tag 'nf-26-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Revalidate bridge ports, add missing NULL checks to fetch the bridge
device by the port. From Florian Westphal.
2) Fix netdevice refcount leak in the error path of nft_fwd hardware
offload function, also from Florian.
3) Unregister helper expectfn callback on conntrack helper module
removal, otherwise dangling pointer remains in place,
from Weiming Shi.
4) Fix possible pointer infoleak in getsockopt() IPT_SO_GET_ENTRIES,
From Kyle Zeng.
5) Validate that device MAC header is present before nf_syslog
accesses it. From Xiang Mei.
6-8) Three patches to address a possible infoleak of stale stack
data in three nf_tables expressions, due to mismatch in the
_init() and _eval() function which is possible since 14fb07130c7d.
From Davide Ornaghi and Florian Westphal.
netfilter pull request 26-06-10
* tag 'nf-26-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_meta_bridge: fix stale stack leak via IIFHWADDR register
netfilter: nft_fib: fix stale stack leak via the OIFNAME register
netfilter: nft_exthdr: fix register tracking for F_PRESENT flag
netfilter: nf_log: validate MAC header was set before dumping it
netfilter: x_tables: avoid leaking percpu counter pointers
netfilter: nf_conntrack: destroy stale expectfn expectations on unregister
netfilter: nf_tables_offload: drop device refcount on error
netfilter: revalidate bridge ports
====================
Li Jun [Thu, 11 Jun 2026 01:00:45 +0000 (09:00 +0800)]
ASoC: loongson: Fix invalid position error in ls_pcm_pointer
The "invalid position" error occurred when the DMA position descriptor
returned an invalid address value (e.g., pos = -1048838144). This happened
because the `bytes_to_frames()` function returns a signed value, but when
`addr < runtime->dma_addr`, the subtraction produces a negative result that
gets interpreted as a large unsigned integer in comparisons.
when the addr is abnormal, for example,the DMA controller is abnormal in
hardware,x=0 should not be a point(x == runtime->buffer_size),but a range,
which includes the addr address being less than runtime ->dma1-adr, and
the addr exceeding the DMA address range.the value of pos should not better
a negative,return 0, maybe better.
1) xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()
Propagate SKBFL_SHARED_FRAG when paged fragments are moved between
skbs so ESP can decide whether in-place crypto is safe.
2) xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload
Replace the unlocked read of xtfs->ra_newskb with a local flag so a
concurrent reassembly can no longer free first_skb between
spin_unlock and the post-loop check.
3) xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx()
Prune the inexact bin under xfrm_policy_lock so a concurrent
xfrm_hash_rebuild() can no longer free it before xfrm_policy_kill()
dereferences it.
4) xfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()
Move hrtimer_cancel() for the output and drop timers ahead of their
spinlocks, breaking the softirq/lock cycle that could deadlock
against the timer callbacks on SMP.
5) xfrm: espintcp: do not reuse an in-progress partial send
Fail a new send when espintcp_push_msgs() returns with emsg->len
still set, so a blocking caller can no longer overwrite ctx->partial
while a previous transfer still owns it.
6) esp: fix page frag reference leak on skb_to_sgvec failure
Add a flag to esp_ssg_unref() to unconditionally unref the source
scatterlist, releasing the old page references that are otherwise
leaked when the second skb_to_sgvec() in esp_output_tail() fails.
Please pull or let me know if there are problems.
ipsec-2026-06-10
* tag 'ipsec-2026-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
esp: fix page frag reference leak on skb_to_sgvec failure
xfrm: espintcp: do not reuse an in-progress partial send
xfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()
xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx()
xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload
xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()
====================
Ido Schimmel [Tue, 9 Jun 2026 14:54:48 +0000 (17:54 +0300)]
ipv6: Fix a potential NPD in cleanup_prefix_route()
addrconf_get_prefix_route() can return the fib6_null_entry sentinel
entry which has a NULL fib6_table pointer. Therefore, before setting the
route's expiration time, check that we are not working with this entry,
as otherwise a NPD will be triggered [1].
Note that the other callers of addrconf_get_prefix_route() are not
susceptible to this bug:
1. addrconf_prefix_rcv(): Requests a route with the 'RTF_ADDRCONF |
RTF_PREFIX_RT' flags which are not set on fib6_null_entry.
2. modify_prefix_route(): Fixed by commit a747e02430df ("ipv6: avoid
possible NULL deref in modify_prefix_route()").
3. __ipv6_ifa_notify(): Calls ip6_del_rt() which specifically checks for
fib6_null_entry and returns an error.
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.") Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Reviewed-by: David Ahern <dahern@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260609145448.768318-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
For AML devices, there are some issues where the wrong module
indentified then configure PHY failed.
The module info buffers should be initialized to 0 before the firmware
returns information. And DECLARE_PHY_INTERFACE_MASK() does not guarantee
zeroed contents, so explicitly clear the temporary interface masks before
setting supported interfaces.
Rework txgbe_identify_module() to validate module identifiers through
explicit type checks instead of relying on transceiver_type heuristics.
When using the SFP module, transceiver_type could be a random value,
because it was read from an invalid register.
====================
Jiawen Wu [Mon, 8 Jun 2026 07:08:42 +0000 (15:08 +0800)]
net: txgbe: initialize PHY interface to 0
DECLARE_PHY_INTERFACE_MASK() does not guarantee zeroed contents. Add a
new macro DECLARE_PHY_INTERFACE_MASK_ZERO(), make the stack variable to
be zeroed before setting supported interfaces.
Jiawen Wu [Mon, 8 Jun 2026 07:08:41 +0000 (15:08 +0800)]
net: txgbe: distinguish module types by checking identifier
Rework txgbe_identify_module() to validate module identifiers through
explicit type checks instead of relying on transceiver_type heuristics.
When using the SFP module, transceiver_type could be a random value,
because it was read from an invalid register.
Jiawen Wu [Mon, 8 Jun 2026 07:08:40 +0000 (15:08 +0800)]
net: txgbe: initialize module info buffer
The module info buffer should be initialized to 0 before the firmware
returns information. Otherwise, there is a risk that the buffer field
not filled by the firmware is random value.
gpio: nomadik: remove dead DB8540 code from <gpio/gpio-nomadik.h>
DB8540 support was removed in commit b6d09f780761 ("pinctrl: nomadik:
Drop U8540/9540 support"), but a couple small pieces of related code
remained in <gpio/gpio-nomadik.h>. Remove them.
Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.
Now that the Kconfig space always enables CONFIG_X86_TSC (on x86),
remove !CONFIG_X86_TSC code from the x86 arch code.
We still keep the Kconfig option to catch any eventual code still
pending in maintainer or non-mainline trees, plus some drivers
have raw TSC timestamping hacks that use CONFIG_X86_TSC.
It's also still possible to disable TSC support runtime.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ahmed S . Darwish <darwi@linutronix.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: H . Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250425084216.3913608-13-mingo@kernel.org
This is v5 of the earlier XDP_PASS fix. The XDP_PASS change is
retained, and the series also fixes related RX/XDP buffer handling
issues found during review.
Tested with tools/testing/selftests/drivers/net/xdp.py on mvpp2
hardware.
====================
Til Kaiser [Sun, 7 Jun 2026 13:49:43 +0000 (15:49 +0200)]
net: mvpp2: build skb from XDP-adjusted data on XDP_PASS
When an XDP program uses bpf_xdp_adjust_head() or bpf_xdp_adjust_tail()
and then returns XDP_PASS, mvpp2 still builds the skb from fixed offsets
derived from the original RX descriptor. Packet geometry changes made by
the XDP program are therefore discarded before the skb reaches the stack.
Update rx_offset and rx_bytes from xdp.data and xdp.data_end for
XDP_PASS. This makes skb_reserve() and skb_put() reflect the packet seen
by XDP, and makes RX byte accounting for XDP_PASS follow the length of the
skb passed to the network stack.
Keep a separate rx_sync_size for page-pool recycling on skb allocation
failure, which must stay tied to the received buffer range.
Non-PASS verdicts continue to account the descriptor length because no skb
is passed up in those cases.
Til Kaiser [Sun, 7 Jun 2026 13:49:42 +0000 (15:49 +0200)]
net: mvpp2: refill RX buffers before XDP or skb use
The RX error path returns the current descriptor buffer to the hardware
BM pool. That is only valid while the driver still owns the buffer.
mvpp2_rx_refill() can fail after the current buffer has been handed to
XDP or attached to an skb. In those cases mvpp2_run_xdp() may have
recycled, redirected, or queued the page for XDP_TX, and an skb free also
retires the data buffer. Returning such a buffer to BM lets hardware DMA
into memory that is no longer owned by the RX ring.
Refill the BM pool before handing the current buffer to XDP or to the
skb. If the allocation fails there, drop the packet and return the
still-owned current buffer to BM, preserving the pool depth. Once the
refill succeeds, later local drops retire/free the current buffer instead
of returning it to BM.
Fixes: 07dd0a7aae7f ("mvpp2: add basic XDP support") Fixes: d6526926de73 ("net: mvpp2: fix memory leak in mvpp2_rx") Signed-off-by: Til Kaiser <mail@tk154.de> Link: https://patch.msgid.link/20260607134943.21996-4-mail@tk154.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Til Kaiser [Sun, 7 Jun 2026 13:49:41 +0000 (15:49 +0200)]
net: mvpp2: limit XDP frame size to the RX buffer
mvpp2 has short and long BM pools, and short pool buffers can be smaller
than PAGE_SIZE. The XDP path nevertheless initializes every xdp_buff with
PAGE_SIZE as frame size.
XDP helpers use frame_sz to validate tail growth and to derive the hard
end of the data area. Advertising PAGE_SIZE for short buffers can let
bpf_xdp_adjust_tail() grow a packet past the real allocation, corrupting
memory or later tripping skb tailroom checks.
Initialize the XDP buffer with bm_pool->frag_size so XDP tailroom matches
the actual buffer backing the packet.
Til Kaiser [Sun, 7 Jun 2026 13:49:40 +0000 (15:49 +0200)]
net: mvpp2: sync RX data at the hardware packet offset
mvpp2 programs the RX queue packet offset, so hardware writes received
data at dma_addr + MVPP2_SKB_HEADROOM. The current CPU sync starts at
dma_addr and only covers rx_bytes + MVPP2_MH_SIZE bytes, which syncs the
unused headroom and misses the same number of bytes at the packet tail.
On non-coherent DMA systems this can leave the CPU reading stale cache
contents for the end of the received frame.
Use dma_sync_single_range_for_cpu() with MVPP2_SKB_HEADROOM as the range
offset so the sync covers the Marvell header and packet data actually
written by hardware.
Eric Biggers [Sun, 31 May 2026 19:17:35 +0000 (12:17 -0700)]
crypto: xilinx-trng - Remove crypto_rng interface
Implementing the crypto_rng interface has no purpose, as it isn't used
in practice. It's being removed from other drivers too. Just remove
it. This leaves hwrng, which is actually used.
Tagging with 'Cc stable' due to the bugs that this removes:
- xtrng_trng_generate() sometimes returned success even when it didn't
fill in all the bytes.
- It was possible for xtrng_trng_generate() and
xtrng_hwrng_trng_read() to run concurrently and interfere with each
other, as the locking code in xtrng_hwrng_trng_read() was broken.
Fixes: 8979744aca80 ("crypto: xilinx - Add TRNG driver for Versal") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Sun, 31 May 2026 17:59:31 +0000 (10:59 -0700)]
crypto: exynos-rng - Remove exynos-rng driver
This driver has no purpose. It doesn't feed into the Linux RNG, nor
does it implement the hwrng interface. It is accessible only via the
"rng" algorithm type of AF_ALG, which isn't used in practice. Everyone
uses either the Linux RNG, or rarely /dev/hwrng.
Moreover, this is a PRNG whose only source of entropy is the 160-bit
seed the user passes in. So this can be used only by a user who already
has a source of cryptographically secure random numbers, such as
/dev/random. Which they can, and do, just use in the first place.
Just remove this driver. There's no need to keep useless code around.
Note that the other crypto_rng drivers in drivers/crypto/ are similarly
unused and are being removed too. This commit just handles exynos-rng.
Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Sat, 30 May 2026 20:26:23 +0000 (13:26 -0700)]
crypto: hisi-trng - Remove crypto_rng interface
drivers/crypto/hisilicon/trng/trng.c exposes the same hardware through
two completely separate interfaces, crypto_rng and hwrng. However, the
implementation of this is buggy because it permits generation operations
from these interfaces to run concurrently with each other, accessing the
same registers. That is, hisi_trng_generate() synchronizes with itself
but not with hisi_trng_read(). This results in potential repetition of
output from the RNG, output of non-random values, etc.
Fortunately, there's actually no point in hardware RNG drivers
implementing the crypto_rng interface. It's not actually used by
anything besides the "rng" algorithm type of AF_ALG, which in turn is
not actually used in practice. Other crypto_rng hardware drivers are
likewise being phased out, leaving just the hwrng support.
Thus, remove it to simplify the code and avoid conflict (and confusion)
with the hwrng interface which is the one that actually matters.
Fixes: e4d9d10ef4be ("crypto: hisilicon/trng - add support for PRNG") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Fri, 29 May 2026 23:32:08 +0000 (16:32 -0700)]
crypto: loongson - Remove broken and unused loongson-rng
The loongson-rng rng_alg has several vulnerabilities, including not
providing forward security, and a use-after-free bug due to the use of
wait_for_completion_interruptible().
Meanwhile, the rng_alg framework doesn't really have any purpose in the
first place other than to access the software algorithms crypto/drbg.c
and crypto/jitterentropy.c. Hardware-specific rng_algs have no
in-kernel user, and unlike hwrng there's no feed into the actual Linux
RNG. As such, there's really no point to this code. There are of
course other rng_alg drivers that are similarly unused, but they're
similarly in the process of being phased out, e.g.
https://lore.kernel.org/r/20260529193648.18172-1-ebiggers@kernel.org and
https://lore.kernel.org/r/20260529220430.34135-1-ebiggers@kernel.org
Given that, there's no point in fixing forward these vulnerabilities,
and it makes much more sense to simply roll back the addition of this
driver. If this platform provides TRNG (not PRNG) functionality, it
could make sense to add a hwrng driver, but it would be quite different.
Eric Biggers [Fri, 29 May 2026 22:04:30 +0000 (15:04 -0700)]
crypto: crypto4xx - Remove insecure and unused rng_alg
Remove crypto4xx_rng, as it is insecure and unused:
- It has only a 64-bit security strength, which is highly inadequate.
This can be seen by the fact that crypto4xx_hw_init() seeds it with
only 64 bits of entropy, and the fact that the original commit
mentions that it implements ANSI X9.17 Annex C.
Another issue was that this driver didn't implement the crypto_rng API
correctly, as crypto4xx_prng_generate() didn't return 0 on success.
- No user of this code is known. It's usable only theoretically via the
"rng" algorithm type of AF_ALG. But userspace actually just uses the
actual Linux RNG (/dev/random etc) instead. And rng_algs don't
contribute entropy to the actual Linux RNG either. (This may have
been confused with hwrng, which does contribute entropy.)
Fixes: d072bfa48853 ("crypto: crypto4xx - add prng crypto support") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Giovanni Cabiddu [Thu, 28 May 2026 15:57:44 +0000 (16:57 +0100)]
crypto: qat - validate RSA CRT component lengths
The generic RSA key parser (rsa_helper.c) bounds each CRT component (p,
q, dp, dq, qinv) by the modulus size n_sz, but qat_rsa_setkey_crt()
allocates half-size DMA buffers (key_sz / 2) and right-aligns each
component with:
memcpy(dst + half_key_sz - len, src, len)
When a CRT component is larger than half_key_sz the subtraction
underflows and memcpy writes past the DMA buffer, causing memory
corruption.
Add a len > half_key_sz check next to the existing !len check for each
of the five CRT components so the driver falls back to the non-CRT path
instead of writing out of bounds.
Fixes: 879f77e9071f ("crypto: qat - Add RSA CRT mode") Cc: stable@vger.kernel.org Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Reviewed-by: Laurent M Coquerel <laurent.m.coquerel@intel.com> Tested-by: Laurent M Coquerel <laurent.m.coquerel@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pei Xiao [Thu, 11 Jun 2026 00:44:14 +0000 (08:44 +0800)]
hwmon: (gpd-fan) Reject EC PWM value 0 as invalid
The EC firmware is expected to return values in [1, pwm_max]. A read of 0
is illegal and would cause underflow in the conversion formula. Explicitly
check for 0 and return -EIO.
Rosen Penev [Wed, 3 Jun 2026 03:11:35 +0000 (20:11 -0700)]
i2c: mxs: add missing kernel-doc for struct mxs_i2c_dev members
Add kernel-doc documentation for the struct members that were previously
undocumented. This fixes warnings when building with W=1 and ensures
the struct is fully documented per kernel-doc conventions.
Armin Wolf [Wed, 10 Jun 2026 18:01:41 +0000 (20:01 +0200)]
hwmon: (dell-smm) Add Dell Latitude 7530 to fan control whitelist
A user reported that the Dell Latitude 7530 needs to be whitelisted
for the special SMM calls necessary for globally enabling/disabling
BIOS fan control.
Marius Cristea [Wed, 10 Jun 2026 15:19:47 +0000 (18:19 +0300)]
hwmon: temperature: add support for EMC1812
This is the hwmon driver for Microchip EMC1812/13/14/15/33
Multichannel Low-Voltage Remote Diode Sensor Family.
EMC1812 has one external remote temperature monitoring channel.
EMC1813 has two external remote temperature monitoring channels.
EMC1814 has three external remote temperature monitoring channels,
channels 2 and 3 support anti parallel diode.
EMC1815 has four external remote temperature monitoring channels and
channels 1/2 and 3/4 support anti parallel diode.
EMC1833 has two external remote temperature monitoring channels and
channels 1 and 2 support anti parallel diode.
Resistance Error Correction is supported on channels 1/2 and 3/4.
Marius Cristea [Wed, 10 Jun 2026 15:19:46 +0000 (18:19 +0300)]
dt-bindings: hwmon: temperature: add support for EMC1812
This is the devicetree schema for Microchip EMC1812/13/14/15/33
Multichannel Low-Voltage Remote Diode Sensor Family. It also
updates the MAINTAINERS file to include the new driver.
EMC1812 has one external remote temperature monitoring channel.
EMC1813 has two external remote temperature monitoring channels.
EMC1814 has three external remote temperature monitoring channels and
channels 2 and 3 support anti parallel diode.
EMC1815 has four external remote temperature monitoring channels and
channels 1/2 and 3/4 support anti parallel diode.
EMC1833 has two external remote temperature monitoring channels and
channels 1 and 2 support anti parallel diode.
Resistance Error Correction is supported on channels 1/2 and 3/4.
Linus Torvalds [Wed, 10 Jun 2026 18:53:55 +0000 (11:53 -0700)]
Merge tag 'pm-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address some remaining fallout after introducing dynamic EPP
support in the amd-pstate driver during the current development cycle:
- Restore allowing writing EPP of 0 when in performance mode in the
amd-pstate driver which was unnecessarily disallowed by one of the
recent updates (Mario Limonciello)
- Remove stale documentation of the epp_cached field in struct
amd_cpudata that has been dropped recently (Zhan Xusheng)"
* tag 'pm-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Fix setting EPP in performance mode
cpufreq/amd-pstate: drop stale @epp_cached kdoc
drm/xe: include all registered queues in TLB invalidation
Context-based TLB invalidation currently selects only scheduling-active
exec queues via q->ops->active(). During rebind flows, queues may be
suspended (or transitioning through resume) while still owning valid
translations, causing them to be skipped from invalidation and leading
to missed TLB invalidations on LR rebinds.
The underlying issue is a TOCTOU: q->guc->state bits are flipped lock-free
from enable_scheduling(), disable_scheduling{,_deregister}(), the
suspend/resume sched-msg handlers, handle_sched_done(), and
guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes
against them, so any state-based predicate can race.
Include all the registered queues so that TLB invalidations are not
missed. This is race-free because list membership on vm->exec_queues.list
is stable under vm->exec_queues.lock held by the caller. The performance
impact is expected to be minimal and harmless. If it does turn out to be
a concern, we can come back with a race-safe solution to ignore certain
queues.
Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend") Assisted-by: Claude:claude-opus-4.6 Suggested-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Raag Jadav [Tue, 2 Jun 2026 04:48:43 +0000 (10:18 +0530)]
drm/xe/drm_ras: Add per node cleanup action
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Add per node cleanup action which guarantees
cleanup on unwind and also simplifies the cleanup logic.
Raag Jadav [Tue, 2 Jun 2026 04:48:42 +0000 (10:18 +0530)]
drm/xe/drm_ras: Make counter allocation drm managed
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Fix this using drm managed allocation, which is
guaranteed to be cleaned up on unwind.
Jani Nikula [Fri, 15 May 2026 16:09:20 +0000 (19:09 +0300)]
drm/xe/display: fix oops in suspend/shutdown without display
The xe driver keeps track of whether to probe display, and whether
display hardware is there, using xe->info.probe_display. It gets set to
false if there's no display after intel_display_device_probe(). However,
the display may also be disabled via fuses, detected at a later time in
intel_display_device_info_runtime_init().
In this case, the xe driver does for_each_intel_crtc() on uninitialized
mode config in xe_display_flush_cleanup_work(), leading to a NULL
pointer dereference, and generally calls display code with display info
cleared.
Check for intel_display_device_present() after
intel_display_device_info_runtime_init(), and reset
xe->info.probe_display as necessary. Also do unset_display_features()
for completeness, although display runtime init has already done
that. This will need to be unified across all cases later.
Move intel_display_device_info_runtime_init() call slightly earlier,
similar to i915, to avoid a bunch of unnecessary setup for no display
cases.
Note #1: The xe driver has no business doing low level display plumbing
like for_each_intel_crtc() to begin with. It all needs to happen in
display code.
Note #2: The actual bug is present already in commit 44e694958b95
("drm/xe/display: Implement display support"), but the oops was likely
introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display
suspend/resume work on discrete").
Davide Ornaghi [Wed, 10 Jun 2026 10:39:13 +0000 (12:39 +0200)]
netfilter: nft_meta_bridge: fix stale stack leak via IIFHWADDR register
NFT_META_BRI_IIFHWADDR declares its destination register with
len = ETH_ALEN (6 bytes), which the register-init tracking rounds up to
two 32-bit registers (8 bytes). nft_meta_bridge_get_eval() then does
memcpy(dest, br_dev->dev_addr, ETH_ALEN), writing only 6 bytes and
leaving the upper 2 bytes of the second register as uninitialised
nft_do_chain() stack. A downstream load of that register span leaks
those stale bytes to userspace.
Zero the second register before the memcpy so the full declared span is
written.
Davide Ornaghi [Wed, 10 Jun 2026 10:39:12 +0000 (12:39 +0200)]
netfilter: nft_fib: fix stale stack leak via the OIFNAME register
For NFT_FIB_RESULT_OIFNAME the destination register is declared with
len = IFNAMSIZ (four 32-bit registers), but on the lookup-fail,
RTN_LOCAL and oif-mismatch paths nft_fib{4,6}_eval() only writes one
register via "*dest = 0". The remaining three registers are left as
whatever was on the stack in nft_do_chain()'s struct nft_regs, and a
downstream expression that loads the register span can leak that
uninitialised kernel stack to userspace.
The NFTA_FIB_F_PRESENT existence check has the same shape: it is only
meaningful for NFT_FIB_RESULT_OIF, yet it was accepted for any result type
while the eval stores a single byte via nft_reg_store8(), leaving the rest
of the declared span stale.
Fix both:
- replace the bare "*dest = 0" in the eval with nft_fib_store_result(),
which strscpy_pad()s the whole IFNAMSIZ for OIFNAME (and is already
used on the other early-return path), and
- restrict NFTA_FIB_F_PRESENT to NFT_FIB_RESULT_OIF and declare its
destination as a single u8, so the marked span matches the one byte
the eval writes.
netfilter: nft_exthdr: fix register tracking for F_PRESENT flag
nft_exthdr_init() passes user-controlled priv->len to
nft_parse_register_store(), which marks that many bytes in the
register bitmap as initialized. However, when NFT_EXTHDR_F_PRESENT
is set, the eval paths write only 1 byte (nft_reg_store8) or
4 bytes (*dest = 0 on TCP/DCCP error path). When len > 4,
registers beyond the first are never written, retaining
uninitialized stack data from nft_regs.
Bail out if userspace requests too much data when F_PRESENT is set.
Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Fixes: c078ca3b0c5b ("netfilter: nft_exthdr: Add support for existence check") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Xiang Mei [Tue, 9 Jun 2026 22:55:02 +0000 (15:55 -0700)]
netfilter: nf_log: validate MAC header was set before dumping it
The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.
This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).
Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.
The native and compat get-entries paths copy the fixed rule entry header
from the kernelized rule blob to userspace before overwriting the entry's
counter fields with a sanitized counter snapshot.
On SMP kernels, entry->counters.pcnt contains the percpu allocation
address used by x_tables rule counters. A caller can provide a userspace
buffer that faults during the initial fixed-header copy after pcnt has
been copied but before the later sanitized counter copy runs. The syscall
then returns -EFAULT while leaving the raw percpu pointer in userspace.
Copy only the fixed entry prefix before counters from the kernelized rule
blob, then copy the sanitized counter snapshot into the counter field.
Apply this ordering to the IPv4, IPv6, and ARP native and compat
get-entries implementations so a fault cannot expose the internal percpu
counter pointer.
Fixes: 71ae0dff02d7 ("netfilter: xtables: use percpu rule counters") Signed-off-by: Kyle Zeng <kylebot@openai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Weiming Shi [Wed, 3 Jun 2026 07:38:17 +0000 (00:38 -0700)]
netfilter: nf_conntrack: destroy stale expectfn expectations on unregister
NAT helpers such as nf_nat_h323 store a raw pointer to module text in
exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister()
only unlinks the callback descriptor and never walks the expectation table,
so an expectation pending at module removal survives with a dangling
exp->expectfn into freed module text.
When the expected connection arrives, init_conntrack() invokes
exp->expectfn(), now a stale pointer into the unloaded module. Reproduced
on a KASAN build by loading the H.323 helpers, creating a Q.931
expectation, unloading nf_nat_h323, then connecting to the expected port:
Reaching the dangling state requires CAP_SYS_MODULE in the initial user
namespace to remove a NAT helper that still has live expectations, so this
is a robustness fix; leaving an expectation pointing at freed text is wrong
regardless.
Add nf_ct_helper_expectfn_destroy(), which walks the expectation table and
drops every expectation whose ->expectfn matches the descriptor being torn
down. Call it from each NAT helper's exit path after the existing RCU grace
period, so no expectation outlives the code it points at and no extra
synchronize_rcu() is introduced. With the fix, the same reproducer runs to
completion without the Oops.
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ebt_redirect_tg() dereferences br_port_get_rcu() return without a
NULL check, causing a kernel panic when the bridge port has been
removed between the original hook invocation and an NFQUEUE
reinject.
A mere NULL check isn't sufficient, however. As sashiko review
points out userspace can not only remove the port from the bridge,
it could also place the device in a different virtual device, e.g.
macvlan.
If this happens, we must drop the packet, there is no way for us to
reinject it into the bridge path.
Switch to _upper API, we don't need the bridge port structure.
Also, this fix keeps another bug intact:
Both nfnetlink_log and nfnetlink_queue use CONFIG_BRIDGE_NETFILTER
too aggressive, which prevents certain logging features when queueing
in bridge family: NETFILTER_FAMILY_BRIDGE can be enabled while the old
CONFIG_BRIDGE_NETFILTER cruft is off.
Fixes tag is a common ancestor, this was always broken.
Fixes: f350a0a87374 ("bridge: use rx_handler_data pointer to store net_bridge_port pointer") Reported-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Felix Gu [Wed, 10 Jun 2026 12:08:17 +0000 (20:08 +0800)]
spi: rzv2h-rspi: Fix SPDR read access width for 16-bit RX
The RZ/V2H hardware manual (section 7.5.2.2.1) specifies that read access
size for the SPI Data Register (SPDR) are fixed at 32 bits. The
RZV2H_RSPI_RX macro for the 16-bit data path used readw(), violating
this requirement.
Switch to readl() for the 16-bit RX path to conform to the hardware
specification.
Fixes: 8b61c8919dff ("spi: Add driver for the RZ/V2H(P) RSPI IP") Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Felix Gu <ustc.gu@gmail.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Link: https://patch.msgid.link/20260610-rzv2h-rspi-v2-1-40c80b4a2c90@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
Breno Leitao [Mon, 8 Jun 2026 09:32:05 +0000 (02:32 -0700)]
rds: mark snapshot pages dirty in rds_info_getsockopt()
rds_info_getsockopt() pins the destination user pages with FOLL_WRITE and
the RDS_INFO_* producers memcpy the snapshot into them through
kmap_atomic(). Because that copy goes through the kernel direct map, the
dirty bit on the user PTE is never set, so unpin_user_pages() releases the
pages without marking them dirty. A file-backed destination page can then
be reclaimed without writeback, silently discarding the copied data.
Use unpin_user_pages_dirty_lock() with make_dirty=true so the modified
pages are marked dirty before they are unpinned.
Eric Dumazet [Mon, 8 Jun 2026 16:46:13 +0000 (16:46 +0000)]
ip6_vti: fix incorrect tunnel matching in vti6_tnl_lookup()
In vti6_tnl_lookup(), when an exact match for a tunnel fails,
the code falls back to searching for wildcard tunnels:
- Tunnels matching the packet's local address, with any remote address
wildcard remote).
- Tunnels matching the packet's remote address, with any local address
(wildcard local).
However, vti6 stores all these different types of tunnels in the same
hash table (ip6n->tnls_r_l) prone to hash collisions.
The bug is that the fallback search loops in vti6_tnl_lookup() were
missing checks to ensure that the candidate tunnel actually has
a wildcard address.
Fixes: fbe68ee87522 ("vti6: Add a lookup method for tunnels with wildcard endpoints.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20260608164613.933023-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiko Carstens [Tue, 9 Jun 2026 10:33:43 +0000 (12:33 +0200)]
s390/tishift: Convert __ashlti3(), __ashrti3(), __lshrti3() to C
There is no reason to have __ashlti3(), __ashrti3(), and __lshrti3()
implemented in assembler. Convert them all to C, which allows the
compiler to optimize the code if newer instructions allow that.
Heiko Carstens [Tue, 9 Jun 2026 10:33:42 +0000 (12:33 +0200)]
s390/memmove: Optimize backward copy case
memmove() copies byte wise for the backward copy case, when the mvc
instruction cannot be used. This is quite slow, but can be optimized
with the mvcrl instruction, which is available since z15.
Some numbers (measured on a shared z16 LPAR) show that the new
implementation is nearly always faster, except for the non realistic
one and two byte cases:
Heiko Carstens [Tue, 9 Jun 2026 10:33:41 +0000 (12:33 +0200)]
s390/string: Convert memset(16|32|64)() to C
Convert memset(16|32|64)() from assembler to C, which should make it
easier to read and change, if required. And it allows the compiler to
optimize the code, and use different instructions, except for the used
inline assemblies.
Heiko Carstens [Tue, 9 Jun 2026 10:33:40 +0000 (12:33 +0200)]
s390/string: Convert memcpy() to C
Convert memcpy() from assembler to C, which should make it easier to
read and change, if required. And it allows the compiler to optimize
the code, and use different instructions, except for the used inline
assemblies.
Heiko Carstens [Tue, 9 Jun 2026 10:33:39 +0000 (12:33 +0200)]
s390/string: Convert memset() to C
Convert memset() from assembler to C, which should make it easier to
read and change, if required. And it allows the compiler to optimize
the code, and use different instructions, except for the used inline
assemblies.
Heiko Carstens [Tue, 9 Jun 2026 10:33:38 +0000 (12:33 +0200)]
s390/string: Convert memmove() to C
Convert memmove() from assembler to C, which should make it easier to
read and change, if required. And it allows the compiler to optimize
the code, and use different instructions, except for the used inline
assemblies.
Heiko Carstens [Tue, 9 Jun 2026 10:33:36 +0000 (12:33 +0200)]
s390: Add .noinstr.text to boot and purgatory linker scripts
Upcoming changes will result in a .noinstr.text section within the
boot and purgatory string.o binary. Explicitly add the new section to
avoid orphaned warnings from the linker.
The purgatory code is compiled without the -march option. This means the
default architecture level of the compiler is used. This can cause
problems, e.g. if instructions used in inline assemblies are for a higher
architecture level than the default architecture level of the compiler.
Use z10 as minimum architecture level, similar to the boot code, to enforce
a defined architecture level set.
===================
Rust support on s390 requires a small set of architecture-specific pieces
before the generic Rust kernel infrastructure can be used.
The series wires up s390 as a Rust-capable 64-bit architecture, adds the
missing assembly interfaces needed by Rust for WARN/BUG reporting and for
static branches, adjusts bindgen parameters to avoid repr layout conflicts
caused by packed and aligned s390 structures, and fixes issues discovered
during testing.
s390 currently requires rustc with support for -Zpacked-stack, and the
minimum tool version gating is adjusted accordingly.
Jan Polensky [Mon, 1 Jun 2026 17:46:25 +0000 (19:46 +0200)]
s390: Enable Rust support
Enable building Rust code on s390 by wiring the architecture into the
kernel Rust infrastructure.
Add s390 to the Rust arch support documentation, provide the s390 Rust
target and required compiler flags, and set the bindgen target for
arch/s390. Adjust the Rust target generation and minimum rustc version
gating so the s390 setup is handled explicitly.
The Rust toolchain uses the "s390x" triple naming for the 64 bit target.
Rust support is currently incompatible with CONFIG_EXPOLINE, which
relies on compiler support for the -mindirect-branch= and
-mfunction_return= options. Therefore, select HAVE_RUST only when
EXPOLINE is disabled.
Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Gary Guo <gary@garyguo.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jan Polensky [Mon, 1 Jun 2026 17:46:24 +0000 (19:46 +0200)]
s390/cmpxchg: Fix KASAN stack-out-of-bounds in atomic helpers
The __arch_cmpxchg1, __arch_cmpxchg2, __arch_xchg1, and __arch_xchg2
functions emulate 1-byte and 2-byte atomic operations using 4-byte
cmpxchg instructions, since s390 lacks native 1/2-byte cmpxchg support.
When KASAN is enabled, the READ_ONCE() operations in these functions
trigger stack-out-of-bounds warnings because they perform 4-byte reads
when only 1 or 2 bytes should be accessed.
Mark these functions as __no_sanitize_or_inline to prevent KASAN
instrumentation while maintaining correct functionality.
This resolves the following KASAN error during rust_atomics KUnit tests:
BUG: KASAN: stack-out-of-bounds in rust_helper_atomic_i8_xchg+0xb2/0xc0
Read of size 4 at addr 001bff7ffdbefcf0 by task kunit_try_catch/142
Jan Polensky [Mon, 1 Jun 2026 17:46:23 +0000 (19:46 +0200)]
rust: helpers: Add memchr wrapper for string operations
Add a dedicated string helper file with a memchr wrapper that uses the
kernel's instrumented memchr() function to ensure KASAN and FORTIFY_SOURCE
protections are preserved for Rust code.
Jan Polensky [Mon, 1 Jun 2026 17:46:22 +0000 (19:46 +0200)]
rust/bindgen_parameters: Mark s390 types as opaque to prevent repr conflicts
Bindgen attempts to generate Rust layouts for a number of s390 structs
that are packed but contain, or transitively contain, aligned fields.
Rust rejects such layouts with E0588 ("packed type cannot transitively
contain a #[repr(align)] type").
Add the affected s390 types to the opaque type list so bindgen emits
opaque blob types instead of full representations. This matches existing
workarounds for x86 types such as alt_instr and x86_msi_data.
Link: https://lore.kernel.org/all/e5c7aa10-590d-0d20-dd3b-385bee2377e7@intel.com/ Acked-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Gary Guo <gary@garyguo.net> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jan Polensky [Mon, 1 Jun 2026 17:46:21 +0000 (19:46 +0200)]
s390/jump_label: Implement ARCH_STATIC_BRANCH_JUMP_ASM and ARCH_STATIC_BRANCH_ASM macros
Rust static branch support needs the s390 jump label instruction sequence
and __jump_table emission in a reusable form. The current implementation
embeds the sequence directly in the C asm goto blocks, which cannot be
shared with Rust.
Introduce ARCH_STATIC_BRANCH_ASM and ARCH_STATIC_BRANCH_JUMP_ASM to
describe the brcl sequences for the likely-false and likely-true cases
and to emit the same __jump_table entries as before. Switch the existing
C helpers to use the new macros to avoid duplication without changing
the generated code.
Acked-by: Gary Guo <gary@garyguo.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jan Polensky [Mon, 1 Jun 2026 17:46:20 +0000 (19:46 +0200)]
s390/bug: Provide ARCH_WARN_ASM for Rust WARN/BUG support
Rust WARN and BUG support relies on ARCH_WARN_ASM to emit __bug_table
entries. On s390 the macro is missing, so Rust code cannot generate
proper WARN/BUG metadata for the kernel's bug reporting infrastructure.
Define ARCH_WARN_ASM to produce the same assembly sequence and
__bug_table entry format as the existing s390 BUG handling, including
the monitor call. Define ARCH_WARN_REACHABLE as empty since s390 does
not provide reachability analysis for warning paths.
Acked-by: Gary Guo <gary@garyguo.net> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Linus Torvalds [Wed, 10 Jun 2026 14:18:32 +0000 (07:18 -0700)]
Merge tag 'riscv-for-linux-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
- Fix the implementation of the CFI branch landing pad control prctl()s
to return -EINVAL if unknown control bits are set, rather than
silently ignoring the request; and add a kselftest for this case
- Fix unaligned access performance testing to happen earlier in boot,
which fixes a performance regression in the lib/checksum code
- Fix a binfmt_elf warning when dumping core (due to missing
.core_note_name for CFI registers)
* tag 'riscv-for-linux-7.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: cfi: reject unknown flags in PR_SET_CFI
riscv: Fix fast_unaligned_access_speed_key not getting initialized
riscv/ptrace: Use USER_REGSET_NOTE_TYPE for REGSET_CFI
Jann Horn [Fri, 5 Jun 2026 20:27:33 +0000 (22:27 +0200)]
namespace: restrict OPEN_TREE_NAMESPACE/FSMOUNT_NAMESPACE to directories
open_tree(..., OPEN_TREE_NAMESPACE) and
fsmount(..., FSMOUNT_NAMESPACE, ...) currently work on non-directories,
like regular files. That's bad for two reasons:
- It ends up mounting a regular file over the inherited namespace root,
which is a directory; mounting a non-directory over a directory is
normally explicitly forbidden, see for example do_move_mount()
- It causes setns() on the new namespace to set the cwd to a regular
file, which the rest of VFS does not expect
Fix it by restricting create_new_namespace() (which is used by both of
these flags) to directories.
Leave the behavior for OPEN_TREE_CLONE as-is, that seems unproblematic.
Fixes: 9b8a0ba68246 ("mount: add OPEN_TREE_NAMESPACE") Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pei Xiao [Wed, 10 Jun 2026 01:49:12 +0000 (09:49 +0800)]
hwmon: (gpd-fan): fix race condition between device removal and sysfs access
Replace the manual gpd_fan_remove() callback with a devres-managed
action using devm_add_action_or_reset(). The original remove hook
resets the fan to AUTOMATIC mode, but the hwmon sysfs interface
(registered with devm_hwmon_device_register_with_info()) remains
active until after the remove callback completes. This creates a
race window where a concurrent userspace sysfs access can interleave
with the EC I/O sequence, potentially corrupting EC registers.
Using devm_add_action_or_reset() registers the reset function as a
devres action. Due to the LIFO release order of devres, the hwmon
device is unregistered (sysfs removed) before the reset action
executes, eliminating the race condition.
Pei Xiao [Wed, 10 Jun 2026 01:49:11 +0000 (09:49 +0800)]
hwmon: (gpd-fan): upgrade log level from warn to err for platform device creation failure
When platform_create_bundle() fails, the error is fatal and prevents the
driver from loading. Use pr_err() instead of pr_warn() to clearly indicate
a critical failure.
Pei Xiao [Wed, 10 Jun 2026 01:49:10 +0000 (09:49 +0800)]
hwmon: (gpd-fan): Initialize EC before registering hwmon device
Move the gpd_init_ec() call to before devm_hwmon_device_register_with_info
in the probe function. With the previous ordering the hwmon device was
registered and exposed to userspace before the EC initialization
completes, creating a window where sysfs reads could return invalid values.
Some buggy firmware won't initialize EC properly on boot. Before its
initialization, reading RPM will always return 0, and writing PWM will have
no effect. So move gpd_init_ec to before hwmon device register.
Pei Xiao [Wed, 10 Jun 2026 01:49:09 +0000 (09:49 +0800)]
hwmon: (gpd-fan): drop global driver data and use per-device allocation
replace the global state gpd_driver_priv with per-device private data
(struct gpd_fan_data) allocated in probe. This allows the driver to
support multiple instances in the future and aligns with kernel best
practices.
ADPM12250 is a quarter brick DC/DC Power Module. It is a high power
non-isolated converter capable of delivering regulated 12V with
continuous power level of 2500W. Uses PMBus.
The pnp_device_id array is only used for module data to support
auto-loading the floppy module. So the .driver_data member is unused and
this assignment can be dropped.
While touching that array, align the coding style to what is used most
for these.
This patch doesn't modify the compiled array, only its representation
in source form benefits. The former was confirmed with x86 and arm64
builds.
Tomer Maimon [Tue, 9 Jun 2026 16:39:19 +0000 (19:39 +0300)]
spi: dt-bindings: nuvoton,npcm750-fiu: Convert to DT schema
Convert the Nuvoton NPCM FIU binding to DT schema format.
Document the required control registers and the optional direct-
mapped flash window separately, matching the driver behavior
when the direct mapping is not described.
Will Deacon [Wed, 10 Jun 2026 11:00:21 +0000 (12:00 +0100)]
arm64: errata: Mitigate TLBI errata on Microsoft Azure Cobalt 100 CPU
Commit fb091ff39479 ("arm64: Subscribe Microsoft Azure Cobalt 100 to ARM
Neoverse N2 errata") states that Microsoft Azure Cobalt 100 CPU "is a
Microsoft implemented CPU based on r0p0 of the ARM Neoverse N2 CPU, and
therefore suffers from all the same errata.".
So enable the workaround for the latest broadcast TLB invalidation bug
on these parts.
arm64: errata: Mitigate TLBI errata on NVIDIA Olympus CPU
NVIDIA Olympus cores are affected by the TLBI completion issue tracked as
CVE-2025-10263. The existing ARM64_ERRATUM_4118414 handling already uses
ARM64_WORKAROUND_REPEAT_TLBI to issue an additional broadcast TLBI;DSB
sequence and ensure affected memory write effects are globally observed.
Add MIDR_NVIDIA_OLYMPUS to the repeat-TLBI match list so the same
mitigation is enabled on affected Olympus systems. Also document the
NVIDIA Olympus erratum in the arm64 silicon errata table and list it in
the Kconfig help text.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Mark Rutland [Tue, 9 Jun 2026 10:12:03 +0000 (11:12 +0100)]
arm64: errata: Mitigate TLBI errata on various Arm CPUs
A number of CPUs developed by Arm suffer from errata whereby a broadcast
TLBI;DSB sequence may complete before the global observation of writes
which are translated by an affected TLB entry.
These errata ONLY affect the completion of memory accesses which have
been translated by an invalidated TLB entry, and these errata DO NOT
affect the actual invalidation of TLB entries. TLB entries are removed
correctly.
This issue has been assigned CVE ID CVE-2025-10263.
To mitigate this issue, Arm recommends that software follows any
affected TLBI;DSB sequence with an additional TLBI;DSB, which will
ensure that all memory write effects affected by the first TLBI have
been globally observed. The additional TLBI can use any operation that
is broadcast to affected CPUs, and the additional DSB can use any option
that is sufficient to complete the additional TLBI.
The ARM64_WORKAROUND_REPEAT_TLBI workaround is sufficient to mitigate
the issue. Enable this workaround for affected CPUs, and update the
silicon errata documentation accordingly.
Note that due to the manner in which Arm develops IP and tracks errata,
some CPUs share a common erratum number.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>