git.ipfire.org Git - thirdparty/kernel/stable.git/log
9 days agoKVM: selftests: arm64: Fix steal_time test after UAPI refactoring
Sebastian Ott [Mon, 4 May 2026 11:28:08 +0000 (13:28 +0200)] 
KVM: selftests: arm64: Fix steal_time test after UAPI refactoring

Fix the following failure in the steal_time test on arm64 by making
the timer address known to the guest.

==== Test Assertion Failure ====
  steal_time.c:229: !ret
  pid=18514 tid=18514 errno=22 - Invalid argument
     1  0x000000000040252f: check_steal_time_uapi at steal_time.c:229 (discriminator 20)
     2   (inlined by) main at steal_time.c:537 (discriminator 20)
     3  0x0000ffffa23d621b: ?? ??:0
     4  0x0000ffffa23d62fb: ?? ??:0
     5  0x0000000000402b6f: _start at ??:?
  KVM_SET_DEVICE_ATTR failed, rc: -1 errno: 22 (Invalid argument)

Fixes: 40351ed924dd ("KVM: selftests: Refactor UAPI tests into dedicated function")
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://patch.msgid.link/20260504112808.21276-1-sebott@redhat.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days agoKVM: arm64: Handle permission faults with guest_memfd
Alexandru Elisei [Tue, 5 May 2026 09:49:13 +0000 (10:49 +0100)] 
KVM: arm64: Handle permission faults with guest_memfd

gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It
does this both to relax permissions on an existing mapping and to install
a missing mapping.

kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an
existing, valid entry and the new entry modifies only the permissions.
This is checked in:

kvm_pgtable_stage2_map()
  stage2_map_walk_leaf()
     stage2_map_walker_try_leaf()
       stage2_pte_needs_update()

and if only the permissions differ, kvm_pgtable_stage2_map() returns
-EAGAIN and KVM returns to the guest to replay the instruction. The
assumption is that a concurrent fault on a different VCPU already mapped
the faulting IPA, and replaying the instruction will either succeed, or
cause a permission fault, which should be handled with
kvm_pgtable_stage2_relax_perms().

gmem_abort(), on a read or write fault on a system without DIC (instruction
cache invalidation required for data to instruction coherence), installs a
valid entry with read and write permissions, but without executable
permissions. On an execution fault on the same page, gmem_abort() attempts
to relax the permissions to allow execution, but calls
kvm_pgtable_stage2_map() to change the existing, valid entry.
kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the
faulting instruction, which leads to an infinite loop of permission faults
on the same instruction.

Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms()
to relax permissions.
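The decision the fix makes can be sketched in user space; all names and types below are invented for illustration, and the real logic lives in gmem_abort():

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only -- names are invented, not the kernel's. The fixed
 * gmem_abort() takes the relax-permissions path for a permission fault
 * on an already-valid entry, instead of calling the map path, which
 * returns -EAGAIN for a permissions-only change and replays the
 * faulting instruction forever. */
enum gmem_action { GMEM_MAP, GMEM_RELAX_PERMS };

static enum gmem_action gmem_abort_action(bool entry_valid, bool perm_fault)
{
	if (entry_valid && perm_fault)
		return GMEM_RELAX_PERMS; /* kvm_pgtable_stage2_relax_perms() */
	return GMEM_MAP;                 /* kvm_pgtable_stage2_map() */
}
```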

Fixes: a7b57e099592 ("KVM: arm64: Handle guest_memfd-backed guest page faults")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260505094913.75317-1-alexandru.elisei@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days agoKVM: arm64: nv: Consider the DS bit when translating TCR_EL2
Wei-Lin Chang [Tue, 5 May 2026 14:47:35 +0000 (15:47 +0100)] 
KVM: arm64: nv: Consider the DS bit when translating TCR_EL2

When running an nVHE L1, TCR_EL2 is mapped to TCR_EL1. Writes to the
register are trapped and written to TCR_EL1 after a translation.
Booting an nVHE L1 with a 52-bit VA doesn't work because the translation
ignored the DS bit set by the guest, causing repeated level 0 faults.
Handle the bit in the translation function.
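A minimal sketch of the idea, using the DS bit positions from the Arm ARM register layouts (TCR_EL2.DS at bit 32 in the nVHE layout, TCR_EL1.DS at bit 59); the helper name is invented and the rest of the translation is elided:

```c
#include <assert.h>
#include <stdint.h>

#define TCR_EL2_DS (UINT64_C(1) << 32)	/* nVHE TCR_EL2 layout */
#define TCR_EL1_DS (UINT64_C(1) << 59)	/* TCR_EL1 layout */

/* Sketch: when translating a guest's nVHE TCR_EL2 value into TCR_EL1,
 * propagate the DS bit instead of dropping it, so 52-bit VA
 * configurations keep working.  All other field translations elided. */
static uint64_t toy_translate_tcr_el2(uint64_t tcr_el2)
{
	uint64_t tcr_el1 = 0;

	if (tcr_el2 & TCR_EL2_DS)
		tcr_el1 |= TCR_EL1_DS;
	return tcr_el1;
}
```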

Signed-off-by: Wei-Lin Chang <weilin.chang@arm.com>
Link: https://patch.msgid.link/20260505144735.1496530-1-weilin.chang@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days agoKVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
James Morse [Tue, 5 May 2026 16:52:03 +0000 (17:52 +0100)] 
KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests

C1-Pro cores with SME have an erratum where TLBI+DSB does not complete
all outstanding SME accesses. Instead a DSB needs to be executed on the
affected CPUs. The implication is that pages cannot be unmapped from the
host Stage 2 and then provided to a protected guest or to the
hypervisor. Host SME accesses may still complete after this point.

This erratum breaks pKVM's guarantees, and the workaround is hard to
implement as EL2 and EL1 share a security state meaning EL1 can mask
IPIs sent by EL2, leading to interrupt blackouts.

Instead, do this in EL3. This has the advantage of a separate security
state, meaning lower ELs cannot mask the IPI. It is also simpler for EL3
to know about CPUs that are off or in PSCI's CPU_SUSPEND.

Add the needed hook to host_stage2_set_owner_metadata_locked(). This
covers the cases where the host loses access to a page:

  __pkvm_host_donate_guest()
  __pkvm_guest_unshare_host()
  host_stage2_set_owner_locked() when owner_id == PKVM_ID_HYP

Since pKVM relies on the firmware call for correctness, check for the
firmware counterpart during protected KVM initialisation and fail the
pKVM initialisation if it is missing.

Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Sudeep Holla <sudeep.holla@kernel.org>
Link: https://patch.msgid.link/20260505165205.2690919-1-catalin.marinas@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days agosched/fair: Fix wakeup_preempt_fair() for not waking up task
Vincent Guittot [Sun, 3 May 2026 10:45:03 +0000 (12:45 +0200)] 
sched/fair: Fix wakeup_preempt_fair() for not waking up task

Make sure to only call pick_next_entity() on a non-empty cfs_rq.

The assumption that p is always enqueued and not delayed is only true for
wakeups. If p was moved while delayed, pick_next_entity() will dequeue it and
the cfs_rq might become empty. Test whether there are still queued tasks before
trying again to determine if p could be the next one to be picked.

There are at least two cases:

When a cfs_rq becomes idle, it tries to pull tasks, but if those pulled tasks
are delayed, they will be dequeued when attached to the cfs_rq:
attach_tasks() -> attach_task() -> wakeup_preempt(rq, p, 0).

A misfit task running on cfs A triggers a load balance so that it can be
pulled to a better CPU; the load balance on cfs B then starts an active load
balance to pull the running misfit task. If there is a delayed-dequeue task on
cfs A, it can be pulled instead of the previously running misfit task:
attach_one_task() -> attach_task() -> wakeup_preempt(rq, p, 0).
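The guard described above can be modelled with a toy runqueue (all types and names below are invented; the real code lives in wakeup_preempt_fair()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the fix: pick_next_entity() can dequeue a delayed task
 * and leave the cfs_rq empty, so the queued count must be checked
 * before relying on its result. */
struct toy_cfs_rq { int nr_queued; };
struct toy_entity { bool delayed; };

static struct toy_entity *toy_pick_next(struct toy_cfs_rq *cfs_rq,
					struct toy_entity *se)
{
	if (se->delayed) {		/* delayed entity gets dequeued */
		cfs_rq->nr_queued--;
		return NULL;
	}
	return se;
}

static struct toy_entity *toy_pick_guarded(struct toy_cfs_rq *cfs_rq,
					   struct toy_entity *se)
{
	if (!cfs_rq->nr_queued)		/* nothing left to pick from */
		return NULL;
	return toy_pick_next(cfs_rq, se);
}

/* Helper so the outcome is easy to probe: 1 if a pick was produced. */
static int toy_pick_ok(int nr_queued, bool delayed)
{
	struct toy_cfs_rq rq = { .nr_queued = nr_queued };
	struct toy_entity se = { .delayed = delayed };

	return toy_pick_guarded(&rq, &se) != NULL;
}
```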

Fixes: ac8e69e69363 ("sched/fair: Fix wakeup_preempt_fair() vs delayed dequeue")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260503104503.1732682-1-vincent.guittot@linaro.org
9 days agosched/fair: Fix overflow in vruntime_eligible()
Zhan Xusheng [Fri, 1 May 2026 10:40:06 +0000 (12:40 +0200)] 
sched/fair: Fix overflow in vruntime_eligible()

Zhan Xusheng reported running into a sporadic s64 multiply overflow in
vruntime_eligible().

When constructing a worst case scenario:

If you have cgroups, then you can have an entity of weight 2 (per
calc_group_shares()), and its vlag should then be bounded by: (slice+TICK_NSEC)
* NICE_0_LOAD, which is around 44 bits as per the comment on entity_key().

The other extreme is 100*NICE_0_LOAD, thus you get:

{key, weight}[] := {
  puny: { (slice + TICK_NSEC) * NICE_0_LOAD, 2               },
  max:  { 0,                                 100*NICE_0_LOAD },
}

The avg_vruntime() would end up being very close to 0 (which is
zero_vruntime), so no real help making that more accurate.

vruntime_eligible(puny) ends up with:

 avg  = 2 * puny.key (+ 0)
 load = 2 + 100 * NICE_0_LOAD

 avg >= puny.key * load

And that is: (slice + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100, which will
overflow s64.

Zhan suggested using __builtin_mul_overflow(), however after staring at
compiler output for various architectures using godbolt, it seems that using an
__int128 multiplication often results in better code.

Specifically, a number of architectures already compute the __int128 product to
determine the overflow. Eg. arm64 already has the 'smulh' instruction used. By
explicitly doing an __int128 multiply, it will emit the 'mul; smulh' pattern,
which modern cores can fuse (armv8-a clang-22.1.0). x86_64 ends up with fewer
branches (no OF handling).

Since Linux has ARCH_SUPPORTS_INT128 to gate __int128 usage, also provide the
__builtin_mul_overflow() variant as a fallback.
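A user-space sketch of such a helper, assuming a GCC/Clang-style toolchain (`__SIZEOF_INT128__`, `__builtin_mul_overflow()`); the function names are invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of an overflow-checking s64 multiply in the spirit of the
 * patch: compute the full product as __int128 and test whether it
 * still fits in s64; fall back to __builtin_mul_overflow() where
 * 128-bit arithmetic is unavailable. */
#ifdef __SIZEOF_INT128__
static bool mul_s64_overflows(int64_t a, int64_t b, int64_t *res)
{
	__int128 p = (__int128)a * b;

	*res = (int64_t)p;
	return p != (__int128)*res;	/* truncation lost bits => overflow */
}
#else
static bool mul_s64_overflows(int64_t a, int64_t b, int64_t *res)
{
	return __builtin_mul_overflow(a, b, res);
}
#endif

/* Convenience probe that discards the product. */
static bool mul_overflows(int64_t a, int64_t b)
{
	int64_t res;

	return mul_s64_overflows(a, b, &res);
}
```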

[peterz: Changelog and __int128 bits]
Fixes: 556146ce5e94 ("sched/fair: Avoid overflow in enqueue_entity()")
Reported-by: Zhan Xusheng <zhanxusheng1024@gmail.com>
Closes: https://patch.msgid.link/20260415145742.10359-1-zhanxusheng%40xiaomi.com
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260505103155.GN3102924%40noisy.programming.kicks-ass.net
9 days agoselftests/rseq: Expand for optimized RSEQ ABI v2
Thomas Gleixner [Sat, 25 Apr 2026 12:48:23 +0000 (14:48 +0200)] 
selftests/rseq: Expand for optimized RSEQ ABI v2

Update the selftests so they are executed for both the legacy (32-byte RSEQ
region) and the optimized RSEQ ABI v2 mode.

Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224428.009121296%40kernel.org
Cc: stable@vger.kernel.org
9 days agorseq: Reenable performance optimizations conditionally
Thomas Gleixner [Sun, 26 Apr 2026 08:01:56 +0000 (10:01 +0200)] 
rseq: Reenable performance optimizations conditionally

Due to the incompatibility with TCMalloc the RSEQ optimizations and
extended features (time slice extensions) have been disabled and made
run-time conditional.

The original RSEQ implementation, which TCMalloc depends on, registers a 32
byte region (ORIG_RSEQ_SIZE). This region has a 32 byte alignment
requirement.

The extension safe newer variant exposes the kernel RSEQ feature size via
getauxval(AT_RSEQ_FEATURE_SIZE) and the alignment requirement via
getauxval(AT_RSEQ_ALIGN). The alignment requirement is that the registered
RSEQ region is aligned to the next power of two of the feature size. The
kernel currently has a feature size of 33 bytes, which means the alignment
requirement is 64 bytes.

The TCMalloc RSEQ region is embedded into a cache line aligned data
structure starting at offset 32 bytes so that bytes 28-31 and the
cpu_id_start field at bytes 32-35 form a 64-bit little endian pointer with
the top-most bit (bit 63) set. This is used to check whether the kernel has
overwritten cpu_id_start with an actual CPU id value, which is guaranteed
not to have the top-most bit set.
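The sentinel check described above can be modelled arithmetically (names invented; the real layout lives inside TCMalloc's slab structure):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the TCMalloc trick: bytes 28-31 and the 4-byte
 * cpu_id_start field form one 64-bit little-endian word.  While user
 * space's sentinel pointer with bit 63 set is in place, the kernel has
 * not yet written cpu_id_start; a real CPU id never has the top bit
 * set, so bit 63 reading as clear means the kernel has taken over. */
static bool toy_kernel_wrote_cpu_id(uint32_t bytes_28_31, uint32_t cpu_id_start)
{
	uint64_t word = (uint64_t)bytes_28_31 | ((uint64_t)cpu_id_start << 32);

	return !(word >> 63);	/* top bit clear => actual CPU id */
}
```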

As this is part of their performance tuned magic, it's a pretty safe
assumption that TCMalloc won't use a larger RSEQ size.

This allows the kernel to treat registrations with a size greater than the
original 32 bytes (the case since time slice extensions were introduced) as
RSEQ ABI v2, with the following differences from the original behaviour:

  1) Unconditional updates of the user read only fields (CPU, node, MMCID)
     are removed. Those fields are only updated on registration, task
     migration and MMCID changes.

  2) Unconditional evaluation of the critical section pointer is
     removed. It's only evaluated when user space was interrupted and was
     scheduled out or before delivering a signal in the interrupted
     context.

  3) The read-only requirement of the ID fields is enforced. When the
     kernel detects that userspace manipulated the fields, the process is
     terminated. This ensures that multiple entities (libraries) can
     utilize RSEQ without interfering.

  4) Today's extended RSEQ feature (time slice extensions) and future
     extensions are only enabled in the v2 enabled mode.

Registrations with the original size of 32 bytes operate in backwards
compatible legacy mode without performance improvements and extended
features.
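The size-based mode selection can be sketched as follows (the enum and helper names are invented):

```c
#include <assert.h>
#include <stdint.h>

#define ORIG_RSEQ_SIZE 32u	/* original 32-byte rseq ABI region */

enum toy_rseq_mode { RSEQ_MODE_INVAL, RSEQ_MODE_LEGACY, RSEQ_MODE_V2 };

/* Sketch of the registration decision described above: the original
 * 32-byte size keeps the fully backwards compatible behaviour; any
 * larger size opts in to the optimized v2 semantics. */
static enum toy_rseq_mode toy_rseq_mode_for_size(uint32_t rseq_len)
{
	if (rseq_len < ORIG_RSEQ_SIZE)
		return RSEQ_MODE_INVAL;
	if (rseq_len == ORIG_RSEQ_SIZE)
		return RSEQ_MODE_LEGACY;
	return RSEQ_MODE_V2;
}
```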

Unfortunately this also affects users of older GLIBC versions, which register
the original size of 32 bytes and do not evaluate the kernel's required size
from the auxiliary vector AT_RSEQ_FEATURE_SIZE.

That's the result of the lack of enforcement in the original implementation
and the unwillingness of a single entity to cooperate with the larger
ecosystem for many years.

Implement the required registration changes by restructuring the spaghetti
code and adding the size/version check. Also add documentation about the
differences of legacy and optimized RSEQ V2 mode.

Thanks to Mathieu for pointing out the ORIG_RSEQ_SIZE constraints!

Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.927160119%40kernel.org
Cc: stable@vger.kernel.org
9 days agorseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
Thomas Gleixner [Sun, 26 Apr 2026 14:21:02 +0000 (16:21 +0200)] 
rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode

The optimized RSEQ V2 mode requires that user space adheres to the ABI
specification and does not modify the read-only fields cpu_id_start,
cpu_id, node_id and mm_cid behind the kernel's back.

While the kernel does not rely on these fields, adherence to this rule is a
fundamental prerequisite to allow multiple entities, e.g. libraries, in an
application to utilize the full potential of RSEQ without stepping on each
other's toes.

Validate this adherence on every update of these fields. If the kernel
detects that user space modified the fields, the application is forcibly
terminated.
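A toy model of the check-before-update rule (all names invented; the real kernel terminates the process instead of returning an error):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the enforcement idea: before the kernel publishes a new
 * value for a read-only field, it compares the user-visible copy with
 * the value it last wrote.  A mismatch means user space touched the
 * field behind the kernel's back. */
struct toy_field { uint32_t user_visible; uint32_t last_written; };

/* Returns false (tampered) instead of killing the process. */
static bool toy_update_field(struct toy_field *f, uint32_t new_val)
{
	if (f->user_visible != f->last_written)
		return false;	/* user space modified it => fatal */
	f->user_visible = new_val;
	f->last_written = new_val;
	return true;
}

static bool toy_update_ok(uint32_t seen, uint32_t written, uint32_t new_val)
{
	struct toy_field f = { seen, written };

	return toy_update_field(&f, new_val);
}
```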

Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.845230956%40kernel.org
Cc: stable@vger.kernel.org
9 days agoselftests/rseq: Validate legacy behavior
Thomas Gleixner [Sun, 26 Apr 2026 15:51:07 +0000 (17:51 +0200)] 
selftests/rseq: Validate legacy behavior

The RSEQ legacy mode behavior requires that the ID fields in the rseq
region are unconditionally updated on every context switch and before
signal delivery even if not required by the ABI specification.

To ensure that this behavior is preserved for legacy users in the future,
add a test which validates that with a sleep() and a signal sent to self.

Provide a run script which prevents GLIBC from registering a RSEQ region,
so that the test can register its own legacy-sized region.

Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://patch.msgid.link/20260428224427.764705536%40kernel.org
Cc: stable@vger.kernel.org
9 days agoovl: fix verity lazy-load guard broken by fsverity_active() semantic change
Colin Walters [Tue, 5 May 2026 22:42:57 +0000 (15:42 -0700)] 
ovl: fix verity lazy-load guard broken by fsverity_active() semantic change

Commit f77f281b6118 ("fsverity: use a hashtable to find the
fsverity_info") made fsverity_active() check whether the inode has the
verity flag, rather than whether the inode's fsverity_info is loaded.
This broke ovl_ensure_verity_loaded(), which wants to load the
fsverity_info for any verity inodes that haven't had it loaded yet.

Therefore, to check that the fsverity_info hasn't yet been loaded, use
fsverity_get_info(inode) == NULL instead of !fsverity_active(inode).

Also, since fsverity_get_info() now involves a hash table lookup, put
the more lightweight IS_VERITY() flag check first.
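The reordered guard can be sketched with stand-in types (invented; the real check uses IS_VERITY() and fsverity_get_info() on a struct inode):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the fixed guard: check the cheap IS_VERITY() inode-flag
 * test first, then the hashtable-backed fsverity_get_info() lookup,
 * and only load when the inode is a verity inode whose fsverity_info
 * is not yet cached. */
struct toy_inode {
	bool is_verity;		/* stands in for IS_VERITY(inode) */
	void *verity_info;	/* stands in for fsverity_get_info(inode) */
};

static bool toy_needs_verity_load(const struct toy_inode *inode)
{
	return inode->is_verity && inode->verity_info == NULL;
}

static bool toy_needs_load(bool is_verity, bool info_loaded)
{
	static int dummy;
	struct toy_inode i = { is_verity, info_loaded ? &dummy : NULL };

	return toy_needs_verity_load(&i);
}
```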

Fixes: f77f281b6118 ("fsverity: use a hashtable to find the fsverity_info")
Cc: stable@vger.kernel.org
Link: https://github.com/bootc-dev/bootc/issues/2174
Signed-off-by: Colin Walters <walters@verbum.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260505224257.23213-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
9 days agoMerge tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 6 May 2026 14:29:31 +0000 (07:29 -0700)] 
Merge tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Quite a number of fixes now:
 - mac80211
   - remove HT NSS validation to work with broken APs
     (with a kunit fix now)
   - remove 'static' that could cause races
   - check station link lookup before further processing
   - fix use-after-free due to delete in list iteration
   - remove AP station on assoc failures to fix crashes
 - ath12k
   - fix OF node refcount imbalance
   - fix queue flush ("REO update") in MLO
   - fix RCU assert
 - ath12k:
   - fix Kconfig with POWER_SEQUENCING
   - fix WMI buffer leaks on error conditions
   - don't use uninitialized stack data when processing RSSI events
   - fix logic for determining the peer ID in the RX path
 - ath5k: fix a potential stack buffer overwrite
 - rsi: fix thread lifetime race
 - brcmfmac: fix potential UAF
 - nl80211:
   - stricter permissions/checks for PMK and netns
   - fix netlink policy vs. code type confusion
 - cw1200: revert a broken locking change
 - various fixes to not trust values from firmware

* tag 'wireless-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (25 commits)
  wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation
  wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS
  wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage
  wifi: mac80211: remove station if connection prep fails
  wifi: mac80211: use safe list iteration in radar detect work
  wifi: libertas: notify firmware load wait on disconnect
  wifi: ath5k: do not access array OOB
  wifi: ath12k: fix peer_id usage in normal RX path
  wifi: ath12k: initialize RSSI dBm conversion event state
  wifi: ath12k: fix leak in some ath12k_wmi_xxx() functions
  wifi: cw1200: Revert "Fix locking in error paths"
  wifi: mac80211: tests: mark HT check strict
  wifi: rsi: fix kthread lifetime race between self-exit and external-stop
  wifi: mac80211: drop stray 'static' from fast-RX rx_result
  wifi: mac80211: check ieee80211_rx_data_set_link return in pubsta MLO path
  wifi: nl80211: require admin perm on SET_PMK / DEL_PMK
  wifi: libertas: fix integer underflow in process_cmdrequest()
  wifi: b43legacy: enforce bounds check on firmware key index in RX path
  wifi: b43: enforce bounds check on firmware key index in b43_rx()
  wifi: brcmfmac: Fix potential use-after-free issue when stopping watchdog task
  ...
====================

Link: https://patch.msgid.link/20260506110325.219675-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoMerge tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 6 May 2026 14:27:30 +0000 (07:27 -0700)] 
Merge tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Fix issues in EFI graceful recovery on x86 introduced by changes to
   the kernel mode FPU APIs

 - I-cache coherency fixes for the LoongArch EFI stub

 - Locking fix for EFI pstore

 - Code tweak for efivarfs

* tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efi: Restore IRQ state in EFI page fault handler
  x86/efi: Fix graceful fault handling after FPU softirq changes
  efi/libstub: Synchronize instruction cache after kernel relocation
  efi/loongarch: Implement efi_cache_sync_image()
  efi/libstub: Move efi_relocate_kernel() into its only remaining user
  efi: pstore: Drop efivar lock when efi_pstore_open() returns with an error
  efivarfs: use QSTR() in efivarfs_alloc_dentry

9 days agoMerge tag 'asoc-fix-v7.1-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Wed, 6 May 2026 14:10:00 +0000 (16:10 +0200)] 
Merge tag 'asoc-fix-v7.1-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v7.1

Another batch of fixes, plus a couple of quirks (mostly AMD ones, as has
been the case recently).  All driver changes, including fixes for the
KUnit tests for the Cirrus drivers that could cause memory corruption.

9 days agospi: ch341: correct company name in MODULE_DESCRIPTION
Jiawei Liu [Wed, 6 May 2026 06:24:12 +0000 (14:24 +0800)] 
spi: ch341: correct company name in MODULE_DESCRIPTION

The company name "QiHeng Electronics" is incorrect.
The correct legal name is "Nanjing Qinheng Microelectronics Co., Ltd.".

Update the module description accordingly.

Signed-off-by: Jiawei Liu <ljw@wch.cn>
Link: https://patch.msgid.link/20260506062412.371034-1-ljw@wch.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
9 days agoregulator: qcom-rpmh: Fix index for pmh0101 ldo16
Fenglin Wu [Wed, 6 May 2026 09:28:51 +0000 (02:28 -0700)] 
regulator: qcom-rpmh: Fix index for pmh0101 ldo16

The wrong index is assigned to pmh0101 ldo16, which results in the incorrect
rpmh resource being used when the regulator device is voted for. Fix it.

Fixes: 65efe5404d15 ("regulator: rpmh-regulator: Add RPMH regulator support for Glymur")
Signed-off-by: Fenglin Wu <fenglin.wu@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260506-fix_pmh0101_ldo16_index-v1-1-cdc8708b01f4@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
9 days agoASoC: cs35l56: Fixes for driver cleanup
Mark Brown [Wed, 6 May 2026 12:22:53 +0000 (21:22 +0900)] 
ASoC: cs35l56: Fixes for driver cleanup

Richard Fitzgerald <rf@opensource.cirrus.com> says:

Two patches to fix cleanup during driver remove() and the error path
of probe().

The main purpose is to fix cleanup of the workqueue.

9 days agoASoC: cs35l56: Destroy workqueue in probe error path
Richard Fitzgerald [Tue, 5 May 2026 16:11:24 +0000 (17:11 +0100)] 
ASoC: cs35l56: Destroy workqueue in probe error path

The error path in cs35l56_common_probe() should call destroy_workqueue()
on the workqueue that was created by cs35l56_dsp_init().

Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260505161124.3621000-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
9 days agoASoC: cs35l56: Don't use devres to unregister component
Richard Fitzgerald [Tue, 5 May 2026 16:11:23 +0000 (17:11 +0100)] 
ASoC: cs35l56: Don't use devres to unregister component

Manually call snd_soc_unregister_component() from cs35l56_remove()
instead of using devres cleanup. This ensures that the component is
destroyed before cs35l56_remove() starts cleanup of anything the
component code could be using.

Devres cleanup happens after the driver remove() callback, so if
snd_soc_register_component() is used, it will not be destroyed until
after cs35l56_remove() has returned. But there is some cleanup that
must be done in cs35l56_remove(), or wrapped in a custom devres
cleanup handler to ensure correct ordering. The simplest option is
to call snd_soc_unregister_component() at the start of cs35l56_remove().

Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Closes: https://sashiko.dev/#/patchset/20260501103002.2843735-1-rf%40opensource.cirrus.com
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260505161124.3621000-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
9 days agoarm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's
Breno Leitao [Tue, 5 May 2026 16:02:13 +0000 (09:02 -0700)] 
arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's

sve_set_common() is the backend for PTRACE_SETREGSET(NT_ARM_SVE) and
PTRACE_SETREGSET(NT_ARM_SSVE). Every write in the function operates on
the tracee (target) - except a single memset that uses current instead,
zeroing the tracer's saved V0-V31 / FPSR / FPCR shadow on every ptrace
SETREGSET call.

The memset is meant to give the tracee a defined zero register image
before the user-supplied payload is copied in (for partial writes,
header-only writes, and FPSIMD<->SVE format switches). Aiming it at
current both denies the tracee that clean slate and silently corrupts
the tracer.

The corruption of the tracer's saved FPSIMD state is not always
observable. Where the tracer's state is live on a CPU, this may be
reused without loading the corrupted state from memory, and will
eventually be written back over the corrupted state. Where the tracer's
state is saved in SVE_PT_REGS_SVE format, only the FPSR and FPCR are
clobbered, and the effective copy of the vectors is in the task's
sve_state.

Reproducible on an arm64 kernel with SVE: a single-threaded tracer that
loads a known pattern into V0-V31, issues PTRACE_SETREGSET(NT_ARM_SVE)
on a child, and reads V0-V31 back observes them all zeroed within tens
of thousands of iterations when a sibling thread keeps stealing the
FPSIMD CPU binding.
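The one-word nature of the fix can be illustrated with a toy tracee image (types invented; the real memset targets the task's saved fpsimd_state):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy illustration of the fix: the zeroing memset in sve_set_common()
 * must aim at the tracee's register image, not at current (the
 * tracer's). */
struct toy_task { uint8_t fpsimd_state[16]; };

static void toy_sve_set_common(struct toy_task *target)
{
	/* Before the fix this effectively operated on "current". */
	memset(target->fpsimd_state, 0, sizeof(target->fpsimd_state));
}

/* Returns 1 when the target's image is fully zeroed after the call. */
static int toy_target_zeroed(uint8_t fill)
{
	struct toy_task target;
	size_t i;

	memset(target.fpsimd_state, fill, sizeof(target.fpsimd_state));
	toy_sve_set_common(&target);
	for (i = 0; i < sizeof(target.fpsimd_state); i++)
		if (target.fpsimd_state[i])
			return 0;
	return 1;
}
```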

Fixes: 316283f276eb ("arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
9 days agoio_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
Maoyi Xie [Mon, 4 May 2026 15:37:55 +0000 (23:37 +0800)] 
io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 days agoio_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
Maoyi Xie [Mon, 4 May 2026 15:37:54 +0000 (23:37 +0800)] 
io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS

io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.

A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:

  kernel/time/posix-timers.c    -- timer_settime(TIMER_ABSTIME)
  kernel/time/posix-stubs.c     -- clock_nanosleep(TIMER_ABSTIME)
  kernel/time/alarmtimer.c      -- alarm_timer_nsleep(TIMER_ABSTIME)
  fs/timerfd.c                  -- timerfd_settime(TFD_TIMER_ABSTIME)

io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after <1ms instead of
the expected ~1s.

Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.
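The direction of the conversion can be sketched as follows (names invented; the real timens_ktime_to_host() takes a clockid and looks the offset up in the task's time namespace). With the reproducer's -10s offset, a namespace deadline of "now + 1s" maps back to "host now + 1s" rather than a time already in the past:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t toy_ktime_t;	/* nanoseconds, kernel-style */

/* Sketch of the conversion: a time namespace sees
 * clock = host_clock + offset, so a namespace-absolute deadline maps
 * to host view by subtracting the offset.  A zero offset (initial
 * namespace, or a clock unaffected by time namespaces) makes this a
 * no-op, matching the fast-path claim above. */
static toy_ktime_t toy_timens_ktime_to_host(toy_ktime_t ns_abs,
					    toy_ktime_t clock_offset)
{
	return ns_abs - clock_offset;
}
```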

SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current->nsproxy->time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-2-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 days agoublk: validate physical_bs_shift, io_min_shift and io_opt_shift
Ming Lei [Wed, 6 May 2026 08:22:38 +0000 (16:22 +0800)] 
ublk: validate physical_bs_shift, io_min_shift and io_opt_shift

ublk_validate_params() checks logical_bs_shift is within
[9, PAGE_SHIFT] but has no upper bound for physical_bs_shift,
io_min_shift, or io_opt_shift. A malicious userspace can set any
of these to a large value (e.g., 44), causing undefined behavior
from `1 << shift` in ublk_ctrl_start_dev(), since the result is
stored in a 32-bit unsigned int.

Cap all three at ilog2(SZ_256M) (28). 256M is big enough to cover
all practical block sizes, and originates from the maximum physical
block size possible in NVMe (lba_size * (1 + npwg), where npwg is
16-bit).

Also zero out ub->params with memset() when copy_from_user() fails
or ublk_validate_params() returns error, so that no stale or partial
params survive for a subsequent START_DEV to consume.
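The added cap can be sketched as follows (names invented, bound taken from the commit text):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_BS_SHIFT 28	/* ilog2(SZ_256M), per the commit text */

/* Sketch of the added bounds check: reject any of physical_bs_shift,
 * io_min_shift or io_opt_shift large enough to make `1 << shift`
 * misbehave when expanded into a 32-bit unsigned int. */
static int toy_validate_shift(uint8_t shift)
{
	if (shift > TOY_MAX_BS_SHIFT)
		return -EINVAL;
	return 0;
}
```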

Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260506082238.22363-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 days agowifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation
Maoyi Xie [Wed, 6 May 2026 06:48:54 +0000 (14:48 +0800)] 
wifi: nl80211: re-check wiphy netns in nl80211_prepare_wdev_dump() continuation

NL80211_CMD_GET_SCAN is implemented as a multi-call dumpit. The first
invocation of nl80211_prepare_wdev_dump() validates the requested wdev
against the caller's netns via __cfg80211_wdev_from_attrs(). Subsequent
invocations look up the same wiphy by its global index and do not check
that the wiphy is still in the caller's netns.

Add the same filter to the continuation path. If the wiphy's netns no
longer matches the caller's, return -ENODEV so that the netlink dump
machinery terminates the walk cleanly.

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-3-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 days agowifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS
Maoyi Xie [Wed, 6 May 2026 06:48:53 +0000 (14:48 +0800)] 
wifi: nl80211: require CAP_NET_ADMIN over the target netns in SET_WIPHY_NETNS

NL80211_CMD_SET_WIPHY_NETNS dispatches with GENL_UNS_ADMIN_PERM, which
verifies that the caller has CAP_NET_ADMIN for the source netns. It
doesn't verify that the caller has CAP_NET_ADMIN over the target netns
selected by NL80211_ATTR_NETNS_FD or NL80211_ATTR_PID.

This diverges from the convention enforced in
net/core/rtnetlink.c::rtnl_get_net_ns_capable():

    /* For now, the caller is required to have CAP_NET_ADMIN in
     * the user namespace owning the target net ns.
     */
    if (!sk_ns_capable(sk, net->user_ns, CAP_NET_ADMIN))
        return ERR_PTR(-EACCES);

A user with CAP_NET_ADMIN in their own user namespace can therefore
push a wiphy into an arbitrary netns (including init_net) over which
they have no privilege.

Mirror the rtnetlink convention by requiring CAP_NET_ADMIN in the
target netns before calling cfg80211_switch_netns().

Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260506064854.2207105-2-maoyixie.tju@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 days agowifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage
Johannes Berg [Tue, 5 May 2026 11:38:37 +0000 (13:38 +0200)] 
wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage

This is documented as a u8 and has a policy of NLA_U8, but uses
nla_get_u32() which means it's completely broken on big-endian.
Fix it to use nla_get_u8().
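
A plain-C model of the breakage (not the netlink API itself): an
NLA_U8 payload carries one meaningful byte padded to four, so a
4-byte big-endian load puts that byte in the most-significant
position:

```c
#include <stdint.h>

/* The attribute payload as stored: one meaningful byte (the
 * FTMs-per-burst value), then alignment padding. */
static const uint8_t payload[4] = { 5, 0, 0, 0 };

static uint8_t load_u8(const uint8_t *p)        /* nla_get_u8() model */
{
	return p[0];
}

static uint32_t load_u32_be(const uint8_t *p)   /* 4-byte load on BE */
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

On little-endian the two loads happen to agree, which is why the bug
went unnoticed there.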

Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API")
Link: https://patch.msgid.link/20260505113837.260159-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 days agowifi: mac80211: remove station if connection prep fails
Johannes Berg [Tue, 5 May 2026 13:15:34 +0000 (15:15 +0200)] 
wifi: mac80211: remove station if connection prep fails

If connection preparation fails for MLO connections, then the
interface is completely reset to non-MLD. In this case, we must
not keep the station since it's related to the link of the vif
being removed. Delete any existing station. Any "new_sta" is
already being removed, so that doesn't need changes.

This fixes a use-after-free/double-free in debugfs if that's
enabled, because a vif going from MLD (and to MLD, but that's
not relevant here) recreates its entire debugfs.

Cc: stable@vger.kernel.org
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260505151533.c4e52deb06ad.Iafe56cec7de8512626169496b134bce3a6c17010@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 days agoALSA: sparc/dbri: add missing fallthrough
Rosen Penev [Wed, 6 May 2026 03:18:54 +0000 (20:18 -0700)] 
ALSA: sparc/dbri: add missing fallthrough

Fixes a compiler error seen with newer compilers:

sound/sparc/dbri.c:595:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
  595 |         case 1:
      |         ^
sound/sparc/dbri.c:595:2: note: insert 'break;' to avoid fall-through
  595 |         case 1:
      |         ^
      |         break;

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260506031854.780411-1-rosenp@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 days agoALSA: core: Serialize deferred fasync state checks
Cássio Gabriel [Wed, 6 May 2026 03:34:47 +0000 (00:34 -0300)] 
ALSA: core: Serialize deferred fasync state checks

snd_fasync_helper() updates fasync->on under snd_fasync_lock, and
snd_fasync_work_fn() now also evaluates fasync->on under the same
lock. snd_kill_fasync() still tests the flag before taking the lock,
leaving an unsynchronized read against FASYNC enable/disable updates.

Move the enabled-state check into the locked section.

Also clear fasync->on under snd_fasync_lock in snd_fasync_free()
before unlinking the pending entry. Together with the locked sender-side
check, this publishes teardown before flushing the deferred work and
prevents a racing sender from requeueing the entry after free has
started.

Fixes: ef34a0ae7a26 ("ALSA: core: Add async signal helpers")
Fixes: 8146cd333d23 ("ALSA: core: Fix potential data race at fasync handling")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-core-fasync-on-lock-v1-1-ea48c77d6ca4@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 days agoALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx
Rodrigo Faria [Tue, 5 May 2026 18:55:18 +0000 (19:55 +0100)] 
ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx

Add a new fixup for the mute LED on the HP Pavilion 15-cs1xxx series
using the VREF on NID 0x1b.

The BIOS on these models (tested up to F.32) incorrectly reports
the mute LED on NID 0x18 via DMI OEM strings, which lacks VREF
capabilities. This fixup overrides the LED pin to the correct
NID 0x1b.

Signed-off-by: Rodrigo Faria <rodrigofilipefaria@gmail.com>
Link: https://patch.msgid.link/20260505185518.23625-1-rodrigofilipefaria@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 days agoALSA: seq: Fix UMP group 16 filtering
Cássio Gabriel [Wed, 6 May 2026 03:15:48 +0000 (00:15 -0300)] 
ALSA: seq: Fix UMP group 16 filtering

The sequencer UAPI defines group_filter as an unsigned int bitmap.
Bit 0 filters groupless messages and bits 1-16 filter UMP groups 1-16.

The internal snd_seq_client storage is only unsigned short, so bit 16
is truncated when userspace sets the filter. The same truncation affects
the automatic UMP client filter used to avoid delivery to inactive
groups, so events for group 16 cannot be filtered.

Store the internal bitmap as unsigned int and keep both userspace-provided
and automatically generated values limited to the defined UAPI bits.
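
The truncation is ordinary C integer conversion; a standalone sketch
(helper names are illustrative):

```c
#include <stdint.h>

/* Bit 0 filters groupless messages, bits 1-16 filter UMP groups 1-16. */
#define UMP_GROUP_FILTER(g)  (1u << (g))

/* Old internal storage: unsigned short silently drops bit 16. */
static uint16_t store_short(unsigned int filter)
{
	return (uint16_t)filter;
}

/* Fixed storage: unsigned int, limited to the defined UAPI bits 0-16. */
static unsigned int store_int(unsigned int filter)
{
	return filter & 0x1ffffu;
}
```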

Fixes: d2b706077792 ("ALSA: seq: Add UMP group filter")
Cc: stable@vger.kernel.org
Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com>
Link: https://patch.msgid.link/20260506-alsa-seq-ump-group16-filter-v1-1-b75160bf6993@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 days agotimers/migration: Fix another hotplug activation race
Frederic Weisbecker [Thu, 23 Apr 2026 16:53:49 +0000 (18:53 +0200)] 
timers/migration: Fix another hotplug activation race

The hotplug control CPU is assumed to be active in the hierarchy but
that doesn't imply that the root is active. If the current CPU is not
the one that activated the current hierarchy, and the CPU performing
this duty is still halfway through the tree, the root may still be
observed inactive. And this can break the activation of a new root as in
the following scenario:

1) Initially, the whole system has 64 CPUs and only CPU 63 is awake.

                   [GRP1:0]
                    active
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle      active
         /   |   \               |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle               active

2) CPU 63 goes idle _but_ due to a #VMEXIT it hasn't yet reached the
   [GRP1:0]->parent dereference (that would be NULL and stop the walk)
   in __walk_groups_from().

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
           idle      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle   idle                idle

3) CPU 1 wakes up, activates GRP0:0 but didn't yet manage to propagate
   up to GRP1:0 due to yet another #VMEXIT.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
     idle  active               idle

4) CPU 0 wakes up and doesn't need to walk above GRP0:0 as that is
   CPU 1's role.

                   [GRP1:0]
                     idle
                  /    |    \
                 /     |     \
         [GRP0:0]    [...]    [GRP0:7]
         active      idle       idle
         /   |   \                |
     CPU 0  CPU 1  ...         CPU 63
    active  active              idle

5) CPU 0 boots CPU 64. It creates a new root for it.

                             [GRP2:0]
                               idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

6) CPU 0 activates the new root, but note that GRP1:0 is still idle,
   waiting for CPU 1 to resume from #VMEXIT and activate it.

                             [GRP2:0]
                              active
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

7) CPU 63 resumes after #VMEXIT and sees the new GRP1:0 parent.
   Therefore it propagates the stale inactive state of GRP1:0 up to
   GRP2:0.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   idle                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

8) CPU 1 resumes after #VMEXIT and finally activates GRP1:0. But it
   doesn't observe its parent link because no ordering enforced that.
   Therefore GRP2:0 is spuriously left idle.

                             [GRP2:0]
                              idle
                           /          \
                          /            \
                   [GRP1:0]           [GRP1:1]
                   active                 idle
                  /    |    \                \
                 /     |     \                \
         [GRP0:0]    [...]    [GRP0:7]      [GRP0:8]
         active      idle       idle          idle
         /   |   \                |            |
     CPU 0  CPU 1  ...         CPU 63        CPU 64
    active  active              idle         offline

Such races are highly theoretical and the problem would solve itself
once the old root ever becomes idle again. But it still leaves a taste
of discomfort.

Fix it by enforcing a fully ordered atomic read of the old root state
before propagating the active state up to the new root. This has an
ordering effect in two directions:

* Acquire + release of the latest old root state: If the hotplug control
  CPU is not the one that woke up the old root, make sure to acquire its
  active state and propagate it upwards through the ordered chain of
  activation (the acquire pairs with the cmpxchg() in tmigr_active_up()
  and subsequent releases will pair with atomic_read_acquire() and
  smp_mb__after_atomic() in tmigr_inactive_up()).

* Release: If the hotplug control CPU is not the one that must wake up
  the old root, but the CPU covering that is lagging behind its duty,
  publish the links from the old root to the new parents. This way the
  lagging CPU will propagate the active state itself.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-2-frederic@kernel.org
9 days agoMerge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 6 May 2026 02:44:46 +0000 (19:44 -0700)] 
Merge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Fix some build and runtime issues after the 32BIT Kconfig option was enabled,
  improve the platform-specific PCI controller compatibility, drop
  custom __arch_vdso_hres_capable(), and fix a lot of KVM bugs"

* tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Move unconditional delay into timer clear scenery
  LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
  LoongArch: KVM: Move AVEC interrupt injection into switch loop
  LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
  LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
  LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
  LoongArch: KVM: Compile switch.S directly into the kernel
  LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
  LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
  LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
  LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
  LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT
  LoongArch: Make CONFIG_64BIT as the default option

9 days agoMerge branch 'xsk-fix-bugs-around-xsk-skb-allocation'
Jakub Kicinski [Wed, 6 May 2026 02:27:54 +0000 (19:27 -0700)] 
Merge branch 'xsk-fix-bugs-around-xsk-skb-allocation'

Jason Xing says:

====================
xsk: fix bugs around xsk skb allocation

There are rare issues around xsk_build_skb(). Some of them
were found by Sashiko [1][2].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
====================

Link: https://patch.msgid.link/20260502200722.53960-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: fix u64 descriptor address truncation on 32-bit architectures
Jason Xing [Sat, 2 May 2026 20:07:22 +0000 (23:07 +0300)] 
xsk: fix u64 descriptor address truncation on 32-bit architectures

In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
uintptr_t cast:

    skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.

Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an
xsk_addrs struct (the same path already used for multi-descriptor
SKBs) to store the full u64 address. The existing tagged-pointer logic
in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned
from kmem_cache_zalloc() are always word-aligned and therefore have
bit 0 clear, which correctly identifies them as a struct pointer
rather than an inline tagged address on every architecture.

Factor the shared kmem_cache_zalloc + destructor_arg assignment into
__xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles
the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1).
The three former open-coded kmem_cache_zalloc call sites now reduce to
a single call each.

Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through
xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb()
before skb->destructor is installed.

The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.
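
The truncation can be shown in isolation, with uint32_t standing in
for a 32-bit uintptr_t (the 48-bit shift matches
XSK_UNALIGNED_BUF_OFFSET_SHIFT):

```c
#include <stdint.h>

#define OFFSET_SHIFT 48   /* XSK_UNALIGNED_BUF_OFFSET_SHIFT */

/* Compose an unaligned-mode descriptor address: chunk base in the low
 * bits, chunk offset in bits 48-63. */
static uint64_t make_addr(uint64_t base, uint64_t offset)
{
	return base | (offset << OFFSET_SHIFT);
}

/* What a 32-bit uintptr_t cast keeps: the offset is lost entirely. */
static uint32_t as_32bit_uintptr(uint64_t addr)
{
	return (uint32_t)addr;
}
```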

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: fix xsk_addrs slab leak on multi-buffer error path
Jason Xing [Sat, 2 May 2026 20:07:21 +0000 (23:07 +0300)] 
xsk: fix xsk_addrs slab leak on multi-buffer error path

When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to >= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs->skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs > 1:

    if (unlikely(num_descs > 1))
        kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing whether destructor_arg still holds the
inlined address. If it doesn't, it has been repurposed to point at
memory allocated from xsk_tx_generic_cache, which must be freed
regardless of whether num_descs was ever incremented.
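
The inline-address-vs-slab-pointer distinction rests on bit 0 of
destructor_arg, as described in the related commit above; a standalone
model of that tagging (not the kernel code itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* Inline descriptor addresses are stored with bit 0 set; slab pointers
 * from kmem_cache_zalloc() are word-aligned, so bit 0 is always clear. */
static void *tag_inline_addr(uintptr_t addr)
{
	return (void *)(addr | 0x1UL);
}

static bool is_inline_addr(void *destructor_arg)
{
	return ((uintptr_t)destructor_arg & 0x1UL) != 0;
}
```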

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: avoid skb leak in XDP_TX_METADATA case
Jason Xing [Sat, 2 May 2026 20:07:20 +0000 (23:07 +0300)] 
xsk: avoid skb leak in XDP_TX_METADATA case

Fix it by explicitly adding kfree_skb() before returning to the
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns an error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
Jason Xing [Sat, 2 May 2026 20:07:19 +0000 (23:07 +0300)] 
xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()

Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.

Postpone the init phase until the allocation of the first frag has
successfully completed. Before this init, the skb can be safely freed
with kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
Jason Xing [Sat, 2 May 2026 20:07:18 +0000 (23:07 +0300)] 
xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb && !skb_shinfo(skb)->nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by using a !xs->skb check to handle the first-frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs->skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: handle NULL dereference of the skb without frags issue
Jason Xing [Sat, 2 May 2026 20:07:17 +0000 (23:07 +0300)] 
xsk: handle NULL dereference of the skb without frags issue

When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the
free_err -EOVERFLOW handler unconditionally dereferences xs->skb
via xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing
a NULL pointer dereference.

Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb()
calls with an xs->skb check (for the continuation case), and add
an else branch for the first-descriptor case that manually cancels
the one reserved CQ slot and increments invalid_descs by one to
account for the single invalid descriptor.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
Jason Xing [Sat, 2 May 2026 20:07:16 +0000 (23:07 +0300)] 
xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS

Fix it by explicitly adding kfree_skb() before returning to the
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why the bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if xs->skb is not NULL in this case, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoxsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
Jason Xing [Sat, 2 May 2026 20:07:15 +0000 (23:07 +0300)] 
xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices

skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb->data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.

The patch catches this and rejects the combination at bind(), which
converts a silent per-packet failure into a synchronous, actionable
-EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
   (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
    consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
    the same descriptor and re-hits the same WARN until the socket
    is closed.

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agopowerpc/pasemi: Drop redundant res assignment
Krzysztof Kozlowski [Tue, 17 Mar 2026 13:08:25 +0000 (14:08 +0100)] 
powerpc/pasemi: Drop redundant res assignment

Return value of pas_add_bridge() is not used, so code can be simplified
to fix W=1 clang warnings:

  arch/powerpc/platforms/pasemi/pci.c:275:6: error: variable 'res' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260317130823.240279-4-krzysztof.kozlowski@oss.qualcomm.com
9 days agopowerpc/ps3: Drop redundant result assignment
Krzysztof Kozlowski [Tue, 17 Mar 2026 13:08:24 +0000 (14:08 +0100)] 
powerpc/ps3: Drop redundant result assignment

Return value of ps3_start_probe_thread() is not used, so code can be
simplified to fix W=1 clang warnings:

  arch/powerpc/platforms/ps3/device-init.c:953:6: error: variable 'result' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260317130823.240279-3-krzysztof.kozlowski@oss.qualcomm.com
9 days agopowerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
Nathan Chancellor [Thu, 12 Mar 2026 00:39:56 +0000 (17:39 -0700)] 
powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang

After commit 73cdf24e81e4 ("powerpc64: make clang cross-build
friendly"), building 64-bit little endian + CONFIG_COMPAT=y with clang
results in many warnings along the lines of:

  $ cat arch/powerpc/configs/compat.config
  CONFIG_COMPAT=y

  $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64le_defconfig compat.config arch/powerpc/kernel/vdso/
  ...
  In file included from <built-in>:4:
  In file included from lib/vdso/gettimeofday.c:6:
  In file included from include/vdso/datapage.h:15:
  In file included from include/vdso/cache.h:5:
  arch/powerpc/include/asm/cache.h:77:8: warning: unknown attribute 'patchable_function_entry' ignored [-Wunknown-attributes]
     77 | static inline u32 l1_icache_bytes(void)
        |        ^~~~~~
  include/linux/compiler_types.h:235:58: note: expanded from macro 'inline'
    235 | #define inline inline __gnu_inline __inline_maybe_unused notrace
        |                                                          ^~~~~~~
  include/linux/compiler_types.h:215:34: note: expanded from macro 'notrace'
    215 | #define notrace                 __attribute__((patchable_function_entry(0, 0)))
        |                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...

arch/powerpc/Makefile adds -DCC_USING_PATCHABLE_FUNCTION_ENTRY to
KBUILD_CPPFLAGS, which is inherited by the 32-bit vDSO. However, the
32-bit little endian target does not support
'-fpatchable-function-entry', resulting in the warnings above.

Remove -DCC_USING_PATCHABLE_FUNCTION_ENTRY from the 32-bit vDSO flags
when building with clang to avoid the warnings.

Fixes: 73cdf24e81e4 ("powerpc64: make clang cross-build friendly")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311-ppc-vdso-drop-cc-using-pfe-define-clang-v1-1-66c790e22650@kernel.org
9 days agoplatform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration
Tzung-Bi Shih [Tue, 5 May 2026 05:34:03 +0000 (05:34 +0000)] 
platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration

cros_typec_register_thunderbolt() missed initializing the `adata->lock`
mutex.  This leads to a NULL dereference when the mutex is later
acquired (e.g. in cros_typec_altmode_work()).

Initialize the mutex in cros_typec_register_thunderbolt() to fix the
issue.

Cc: stable@vger.kernel.org
Fixes: 3b00be26b16a ("platform/chrome: cros_ec_typec: Thunderbolt support")
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Link: https://lore.kernel.org/r/20260505053403.3335740-1-tzungbi@kernel.org
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
9 days agoMerge branch 'net-mlx5-fixes-for-socket-direct'
Jakub Kicinski [Wed, 6 May 2026 02:13:12 +0000 (19:13 -0700)] 
Merge branch 'net-mlx5-fixes-for-socket-direct'

Tariq Toukan says:

====================
net/mlx5: Fixes for Socket-Direct

This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.

Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.

Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.

Patch 3 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.

Patch 4 fixes missing cleanup on ETH probe errors. The analogous gap on
the resume path requires introducing sd_suspend/resume APIs that only
destroy FW resources and is left for a follow-up series.
====================

Link: https://patch.msgid.link/20260504180206.268568-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5e: SD, Fix race condition in secondary device probe/remove
Shay Drory [Mon, 4 May 2026 18:02:06 +0000 (21:02 +0300)] 
net/mlx5e: SD, Fix race condition in secondary device probe/remove

When utilizing Socket-Direct single netdev functionality the driver
resolves the actual auxiliary device using mlx5_sd_get_adev(). However,
the current implementation returns the primary ETH auxiliary device
without holding the device lock, leading to a potential race condition
where the ETH device could be unbound or removed concurrently during
probe, suspend, resume, or remove operations.[1]

Fix this by introducing mlx5_sd_put_adev() and updating
mlx5_sd_get_adev() so that secondary devices take a reference and
acquire the device lock of the returned auxiliary device. After the lock
is acquired, a second devcom check is needed[2].
In addition, update the callers to pair the get operation with the new
put operation, ensuring the lock is held while the auxiliary device is
being operated on and released afterwards.

The "primary" designation is determined once in sd_register(). It's set
before devcom is marked ready, and it never changes after that.
In Addition, The primary path never locks a secondary: When the primary
device invoke mlx5_sd_get_adev(), it sees dev == primary and returns.
no additional lock is taken.
Therefore lock ordering is always: secondary_lock -> primary_lock. The
reverse never happens, so ABBA deadlock is impossible.

[1]
for example:
BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
Call Trace:
 <TASK>
 mlx5e_remove+0x82/0x12a [mlx5_core]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x140
 device_del+0x159/0x3c0
 ? devl_param_driverinit_value_get+0x29/0x80
 mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
 mlx5_unregister_device+0x34/0x50 [mlx5_core]
 mlx5_uninit_one+0x43/0xb0 [mlx5_core]
 remove_one+0x4e/0xc0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0xe90
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
    CPU0 (primary)                     CPU1 (secondary)
==========================================================================
mlx5e_remove() (device_lock held)
                                     mlx5e_remove() (2nd device_lock held)
                                      mlx5_sd_get_adev()
                                       mlx5_devcom_comp_is_ready() => true
                                       device_lock(primary)
 mlx5_sd_get_adev() ==> ret adev
 _mlx5e_remove()
 mlx5_sd_cleanup()
 // mlx5e_remove finished
 // releasing device_lock
                                       //need another check here...
                                       mlx5_devcom_comp_is_ready() => false

Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5e: SD, Fix missing cleanup on probe error
Shay Drory [Mon, 4 May 2026 18:02:05 +0000 (21:02 +0300)] 
net/mlx5e: SD, Fix missing cleanup on probe error

When _mlx5e_probe() fails, the preceding successful mlx5_sd_init() is
not undone. Auxiliary bus probe failure skips binding, so mlx5e_remove()
is never called for that adev and the matching mlx5_sd_cleanup() never
runs - leaking the per-dev SD struct.

Call mlx5_sd_cleanup() on the probe error path to balance
mlx5_sd_init().

A similar gap exists on the resume path: mlx5_sd_init() and
mlx5_sd_cleanup() are currently bundled with both probe/remove and
suspend/resume, even though only the FW alias state actually needs to
follow the suspend/resume lifecycle - the sd struct allocation and
devcom membership are software state that should track the full bound
lifetime. As a result, a failed resume can leave a still-bound device
with sd == NULL, which mlx5_sd_get_adev() can't distinguish from a
non-SD device. Fixing this requires sd_suspend/resume APIs which will
only destroy FW resources and is left for a follow-up series.

Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5: SD, Keep multi-pf debugfs entries on primary
Shay Drory [Mon, 4 May 2026 18:02:04 +0000 (21:02 +0300)] 
net/mlx5: SD, Keep multi-pf debugfs entries on primary

mlx5_sd_init() creates the "multi-pf" debugfs directory under the
primary device's debugfs root, but stores the dentry in the calling
device's sd struct. When sd_cleanup() runs on a different PF,
this leads to using the wrong sd->dfs for removing entries, which
results in a memory leak and an error when re-creating the SD.[1]

Fix it by explicitly storing the debugfs dentry in the primary
device's sd struct and using it for all per-group files.

[1]
debugfs: 'multi-pf' already exists in '0000:08:00.1'

Fixes: 4375130bf527 ("net/mlx5: SD, Add debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5: SD: Serialize init/cleanup
Shay Drory [Mon, 4 May 2026 18:02:03 +0000 (21:02 +0300)] 
net/mlx5: SD: Serialize init/cleanup

mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same
Socket-Direct group. This can cause the SD bring-up/tear-down sequence
to be executed more than once or interleaved across PFs.

Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD
group state on the primary device. Skip init if the primary is already
UP, and skip cleanup unless the primary is UP.

The state check on cleanup is needed because sd_register() drops the
devcom comp lock between marking the comp ready and assigning
primary_dev on each peer. A concurrent cleanup that acquires the lock
in this window would observe devcom_is_ready==true while primary_dev
is still NULL (causing mlx5_sd_get_primary() to return NULL) or while
the FW alias setup performed by mlx5_sd_init()'s body has not yet run
(causing sd_cmd_unset_primary() to dereference a NULL tx_ft). Gate the
cleanup body on primary_sd->state == MLX5_SD_STATE_UP, which is set
only at the very end of mlx5_sd_init() under the same comp lock - so
observing UP guarantees primary_dev, secondaries[], tx_ft, and dfs are
all populated. Also bail explicitly if mlx5_sd_get_primary() returns
NULL, in case state is checked on a peer whose primary_dev hasn't been
assigned yet.

In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister()
into the cleanup's locked section, including the !primary and
state != UP early-exit paths, so the device cannot unregister and free
its struct mlx5_sd while devcom is still marked ready. A concurrent
init acquiring the devcom lock will now observe devcom is no longer
ready and bail out immediately.

Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agoarch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
Thomas Zimmermann [Wed, 1 Apr 2026 08:30:03 +0000 (10:30 +0200)] 
arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files

CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither is
true here, so drop the lines from the defconfig files.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401083023.214426-1-tzimmermann@suse.de
9 days agoMerge branch 'net-mlx5e-psp-fixes'
Jakub Kicinski [Wed, 6 May 2026 02:09:07 +0000 (19:09 -0700)] 
Merge branch 'net-mlx5e-psp-fixes'

Tariq Toukan says:

====================
net/mlx5e: PSP fixes

This patchset provides bug fixes from Cosmin to the mlx5e PSP feature.
====================

Link: https://patch.msgid.link/20260504181100.269334-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable
Cosmin Ratiu [Mon, 4 May 2026 18:11:00 +0000 (21:11 +0300)] 
net/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable

devlink reload while PSP connections are active does:

mlx5_unload_one_devl_locked() -> mlx5_detach_device()
-> _mlx5e_suspend()
  -> mlx5e_detach_netdev()
    -> profile->cleanup_rx
    -> profile->cleanup_tx
  -> mlx5e_destroy_mdev_resources() -> mlx5_core_dealloc_pd() fails:
...
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:821:(pid 19722):
DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9),
syndrome (0xef0c8a), err(-22)
...

The reason for failure is the existence of TX keys, which are removed by
the PSP dev unregistration happening in:
profile->cleanup() -> mlx5e_psp_unregister() -> mlx5e_psp_cleanup()
  -> psp_dev_unregister()
...but this isn't invoked in the devlink reload flow, only when changing
the NIC profile (e.g. when transitioning to switchdev mode) or on dev
teardown.

Move PSP device registration into mlx5e_nic_enable(), and unregistration
into the corresponding mlx5e_nic_disable(). These functions are called
during netdev attach/detach after RX & TX are set up.
This ensures that the keys will be gone by the time the PD is destroyed.

Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504181100.269334-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5e: psp: Expose only a fully initialized priv->psp
Cosmin Ratiu [Mon, 4 May 2026 18:10:59 +0000 (21:10 +0300)] 
net/mlx5e: psp: Expose only a fully initialized priv->psp

Currently, during PSP init, priv->psp is initialized to an incompletely
built psp struct. Additionally, on fs init failure priv->psp is reset to
NULL.

Change this so that only a fully initialized priv->psp is set, which
makes the code easier to reason about in failure scenarios.

Fixes: af2196f49480 ("net/mlx5e: Implement PSP operations .assoc_add and .assoc_del")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504181100.269334-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agonet/mlx5e: psp: Fix invalid access on PSP dev registration fail
Cosmin Ratiu [Mon, 4 May 2026 18:10:58 +0000 (21:10 +0300)] 
net/mlx5e: psp: Fix invalid access on PSP dev registration fail

priv->psp->psp is initialized with the PSP device as returned by
psp_dev_create(). This could also return an error, in which case a
future psp_dev_unregister() will result in unpleasantness.

Avoid that by using a local variable and only saving the PSP device when
registration succeeds.

In case psp_dev_create() fails, priv->psp and steering structs are left
in place, but they will be inert. The unchecked access of priv->psp in
mlx5e_psp_offload_handle_rx_skb() won't happen because without a PSP
device, there can be no SAs added and therefore no packets will be
successfully decrypted and be handed off to the SW handler.

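The publish-only-on-success pattern can be sketched in userspace as follows; every name here is illustrative (a stand-in for psp_dev_create() and priv->psp, not the driver's actual definitions):

```c
#include <stddef.h>

struct ctx {
	void *psp;	/* stands in for priv->psp */
};

/* Hypothetical creator: returns NULL on failure, standing in for the
 * error value psp_dev_create() can return. */
static void *fake_psp_create(int should_fail)
{
	static int token;

	return should_fail ? NULL : &token;
}

/* Publish on success only: build into a local, store into the context
 * only once the value is known-good, so a later unregister never sees
 * an error value in ctx->psp. */
static int psp_register(struct ctx *c, int should_fail)
{
	void *psp = fake_psp_create(should_fail);

	if (!psp)
		return -1;	/* c->psp stays untouched */
	c->psp = psp;
	return 0;
}
```

On failure the context is exactly as it was before the call, which is what makes later teardown paths safe.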
Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504181100.269334-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agopowerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
Shivani Nittor [Tue, 21 Apr 2026 15:06:28 +0000 (20:36 +0530)] 
powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events

The core-book3s PMU sampling code validates the SIER TYPE field
when PERF_SAMPLE_DATA_SRC is requested. The SIER TYPE field
indicates the instruction type and is only valid for
random sampling (marked events). To handle cases observed where
SIER TYPE could be zero even for marked events, validation was
added to drop such samples and increment event->lost_samples.

However, this validation was applied to all samples,
including continuous sampling. In continuous sampling mode,
the PMU does not set the SIER TYPE field, so it remains zero.
As a result, valid continuous samples were incorrectly
treated as invalid and dropped. Fix this by gating the
SIER TYPE validation with mark_event, so the check runs only
for marked (random) events. Continuous samples now skip this
check and are recorded normally in the final data recording path.

Fixes: 2ffb26afa642 ("arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src")
Signed-off-by: Shivani Nittor <shivani@linux.ibm.com>
Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com>
[Maddy: Fixed reviewed-by tag]
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260421150628.96500-1-shivani@linux.ibm.com
9 days agopowerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
Christophe Leroy (CS GROUP) [Tue, 21 Apr 2026 06:26:08 +0000 (08:26 +0200)] 
powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()

Although fsl,cpm1-gpio-irq-mask always contains a 16-bit value,
it is a standard u32 OF property, as documented in
Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt

The driver erroneously uses of_property_read_u16(), leading to a
mask that is always 0.

Fix it by using of_property_read_u32() instead.

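To illustrate why the u16 read yields zero: DT cells are stored big-endian, so a 16-bit read consumes only the two most significant bytes of the 32-bit cell. A minimal userspace sketch (the helper names mimic the OF accessors but are illustrative, not the kernel API):

```c
#include <stdint.h>

/* Read a device-tree u32 cell (big-endian on the wire). */
static uint32_t read_cell_u32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* A u16 read sees only the first two bytes of the cell, which are the
 * most significant ones -- zero for any mask that fits in 16 bits. */
static uint16_t read_cell_u16(const uint8_t *p)
{
	return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}
```

For a cell holding 0x0000ffff, the u32 read returns the expected mask while the u16 read returns 0, which is exactly the bug being fixed.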
Fixes: 726bd223105c ("powerpc/8xx: Adding support of IRQ in MPC8xx GPIO")
Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/bb0b6d6c4543238c38d5d29a776d0674a8c0c180.1776752750.git.chleroy@kernel.org
9 days agonet: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
Pavitra Jha [Fri, 1 May 2026 11:07:12 +0000 (07:07 -0400)] 
net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler

t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
a loop bound over port_msg->data[] without checking that the message buffer
contains sufficient data. A modem sending port_count=65535 in a 12-byte
buffer triggers a slab-out-of-bounds read of up to 262140 bytes.

Add a sizeof(*port_msg) check before accessing the port message header
fields to guard against undersized messages.

Add a struct_size() check after extracting port_count and before the loop.

In t7xx_parse_host_rt_data(), guard the rt_feature header read with a
remaining-buffer check before accessing data_len, and validate
feat_data_len against the actual remaining buffer to prevent OOB reads
and signed integer overflow on the offset.

Pass msg_len from both call sites: skb->len at the DPMAIF path after
skb_pull(), and the validated feat_data_len at the handshake path.

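The overflow-safe bound implied by the struct_size()-style check can be sketched in userspace as follows; the struct layout and field names are illustrative, not the driver's exact definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout loosely following the port enumeration message:
 * a fixed header followed by port_count u32 entries. */
struct port_msg {
	uint32_t head;
	uint32_t info;
	uint32_t port_count;
	uint32_t data[];	/* port_count entries follow */
};

/* Reject messages whose buffer cannot hold the advertised entries.
 * Dividing the remaining length instead of multiplying port_count
 * avoids integer overflow, mirroring what struct_size() guards. */
static int port_msg_len_ok(size_t msg_len, uint32_t port_count)
{
	if (msg_len < sizeof(struct port_msg))
		return 0;	/* undersized: header fields unreadable */
	if (port_count > (msg_len - sizeof(struct port_msg)) / sizeof(uint32_t))
		return 0;	/* data[] would run past the buffer */
	return 1;
}
```

With this check, the pathological case from the log (port_count=65535 in a 12-byte buffer) is rejected before the loop runs.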
Fixes: da45d2566a1d ("net: wwan: t7xx: Add control port")
Cc: stable@vger.kernel.org
Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com>
Link: https://patch.msgid.link/20260501110713.145563-1-jhapavitra98@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days agopowerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
Sourabh Jain [Tue, 7 Apr 2026 12:43:45 +0000 (18:13 +0530)] 
powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec

The kexec sequence invokes enter_vmx_ops() via copy_page() with the MMU
disabled. In this context, code must not rely on normal virtual address
translations or trigger page faults.

With KASAN enabled, functions get instrumented and may access shadow
memory using regular address translation. When executed with the MMU
off, this can lead to page faults (bad_page_fault) from which the
kernel cannot recover in the kexec path, resulting in a hang.

The kexec path sets preempt_count to HARDIRQ_OFFSET before entering
the MMU-off copy sequence.

current_thread_info()->preempt_count = HARDIRQ_OFFSET
  kexec_sequence(..., copy_with_mmu_off = 1)
    -> kexec_copy_flush(image)
         copy_segments()
           -> copy_page(dest, addr)
         bl enter_vmx_ops()
                   if (in_interrupt())
                     return 0
         beq .Lnonvmx_copy

Since kexec sets preempt_count to HARDIRQ_OFFSET, in_interrupt()
evaluates to true and enter_vmx_ops() returns early.

As in_interrupt() (and preempt_count()) are always inlined, mark
enter_vmx_ops() with __no_sanitize_address to avoid KASAN
instrumentation and shadow memory access with MMU disabled, helping
kexec boot fine with KASAN enabled.

Reported-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260407124349.1698552-2-sourabhjain@linux.ibm.com
9 days agopowerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
Sourabh Jain [Tue, 7 Apr 2026 12:43:44 +0000 (18:13 +0530)] 
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o

KASAN instrumentation is intended to be disabled for the kexec core
code, but the existing Makefile entry misses the object suffix. As a
result, the flag is not applied correctly to core_$(BITS).o.

So when KASAN is enabled, kexec_copy_flush and copy_segments in
kexec/core_64.c are instrumented, which can result in accesses to
shadow memory via normal address translation paths. Since these run
with the MMU disabled, such accesses may trigger page faults
(bad_page_fault) that cannot be handled in the kdump path, ultimately
causing a hang and preventing the kdump kernel from booting. The same
is true for kexec as well, since the same functions are used there.

Update the entry to include the ".o" suffix so that KASAN
instrumentation is properly disabled for this object file.

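A sketch of the kind of kbuild entry involved, assuming the standard per-object KASAN_SANITIZE mechanism (the exact Makefile lines here are hypothetical, not copied from the patch):

```make
# Effective: the per-object flag must name the object file, ".o" included.
KASAN_SANITIZE_core_$(BITS).o := n

# Ineffective: without the ".o" suffix kbuild matches no object, so
# core_$(BITS).o is still instrumented.
#KASAN_SANITIZE_core_$(BITS) := n
```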
Fixes: 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/1dee8891-8bcc-46b4-93f3-fc3a774abd5b@linux.ibm.com/
Cc: stable@vger.kernel.org
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260407124349.1698552-1-sourabhjain@linux.ibm.com
9 days agopseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:48 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()

While at it, let's also fix a similar style issue in the
enable_hvpipe_IRQ() function. This also fixes a minor checkpatch warning
caused by an extra space before " ==".

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/1174f60d0ae128e773dbefd11dd8d46d69e7f50e.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:47 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()

Simplify hvpipe_rtas_recv_msg() by removing three levels of nesting...

if (!ret)
    if (buf)
        if (size < bytes_written)

...the refactored function bails out to the "out:" label first in case
of any error. This simplifies the flow.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/bbe7ddf8b8e25c9be8fc5e2c4aea9e5fca128bf4.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:46 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info

We don't really use task_struct pointer for anything meaningful. So just
kill it for now, and we can bring back later if we need this for any
future debug purposes.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/895e061e45cdc95db36fa7f27aa1922b81eed867.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:45 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()

Once the src_info is removed from the global list, no one can access it.
This simplifies the usage of spin_unlock_irqrestore() in
papr_hvpipe_handle_release().

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/4a980331557af3d10aada8576aaa16cddc691c65.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Fix the usage of copy_to_user()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:44 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Fix the usage of copy_to_user()

copy_to_user() returns the number of bytes that could not be copied to
the user buffer. If there was an error writing bytes into the user
buffer, i.e. if copy_to_user() returns a non-zero value, then we should
simply return -EFAULT from the ->read() call.

Otherwise, in the non-patched version, we may end up mixing
"bytes_not_copied + bytes_copied (HVPIPE_HDR_LEN)" into the return value
to the user in the ->read() call.

Also, let's make sure we clear the hvpipe_status flag if we have
consumed the hvpipe msg by making the RTAS call. ret = -EFAULT means
copy_to_user() has failed, but that still means the msg was read from
the hvpipe; hence, for both cases, success and -EFAULT, we should clear
the HVPIPE_MSG_AVAILABLE flag in hvpipe_status.

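The fixed return-value logic can be sketched as a userspace analog (the function name and parameters are illustrative, not the driver's):

```c
#include <errno.h>
#include <stddef.h>

/* Userspace analog of the fixed ->read() return logic: a non-zero
 * "bytes not copied" result from the copy maps to -EFAULT and is never
 * mixed into the byte count returned to the caller. */
static long hvpipe_read_result(size_t not_copied, size_t bytes_read)
{
	if (not_copied)
		return -EFAULT;		/* partial copy: report the fault */
	return (long)bytes_read;	/* full copy: report bytes consumed */
}
```

The caller either gets a clean byte count or a clean error, never a sum of the two.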
Cc: stable@vger.kernel.org
Fixes: cebdb522fd3edd1 ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/8fda3212a1ad48879c174e92f67472d9b9f1c3b7.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:43 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()

Remove the 3-level nesting pattern used to check success return values
from function calls:

ret = enable_hvpipe_IRQ()
if (!ret)
    ret = set_hvpipe_sys_param(1)
    if (!ret)
        ret = misc_register()

Instead, just bail out to the "out*:" labels in case of any error. This
simplifies the init flow.

While at it, let's also fix the following error handling logic: we have
already enabled interrupt sources and enabled hvpipe to receive
interrupts. If misc_register() fails, we will destroy the workqueue, but
the HMC might still send us a msg via hvpipe, which will queue work on
the workqueue that might already be destroyed.

So instead, let's reverse the order and enable set_hvpipe_sys_param(1)
last; in case of an error, remove the misc dev by calling
misc_deregister().

Cc: stable@vger.kernel.org
Fixes: 39a08a4f94980 ("powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/f2141eafb80e7780395e03aa9a22e8a37be80513.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:42 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()

commit 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()"),
changed the create-handle path to FD_PREPARE(), but it caused a kernel
null-ptr-deref because, after the call to retain_and_null_ptr(src_info),
src_info is re-used when adding it to the global list.

Getting the following kernel panic in papr_hvpipe_dev_create_handle()
when trying to add src_info to the list.
 Kernel attempted to write user page (0) - exploit attempt? (uid: 0)
 BUG: Kernel NULL pointer dereference on write at 0x00000000
 Faulting instruction address: 0xc0000000001b44a0
 Oops: Kernel access of bad area, sig: 11 [#1]
 ...
 Call Trace:
 papr_hvpipe_dev_ioctl+0x1f4/0x48c (unreliable)
 sys_ioctl+0x528/0x1064
 system_call_exception+0x128/0x360
 system_call_vectored_common+0x15c/0x2ec

Now, the error handling with FD_PREPARE's file cleanup and __free(kfree)
auto cleanup is getting too convoluted. This is mainly because we need
to ensure only one user gets the srcID handle. To simplify this, we
allocate and prepare the src_info at the beginning and add it to the
global list under a spinlock after checking that no duplicates exist.

This simplifies the error handling: if FD_ADD fails, we can simply
remove the src_info from the list and consume any pending msg in the
hvpipe, since src_info has already become visible in the global list.

Cc: stable@vger.kernel.org
Fixes: 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()")
Reported-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/31ad94bc89d44156ee700c5bd006cb47a748e3cb.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:41 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace

The hdr variable is allocated on the stack and only hdr.version and
hdr.flags are initialized explicitly. Because the struct papr_hvpipe_hdr
contains reserved padding bytes (reserved[3] and reserved2[40]), these
could leak the uninitialized bytes to userspace after copy_to_user().

This patch fixes that by initializing the whole struct to 0.

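A minimal userspace sketch of the fix; the struct below is an illustrative stand-in for struct papr_hvpipe_hdr, not its exact layout:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in: the reserved padding must never carry stale
 * stack bytes out to userspace. */
struct hdr {
	uint8_t version;
	uint8_t flags;
	uint8_t reserved[3];
	uint8_t reserved2[40];
};

/* Build the header the safe way: full zero-initializer first, then the
 * explicit fields. Assigning only .version/.flags to an uninitialized
 * stack struct would leave reserved[]/reserved2[] as stack garbage. */
static struct hdr make_hdr(uint8_t version, uint8_t flags)
{
	struct hdr h = { 0 };	/* zeroes reserved[] and reserved2[] too */

	h.version = version;
	h.flags = flags;
	return h;
}

static int reserved_is_zero(const struct hdr *h)
{
	static const uint8_t zero[40] = { 0 };

	return memcmp(h->reserved, zero, sizeof(h->reserved)) == 0 &&
	       memcmp(h->reserved2, zero, sizeof(h->reserved2)) == 0;
}
```

Everything copy_to_user() would see is now fully initialized.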
Cc: stable@vger.kernel.org
Fixes: cebdb522fd3ed ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/7bfe03b65a282c856ed8182d1871bb973c0b78f2.1777606826.git.ritesh.list@gmail.com
9 days agopseries/papr-hvpipe: Fix race with interrupt handler
Ritesh Harjani (IBM) [Fri, 1 May 2026 04:11:40 +0000 (09:41 +0530)] 
pseries/papr-hvpipe: Fix race with interrupt handler

While executing the ->ioctl or ->release handler, if an interrupt
fires on the same CPU, we can enter a deadlock.

This patch fixes both handlers to take the spin_lock_irq{save|restore}
versions of the lock to prevent this deadlock.

Cc: stable@vger.kernel.org
Fixes: 814ef095f12c9 ("powerpc/pseries: Add papr-hvpipe char driver for HVPIPE interfaces")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/e4ed435c44fc191f2eb23c7907ba6f72f193e6aa.1777606826.git.ritesh.list@gmail.com
9 days agopowerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
Athira Rajeev [Sat, 14 Mar 2026 13:29:53 +0000 (18:59 +0530)] 
powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module

The H_HTM (Hardware Trace Macro) hypervisor call has the capability
to capture the System Memory Configuration. This information
helps in understanding the address mapping for the partitions
in the system.

Support dumping system memory configuration from Hardware
Trace Macro (HTM) function via debugfs interface. Under
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add
file "htmsystem_mem".

The interface allows only reads of this file, which present the
content of the HTM buffer from the hcall. Byte offset 16 of the HTM
buffer holds the number of entries for the array of processors.
Use this information to copy data to the debugfs file.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260314132953.27269-1-atrajeev@linux.ibm.com
9 days agopowerpc/pseries/htmdump: Fix the offset value used in htm status dump
Athira Rajeev [Sat, 14 Mar 2026 13:24:32 +0000 (18:54 +0530)] 
powerpc/pseries/htmdump: Fix the offset value used in htm status dump

The H_HTM call is invoked using three parameters specifying the address
of the buffer, the size of the buffer, and the offset to read from. The
offset used was always zero. The offset is a value from the output
buffer header that points to the next entry to dump; zero is the first
entry. The next entry offset is read from the output buffer at byte
offset 0x8. Update the htmstatus_read() function to use the right
offset, and return when the offset points to -1.

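The header walk can be sketched in userspace; the byte-0x8 location follows the commit message, but the field width, native-endian read, and helper names are assumptions for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the H_HTM output buffer header: a 64-bit value
 * at byte offset 0x8 points at the next entry to dump; -1 means there
 * is nothing further. Native endianness is assumed here. */
static int64_t htm_next_offset(const uint8_t *buf)
{
	int64_t next;

	memcpy(&next, buf + 0x8, sizeof(next));
	return next;
}

static int htm_done(int64_t next)
{
	return next == -1;	/* stop: no more entries to dump */
}
```

A reader loop would call htm_next_offset() after each H_HTM invocation and stop as soon as htm_done() is true, instead of always passing offset zero.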
Fixes: 627cf584f4c3 ("powerpc/pseries/htmdump: Add htm status support to htmdump module")
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260314132432.25581-3-atrajeev@linux.ibm.com
9 days ago  powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
Athira Rajeev [Sat, 14 Mar 2026 13:24:31 +0000 (18:54 +0530)] 
powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump

The H_HTM call is invoked using three parameters specifying
the address of the buffer, the size of the buffer, and the offset
to read from. The offset used was always zero, but "offset" is a
value from the output buffer header that points to the next entry
to dump: zero is the first entry to dump, and the next entry is
read from the output buffer at byte offset 0x8.
Update the htminfo_read() function to use the right offset, and
return when the offset points to -1.

Fixes: dea7384e14e7 ("powerpc/pseries/htmdump: Add htm info support to htmdump module")
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260314132432.25581-2-atrajeev@linux.ibm.com
9 days ago  powerpc/pseries/htmdump: Free the global buffers in htmdump module exit
Athira Rajeev [Sat, 14 Mar 2026 13:24:30 +0000 (18:54 +0530)] 
powerpc/pseries/htmdump: Free the global buffers in htmdump module exit

The htmdump module uses global memory buffers to capture
details such as the capabilities and status of the specified HTM,
and to read the trace buffer. These are initialized during module
init and hence need to be freed in module exit.

This patch adds freeing of the memory in module exit. The change
also includes a minor cleanup of variable names: the read
callback for the debugfs interface file saves filp->private_data
to a local variable whose name is the same as the global variable
name for the memory buffers. Rename these local variables.

Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260314132432.25581-1-atrajeev@linux.ibm.com
9 days ago  net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()
Eric Dumazet [Mon, 4 May 2026 16:38:42 +0000 (16:38 +0000)] 
net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()

fq_codel_dump_class_stats() acquires qdisc spinlock only when requested
to follow flow->head chain.

As we did in sch_cake recently, add the missing READ_ONCE()/WRITE_ONCE()
annotations.

Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260504163842.1162001-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  Merge tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Wed, 6 May 2026 00:55:25 +0000 (17:55 -0700)] 
Merge tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
IPVS fixes for net

The following batch contains IPVS fixes for net to address issues
from the latest net-next pull request.

Julian Anastasov made the following summary:

1-3) Fixes for the recently added resizable hash tables

4) dest from trash can be leaked if ip_vs_start_estimator() fails

5) fixed races and locking for the estimation kthreads

6) fix for wrong roundup_pow_of_two() usage in the resizable hash
   tables

7-8) v2 of the changes from Waiman Long to properly guard against
  the housekeeping_cpumask() updates:

  https://lore.kernel.org/netfilter-devel/20260331165015.2777765-1-longman@redhat.com/

  I added the missing Fixes tag. The original description:

  Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
  affinity management"), the HK_TYPE_KTHREAD housekeeping cpumask may no
  longer be correct in showing the actual CPU affinity of kthreads that
  have no predefined CPU affinity. As the ipvs networking code is still
  using HK_TYPE_KTHREAD, we need to make HK_TYPE_KTHREAD reflect the
  reality.

  This patch series makes HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
  and uses RCU to protect access to the HK_TYPE_KTHREAD housekeeping
  cpumask.

Julian plans to post a nf-next patch to limit the number of
connections via a "conn_max" sysctl. He and Simon Horman agreed
that the lack of a connection limit is an old problem and not a
stopper for this patchset.

* tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
  ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU
  ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size
  ipvs: fix races around est_mutex and est_cpulist
  ipvs: do not leak dest after get from dest trash
  ipvs: fix the spin_lock usage for RT build
  ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars
  ipvs: fixes for the new ip_vs_status info
====================

Link: https://patch.msgid.link/20260505001648.360569-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  Merge branch 'bnxt_en-bug-fixes'
Jakub Kicinski [Wed, 6 May 2026 00:36:16 +0000 (17:36 -0700)] 
Merge branch 'bnxt_en-bug-fixes'

Pavan Chebbi says:

====================
bnxt_en: Bug fixes

This patchset adds the following fixes for bnxt:

Patch #1 fixes DPC AER handling to make it more reliable

Patch #2 fixes incorrect capping of bp->max_tpa based on what the FW
supports

Patch #3 fixes ignoring of VNIC configuration result when RDMA
driver is loading

Patch #4 fixes logic to make phase adjustment on the PPS OUT signal
====================

Link: https://patch.msgid.link/20260504083611.1383776-1-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  bnxt_en: Use absolute target ns from ptp_clock_request
Pavan Chebbi [Mon, 4 May 2026 08:36:11 +0000 (14:06 +0530)] 
bnxt_en: Use absolute target ns from ptp_clock_request

There is no need to calculate the target PHC cycles required
to make a phase adjustment on the PPS OUT signal, because the
application supplies an absolute n_sec value in the future, which
is already the actual desired target value.

Remove the unnecessary code.

Fixes: 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260504083611.1383776-5-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  bnxt_en: Check return value of bnxt_hwrm_vnic_cfg
Kalesh AP [Mon, 4 May 2026 08:36:10 +0000 (14:06 +0530)] 
bnxt_en: Check return value of bnxt_hwrm_vnic_cfg

When the bnxt RDMA driver is loaded, it calls bnxt_register_dev().
As part of this, driver sends HWRM_VNIC_CFG firmware command
to configure the VNIC to operate in dual VNIC mode. Currently
the driver ignores the result of this firmware command. The RDMA
driver must know the result since it affects its functioning.

Check return value of call to bnxt_hwrm_vnic_cfg() in
bnxt_register_dev() and return failure on error.

Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260504083611.1383776-4-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  bnxt_en: Set bp->max_tpa according to what the FW supports
Michael Chan [Mon, 4 May 2026 08:36:09 +0000 (14:06 +0530)] 
bnxt_en: Set bp->max_tpa according to what the FW supports

Fix the logic to set bp->max_tpa no higher than what the FW supports.
On P5 chips, some older FW sets max_tpa very low so we override it to
prevent performance regressions with the older FW.

Fixes: 79632e9ba386 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
Reviewed-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260504083611.1383776-3-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  bnxt_en: Delay for 5 seconds after AER DPC for all chips
Michael Chan [Mon, 4 May 2026 08:36:08 +0000 (14:06 +0530)] 
bnxt_en: Delay for 5 seconds after AER DPC for all chips

The FW on all chips requires a 5-second delay after Downstream
Port Containment (DPC) AER.  The previously added 900 msec delay was
not long enough in all cases because the chip's CRS (Configuration
Request Retry Status) mechanism is not always reliable.

Fixes: d5ab32e9b02d ("bnxt_en: Add delay to handle Downstream Port Containment (DPC) AER")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260504083611.1383776-2-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  ipv6: Fix null-ptr-deref in fib6_mtu().
Kuniyuki Iwashima [Mon, 4 May 2026 06:43:13 +0000 (06:43 +0000)] 
ipv6: Fix null-ptr-deref in fib6_mtu().

syzbot reported null-ptr-deref in fib6_mtu(). [0]

When res->f6i->fib6_pmtu is 0 in fib6_mtu(), it fetches MTU from
__in6_dev_get(nh->fib_nh_dev)->cnf.mtu6.

However, __in6_dev_get() could return NULL when the device is
being unregistered.

Let's return 0 MTU if __in6_dev_get() returns NULL in fib6_mtu().

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
CPU: 0 UID: 0 PID: 7890 Comm: syz.2.502 Tainted: G             L      syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:fib6_mtu net/ipv6/route.c:1648 [inline]
RIP: 0010:rt6_insert_exception+0x9eb/0x10a0 net/ipv6/route.c:1753
Code: 3b 14 cf f7 45 85 f6 0f 85 1d 02 00 00 e8 7d 19 cf f7 48 8d bb e0 05 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 89
RSP: 0000:ffffc9000610f120 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000c001000
RDX: 00000000000000bc RSI: ffffffff8a38bc83 RDI: 00000000000005e0
RBP: ffff888052f06000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888042d16c00
R13: ffff888042d16cc8 R14: 0000000000000001 R15: 0000000000000500
FS:  0000000000000000(0000) GS:ffff88809717d000(0063) knlGS:00000000f540db40
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000f73c6d50 CR3: 000000006eff0000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 __ip6_rt_update_pmtu+0x555/0xd60 net/ipv6/route.c:2982
 ip6_update_pmtu+0x34f/0x3b0 net/ipv6/route.c:3014
 icmpv6_err+0x2a2/0x3f0 net/ipv6/icmp.c:82
 icmpv6_notify+0x35e/0x820 net/ipv6/icmp.c:1087
 icmpv6_rcv+0x10bf/0x1ae0 net/ipv6/icmp.c:1228
 ip6_protocol_deliver_rcu+0xf97/0x1500 net/ipv6/ip6_input.c:478
 ip6_input_finish+0x1e4/0x4a0 net/ipv6/ip6_input.c:529
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:540
 ip6_mc_input+0x513/0xf50 net/ipv6/ip6_input.c:630
 dst_input include/net/dst.h:480 [inline]
 ip6_rcv_finish net/ipv6/ip6_input.c:119 [inline]
 NF_HOOK include/linux/netfilter.h:318 [inline]
 NF_HOOK include/linux/netfilter.h:312 [inline]
 ipv6_rcv+0x34c/0x3d0 net/ipv6/ip6_input.c:351
 __netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:6202
 __netif_receive_skb+0x1f/0x120 net/core/dev.c:6315
 netif_receive_skb_internal net/core/dev.c:6401 [inline]
 netif_receive_skb+0x13b/0x7f0 net/core/dev.c:6460
 tun_rx_batched.isra.0+0x3f6/0x750 drivers/net/tun.c:1511
 tun_get_user+0x1e31/0x3c20 drivers/net/tun.c:1955
 tun_chr_write_iter+0xdc/0x200 drivers/net/tun.c:2001
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x6ac/0x1070 fs/read_write.c:688
 ksys_write+0x12a/0x250 fs/read_write.c:740
 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline]
 do_int80_emulation+0x141/0x700 arch/x86/entry/syscall_32.c:172
 asm_int80_emulation+0x1a/0x20 arch/x86/include/asm/idtentry.h:621
RIP: 0023:0xf715616b
Code: 57 56 53 8b 44 24 14 f6 00 08 75 23 8b 44 24 18 8b 5c 24 1c 8b 4c 24 20 8b 54 24 24 8b 74 24 28 8b 7c 24 2c 8b 6c 24 30 cd 80 <5b> 5e 5f 5d c3 5b 5e 5f 5d e9 f7 a1 ff ff 66 90 66 90 66 90 90 53
RSP: 002b:00000000f540d44c EFLAGS: 00000246 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 00000000000000c8 RCX: 0000000080000640
RDX: 000000000000007a RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Reported-by: syzbot+01f005f9c6387ca6f6dd@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69f83f22.170a0220.13cc2.0004.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260504064316.3820775-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  ipv6: default IPV6_SIT to m
Alyssa Ross [Sun, 3 May 2026 19:25:16 +0000 (21:25 +0200)] 
ipv6: default IPV6_SIT to m

This basically defaulted to m until recently, since IPV6 defaulted to
m.  Since IPV6 was changed to a boolean with a default of y, IPV6_SIT
started defaulting to built-in as well.  This results in a surprise
sit0 device by default for defconfig (and defconfig-derived config)
users at boot.  For me, this broke an (admittedly non-robust) script.
Preserve the behaviour of most configs by not building this
module, which is probably seldom used compared to IPv6 as a
whole, into the kernel.

Fixes: 309b905deee59 ("ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260503192515.290900-2-hi@alyssa.is
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days ago  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
Satyanarayana K V P [Mon, 4 May 2026 09:49:26 +0000 (09:49 +0000)] 
drm/xe/guc: Exclude indirect ring state page from ADS engine state size

The engine state size reported to GuC via ADS should only include the
engine state portion and should not include the indirect ring state page
that comes after it in the context image. The GuC uses this size to
overwrite the engine state in the LRC on watchdog resets and we don't
want it to overwrite the indirect ring state as well.

Fixes: d6219e1cd5e3 ("drm/xe: Add Indirect Ring State support")
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patch.msgid.link/20260504094924.3760713-4-satyanarayana.k.v.p@intel.com
(cherry picked from commit 3ec5f003f6c377beda8bd5438941f5a7795e1848)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
9 days ago  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
Shuicheng Lin [Wed, 29 Apr 2026 19:22:59 +0000 (19:22 +0000)] 
drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration

pf_migration_mmio_save() and pf_migration_mmio_restore() initialize a
local VF-specific MMIO view via xe_mmio_init_vf_view() but then pass
&gt->mmio (the PF base) to all xe_mmio_read32()/xe_mmio_write32()
calls instead of the local &mmio. This causes the PF's own SW flag
registers to be saved/restored rather than the target VF registers,
silently corrupting migration state.

Use the VF MMIO view for all register accesses, matching the correct
pattern used in pf_clear_vf_scratch_regs().

Fixes: b7c1b990f719 ("drm/xe/pf: Handle MMIO migration data as part of PF control")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Assisted-by: Claude:claude-opus-4.6
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20260429192259.4009211-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 7d9c39cfb31ff389490ca1308767c2807a9829a6)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
9 days ago  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
Shuicheng Lin [Tue, 28 Apr 2026 20:14:48 +0000 (20:14 +0000)] 
drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()

PTR_ERR() returns a negative value, so comparing against the positive
EAGAIN is always true for ERR_PTR(-EAGAIN), causing pf_migration_consume()
to bail out instead of continuing to the remaining GTs. On multi-GT
platforms this can skip GTs that already have data ready.

Compare against -EAGAIN to match the intent (and the following line
that correctly uses -EAGAIN). While at it, gate PTR_ERR() with
IS_ERR().

v2: add IS_ERR() guard before PTR_ERR(). (Gustavo)

Fixes: 67df4a5cbc58 ("drm/xe/pf: Add data structures and handlers for migration rings")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20260428201448.3999428-1-shuicheng.lin@intel.com
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 9d770e72e1edb54beacfce5f402edb51632811e3)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
9 days ago  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
Gustavo Sousa [Thu, 16 Apr 2026 18:17:19 +0000 (15:17 -0300)] 
drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()

When media GT is disabled via configfs, there is no allocation for
media_gt, which is kept as NULL.  In such scenario,
intel_hdcp_gsc_check_status() results in a kernel pagefault error due to
&gt->uc.gsc being evaluated as an invalid memory address.

Fix that by introducing a NULL check on media_gt and bailing out early
if so.

While at it, also drop the NULL check for gsc, since it can't be NULL if
media_gt is not NULL.

v2:
  - Get address for gsc only after checking that gt is not NULL.
    (Shuicheng)
  - Drop the NULL check for gsc. (Shuicheng)
v3:
  - Add "Fixes" and "Cc: <stable...>" tags. (Matt)

Fixes: 4af50beb4e0f ("drm/xe: Use gsc_proxy_init_done to check proxy status")
Cc: <stable@vger.kernel.org> # v6.10+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260416-check-for-null-media_gt-in-intel_hdcp_gsc_check_status-v2-1-9adb9fd3b621@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
(cherry picked from commit bfaf87e84ca3ca3f6e275f9ae56da47a8b55ffd1)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
9 days ago  Merge tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 5 May 2026 23:09:31 +0000 (16:09 -0700)] 
Merge tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fixes from Tejun Heo:

 - Fix devm_alloc_workqueue() passing a va_list as a positional arg to
   the variadic alloc_workqueue() macro, which garbled wq->name and
   skipped lockdep init on the devm path. Fold both noprof entry points
   onto a va_list helper.

   Also, annotate it using __printf(1, 0)

* tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
  workqueue: fix devm_alloc_workqueue() va_list misuse

9 days ago  Merge tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 5 May 2026 22:43:32 +0000 (15:43 -0700)] 
Merge tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - During v6.19, cgroup task unlink was moved from do_exit() to after the
   final task switch to satisfy a controller invariant. That left the kernel
   seeing tasks past exit_signals() longer than userspace expected, and
   several v7.0 follow-ups tried to bridge the gap by making rmdir wait for
   the kernel side. None held up.

   The latest is an A-A deadlock when rmdir is invoked by the reaper of
   zombies whose pidns teardown the rmdir itself is waiting on, which
   points at the synchronizing approach being fundamentally wrong.

   Take a different approach: drop the wait, leave rmdir's user-visible
   side returning as soon as cgroup.procs is empty, and defer the css
   percpu_ref kill that drives ->css_offline() until the cgroup is fully
   depopulated.

   Tagged for stable. Somewhat invasive but contained. The hope is that
   fixing forward sticks. If not, the fallback is to revert the entire
   chain and rework on the development branch.

   Note that this doesn't plug a pre-existing analogous race in
   cgroup_apply_control_disable() (controller disable via
   subtree_control). Not a regression. The development branch will do
   the more invasive restructuring needed for that.

 - Documentation update for cgroup-v1 charge-commit section that still
   referenced functions removed when the memcg hugetlb try-commit-cancel
   protocol was retired.

* tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  docs: cgroup-v1: Update charge-commit section
  cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated

9 days ago  Merge tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 5 May 2026 22:22:04 +0000 (15:22 -0700)] 
Merge tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix idle CPU selection returning prev_cpu outside the task's cpus_ptr
   when the BPF caller's allowed mask was wider. Stable backport.

 - Two opposite-direction gaps in scx_task_iter's cgroup-scoped mode
   versus the global mode:

    - Tasks past exit_signals() are filtered by the cgroup walk but kept
      by global. Sub-scheduler enable abort leaked __scx_init_task()
      state. Add a CSS_TASK_ITER_WITH_DEAD flag to cgroup's task
      iterator (scx_task_iter is its only user) and use it.

    - Tasks past sched_ext_dead() are still returned, tripping
      WARN_ON_ONCE() in callers or making them touch torn-down state.
      Mark and skip under the per-task rq lock.

* tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: idle: Recheck prev_cpu after narrowing allowed mask
  sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
  cgroup, sched_ext: Include exiting tasks in cgroup iter

9 days ago  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Tue, 5 May 2026 21:38:31 +0000 (14:38 -0700)] 
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "All in drivers.

  The largest change is the ufs one which has to introduce a new
  function to check the power state before doing the update and the most
  widely encountered one is the obvious change to sg to not use
  GFP_ATOMIC"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: iscsi: reject invalid size Extended CDB AHS
  scsi: ufs: core: Fix bRefClkFreq write failure in HS-LSS mode
  scsi: hisi_sas: Fix sparse warnings in prep_ata_v3_hw()
  scsi: pmcraid: Fix typo in comments
  scsi: scsi_dh_alua: Increase default ALUA timeout to maximum spec value
  scsi: smartpqi: Silence a recursive lock warning
  scsi: mpt3sas: Limit NVMe request size to 2 MiB
  scsi: sg: Don't use GFP_ATOMIC in sg_start_req()
  scsi: target: configfs: Bound snprintf() return in tg_pt_gp_members_show()

9 days ago  Merge tag 'fbdev-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Tue, 5 May 2026 21:25:44 +0000 (14:25 -0700)] 
Merge tag 'fbdev-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:
 "Four small patches for fbdev, of which two are important: One fixes
  the bitmap font generation and the other prevents a possible
  use-after-free in udlfb:

   - Fix rotating fonts by 180 degrees (Thomas Zimmermann)

   - Drop duplicate include of linux/module.h in fb_defio (Chen Ni)

   - Add vm_ops in udlfb to prevent use-after-free (Rajat Gupta)

   - ipu-v3: clean up kernel-doc warnings (Randy Dunlap)"

* tag 'fbdev-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free
  lib/fonts: Fix bit position when rotating by 180 degrees
  fbdev: defio: Remove duplicate include of linux/module.h
  fbdev: ipu-v3: clean up kernel-doc warnings

9 days ago  selinux: shrink critical section in sel_write_load()
Stephen Smalley [Thu, 30 Apr 2026 18:36:52 +0000 (14:36 -0400)] 
selinux: shrink critical section in sel_write_load()

Currently sel_write_load() takes the policy mutex earlier than
necessary. Move the taking of the mutex later. This avoids
holding it unnecessarily across the vmalloc() and copy_from_user()
of the policy data.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
9 days ago  selinux: allow multiple opens of /sys/fs/selinux/policy
Stephen Smalley [Tue, 5 May 2026 14:06:38 +0000 (10:06 -0400)] 
selinux: allow multiple opens of /sys/fs/selinux/policy

Currently there can only be a single open of /sys/fs/selinux/policy at
any time. This allows any process to block any other process from
reading the kernel policy. The original motivation seems to have been
a mix of preventing an inconsistent view of the policy size and
preventing userspace from allocating kernel memory without bound, but
this is arguably equally bad. Eliminate the policy_opened flag and
shrink the critical section in which the policy mutex is held. While
we are making changes here, drop a couple of extraneous BUG_ONs.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/selinux/20100726193414.19538.64028.stgit@paris.rdu.redhat.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
9 days ago  selinux: prune /sys/fs/selinux/user
Stephen Smalley [Tue, 5 May 2026 12:49:50 +0000 (08:49 -0400)] 
selinux: prune /sys/fs/selinux/user

Remove the previously deprecated /sys/fs/selinux/user interface aside
from a residual stub for userspace compatibility.

Commit d7b6918e22c7 ("selinux: Deprecate /sys/fs/selinux/user") started
the deprecation process for /sys/fs/selinux/user:

    The selinuxfs "user" node allows userspace to request a list
    of security contexts that can be reached for a given SELinux
    user from a given starting context. This was used by libselinux
    when various login-style programs requested contexts for
    users, but libselinux stopped using it in 2020.
    Kernel support will be removed no sooner than Dec 2025.

A pr_warn() message has been in place since Linux v6.13, and a
5-second sleep was introduced in Linux v6.17 to help make it more
noticeable.

We are now past the stated deadline of Dec 2025, so remove the
underlying functionality and replace it with a stub that returns a
'0\0' buffer to avoid breaking userspace. This also avoids a local DoS
from logspam and an uninterruptible sleep delay.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
9 days ago  selinux: prune /sys/fs/selinux/disable
Stephen Smalley [Tue, 5 May 2026 12:49:49 +0000 (08:49 -0400)] 
selinux: prune /sys/fs/selinux/disable

Commit f22f9aaf6c3d ("selinux: remove the runtime disable
functionality") removed the underlying SELinux runtime disable
functionality but left everything else intact and started logging an
error message to warn any residual users.

Prune it to just log an error message once and to return count
(i.e. all bytes written successfully) to avoid breaking
userspace. This also fixes a local DoS from logspam.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
9 days ago  selinux: prune /sys/fs/selinux/checkreqprot
Stephen Smalley [Tue, 5 May 2026 12:49:48 +0000 (08:49 -0400)] 
selinux: prune /sys/fs/selinux/checkreqprot

commit a7e4676e8e2cb ("selinux: remove the 'checkreqprot'
functionality") removed the ability to modify the checkreqprot setting
but left everything except the updating of the checkreqprot value
intact. Aside from causing unnecessary processing, this could produce
a local DoS from log spam, and it incorrectly calls
selinux_ima_measure_state() on each write even though no state has
changed. Prune it to just log an
error message once and return count (i.e. all bytes written
successfully) so that userspace never breaks.

Cc: stable@vger.kernel.org
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
10 days ago  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Tue, 5 May 2026 16:11:52 +0000 (09:11 -0700)] 
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

 - Several error unwind misses on system calls in mlx5, mana, ocrdma,
   vmw_pvrdma, mlx4, and hns

 - More rxe bugs processing network packets

 - User triggerable races in mlx5 when destroying and creating the
   same object when the FW returns the same object ID

 - Incorrect passing of an IPv6 address through netlink
   RDMA_NL_LS_OP_IP_RESOLVE

 - Add memory ordering for mlx5's lock avoidance pattern

 - Protect mana from kernel memory overflow

 - Use safe patterns for xarray/radix_tree look up in mlx5 and hns

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (24 commits)
  RDMA/hns: Fix unlocked call to hns_roce_qp_remove()
  RDMA/hns: Fix xarray race in hns_roce_create_qp_common()
  RDMA/hns: Fix xarray race in hns_roce_create_srq()
  RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event()
  RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq()
  RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path
  RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
  RDMA/ocrdma: Clarify the mm_head searching
  RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()
  RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()
  RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()
  RDMA/mana: Validate rx_hash_key_len
  RDMA/mlx5: Add missing store/release for lock elision pattern
  RDMA/mlx5: Restore zero-init to mlx5_ib_modify_qp() ucmd
  RDMA/ionic: Fix typo in format string
  RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
  RDMA/core: Fix rereg_mr use-after-free race
  IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
  RDMA/mlx5: Fix UAF in DCT destroy due to race with create
  RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
  ...

10 days ago  wifi: mac80211: use safe list iteration in radar detect work
Benjamin Berg [Tue, 5 May 2026 13:15:40 +0000 (15:15 +0200)] 
wifi: mac80211: use safe list iteration in radar detect work

The call to ieee80211_dfs_cac_cancel can cause the iterated chanctx to
be freed and removed from the list. Guard against this to avoid a
slab-use-after-free error.

Cc: stable@vger.kernel.org
Fixes: bca8bc0399ac ("wifi: mac80211: handle ieee80211_radar_detected() for MLO")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20260505151539.236d63a1b736.I35dbb9e96a2d4a480be208770fdd99ba3b817b79@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
10 days ago  Merge tag 'ath-current-20260505' of git://git.kernel.org/pub/scm/linux/kernel/git...
Johannes Berg [Tue, 5 May 2026 15:52:32 +0000 (17:52 +0200)] 
Merge tag 'ath-current-20260505' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git update for v7.1-rc3

Fix an ath5k potential stack buffer overwrite.
Fix several issues in ath12k:
- WMI buffer leaks on error conditions
- use of uninitialized stack data when processing RSSI events
- incorrect logic for determining the peer ID in the RX path
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>