Shay Drory [Mon, 4 May 2026 18:02:04 +0000 (21:02 +0300)]
net/mlx5: SD, Keep multi-pf debugfs entries on primary
mlx5_sd_init() creates the "multi-pf" debugfs directory under the
primary device debugfs root, but stored the dentry in the calling
device's sd struct. When sd_cleanup() run on a different PF,
this leads to using the wrong sd->dfs for removing entries, which
results in memory leak and an error in when re-creating the SD.[1]
Fix it by explicitly storing the debugfs dentry in the primary
device sd struct and use it for all per-group files.
[1]
debugfs: 'multi-pf' already exists in '0000:08:00.1'
Shay Drory [Mon, 4 May 2026 18:02:03 +0000 (21:02 +0300)]
net/mlx5: SD: Serialize init/cleanup
mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same
Socket-Direct group. This can cause the SD bring-up/tear-down sequence
to be executed more than once or interleaved across PFs.
Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD
group state on the primary device. Skip init if the primary is already
UP, and skip cleanup unless the primary is UP.
The state check on cleanup is needed because sd_register() drops the
devcom comp lock between marking the comp ready and assigning
primary_dev on each peer. A concurrent cleanup that acquires the lock
in this window would observe devcom_is_ready==true while primary_dev
is still NULL (causing mlx5_sd_get_primary() to return NULL) or while
the FW alias setup performed by mlx5_sd_init()'s body has not yet run
(causing sd_cmd_unset_primary() to dereference a NULL tx_ft). Gate the
cleanup body on primary_sd->state == MLX5_SD_STATE_UP, which is set
only at the very end of mlx5_sd_init() under the same comp lock - so
observing UP guarantees primary_dev, secondaries[], tx_ft, and dfs are
all populated. Also bail explicitly if mlx5_sd_get_primary() returns
NULL, in case state is checked on a peer whose primary_dev hasn't been
assigned yet.
In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister()
into the cleanup's locked section, including the !primary and
state != UP early-exit paths, so the device cannot unregister and free
its struct mlx5_sd while devcom is still marked ready. A concurrent
init acquiring the devcom lock will now observe devcom is no longer
ready and bail out immediately.
Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The reason for failure is the existence of TX keys, which are removed by
the PSP dev unregistration happening in:
profile->cleanup() -> mlx5e_psp_unregister() -> mlx5e_psp_cleanup()
-> psp_dev_unregister()
...but this isn't invoked in the devlink reload flow, only when changing
the NIC profile (e.g. when transitioning to switchdev mode) or on dev
teardown.
Move PSP device registration into mlx5e_nic_enable(), and unregistration
into the corresponding mlx5e_nic_disable(). These functions are called
during netdev attach/detach after RX & TX are set up.
This ensures that the keys will be gone by the time the PD is destroyed.
Cosmin Ratiu [Mon, 4 May 2026 18:10:58 +0000 (21:10 +0300)]
net/mlx5e: psp: Fix invalid access on PSP dev registration fail
priv->psp->psp is initialized with the PSP device as returned by
psp_dev_create(). This could also return an error, in which case a
future psp_dev_unregister() will result in unpleasantness.
Avoid that by using a local variable and only saving the PSP device when
registration succeeds.
In case psp_dev_create() fails, priv->psp and steering structs are left
in place, but they will be inert. The unchecked access of priv->psp in
mlx5e_psp_offload_handle_rx_skb() won't happen because without a PSP
device, there can be no SAs added and therefore no packets will be
successfully decrypted and be handed off to the SW handler.
powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
The core-book3s PMU sampling code validates the SIER TYPE field
when PERF_SAMPLE_DATA_SRC is requested. The SIER TYPE field
indicates the instruction type and is only valid for
random sampling (marked events). To handle cases observed where
SIER TYPE could be zero even for marked events,validation was
added to drop such samples and increment event->lost_samples.
However, this validation was applied to all samples,
including continuous sampling. In continuous sampling mode,
the PMU does not set the SIER TYPE field, so it remains zero.
As a result, valid continuous samples were incorrectly
treated as invalid and dropped. Fixed this by gating the
SIER TYPE validation with mark_event, so the check runs only
for marked (random) events. Continuous samples now skip this
check and are recorded normally in the final data recording path.
Fixes: 2ffb26afa642 ("arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src") Signed-off-by: Shivani Nittor <shivani@linux.ibm.com> Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com>
[Maddy: Fixed reviewed-by tag] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260421150628.96500-1-shivani@linux.ibm.com
powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
Allthough fsl,cpm1-gpio-irq-mask always contains a 16 bits value,
it is a standard u32 OF property as documented in
Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt
The driver erroneously uses of_property_read_u16() leading to a
mask which is always 0.
Pavitra Jha [Fri, 1 May 2026 11:07:12 +0000 (07:07 -0400)]
net: wwan: t7xx: validate port_count against message length in t7xx_port_enum_msg_handler
t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as
a loop bound over port_msg->data[] without checking that the message buffer
contains sufficient data. A modem sending port_count=65535 in a 12-byte
buffer triggers a slab-out-of-bounds read of up to 262140 bytes.
Add a sizeof(*port_msg) check before accessing the port message header
fields to guard against undersized messages.
Add a struct_size() check after extracting port_count and before the loop.
In t7xx_parse_host_rt_data(), guard the rt_feature header read with a
remaining-buffer check before accessing data_len, validate feat_data_len
against the actual remaining buffer to prevent OOB reads and signed
integer overflow on offset.
Pass msg_len from both call sites: skb->len at the DPMAIF path after
skb_pull(), and the validated feat_data_len at the handshake path.
powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
The kexec sequence invokes enter_vmx_ops() via copy_page() with the MMU
disabled. In this context, code must not rely on normal virtual address
translations or trigger page faults.
With KASAN enabled, functions get instrumented and may access shadow
memory using regular address translation. When executed with the MMU
off, this can lead to page faults (bad_page_fault) from which the
kernel cannot recover in the kexec path, resulting in a hang.
The kexec path sets preempt_count to HARDIRQ_OFFSET before entering
the MMU-off copy sequence.
Since kexec sets preempt_count to HARDIRQ_OFFSET, in_interrupt()
evaluates to true and enter_vmx_ops() returns early.
As in_interrupt() (and preempt_count()) are always inlined, mark
enter_vmx_ops() with __no_sanitize_address to avoid KASAN
instrumentation and shadow memory access with MMU disabled, helping
kexec boot fine with KASAN enabled.
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
KASAN instrumentation is intended to be disabled for the kexec core
code, but the existing Makefile entry misses the object suffix. As a
result, the flag is not applied correctly to core_$(BITS).o.
So when KASAN is enabled, kexec_copy_flush and copy_segments in
kexec/core_64.c are instrumented, which can result in accesses to
shadow memory via normal address translation paths. Since these run
with the MMU disabled, such accesses may trigger page faults
(bad_page_fault) that cannot be handled in the kdump path, ultimately
causing a hang and preventing the kdump kernel from booting. The same
is true for kexec as well, since the same functions are used there.
Update the entry to include the “.o” suffix so that KASAN
instrumentation is properly disabled for this object file.
pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
While at it let's also fix the similar style issue in
enable_hvpipe_IRQ() function. This also fixes a minor checkpatch warning
which I got due to an extra space before " ==".
pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
Simplify hvpipe_rtas_recv_msg() by removing three levels of nesting...
if (!ret)
if (buf)
if (size < bytes_written)
... this refactoring of the function bails out to "out:" label first, in case
of any error. This simplifies the init flow.
pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
We don't really use task_struct pointer for anything meaningful. So just
kill it for now, and we can bring back later if we need this for any
future debug purposes.
pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
Once the src_info is removed from the global list, no one can access it.
This simplies the usage of spin_unlock_irqrestore() in
papr_hvpipe_handle_release()
pseries/papr-hvpipe: Fix the usage of copy_to_user()
copy_to_user() return bytes_not_copied to the user buffer. If there was
an error writing bytes into the user buffer, i.e. if copy_to_user
returns a non-zero value, then we should simply return -EFAULT from the
->read() call.
Otherwise, in the non-patched version, we may end up mixing
"bytes_not_copied + bytes_copied (HVPIPE_HDR_LEN)" as the return value
to the user in ->read() call
Also let's make sure we clear the hvpipe_status flag, if we have
consumed the hvpipe msg by making the rtas call. ret = -EFAULT means
copy_to_user has failed but that still means that the msg was read from
the hvpipe, hence for both cases, success & -EFAULT, we should clear the
HVPIPE_MSG_AVAILABLE flag in hvpipe_status.
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
Remove such 3 levels of nesting patterns to check success return values
from function calls.
ret = enable_hvpipe_IRQ()
if (!ret)
ret = set_hvpipe_sys_param(1)
if (!ret)
ret = misc_register()
Instead just bail out to "out*:" labels, in case of any error. This
simplifies the init flow.
While at it let's also fix the following error handling logic:
We have already enabled interrupt sources and enabled hvpipe to received
interrupts, if misc_register() fails, we will destroy the workqueue, but
the HMC might send us a msg via hvpipe which will call, queue work on
the workqueue which might be destroyed.
So instead, let's reverse the order of enabling set_hvpipe_sys_param(1)
and in case of an error let's remove the misc dev by calling
misc_deregister().
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
commit 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()"),
changed the create handle to FD_PREPARE(), but it caused kernel
null-ptr-deref because after call to retain_and_null_ptr(src_info),
src_info is re-used for adding it to the global list.
Getting the following kernel panic in papr_hvpipe_dev_create_handle()
when trying to add src_info to the list.
Kernel attempted to write user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on write at 0x00000000
Faulting instruction address: 0xc0000000001b44a0
Oops: Kernel access of bad area, sig: 11 [#1]
...
Call Trace:
papr_hvpipe_dev_ioctl+0x1f4/0x48c (unreliable)
sys_ioctl+0x528/0x1064
system_call_exception+0x128/0x360
system_call_vectored_common+0x15c/0x2ec
Now, the error handling with FD_PREPARE's file cleanup and __free(kfree) auto
cleanup is getting too convoluted. This is mainly because we need to
ensure only 1 user get the srcID handle. To simplify this, we allocate
prepare the src_info in the beginning and add it to the global list
under a spinlock after checking that no duplicates exist.
This simplify the error handling where if the FD_ADD fails, we can
simply remove the src_info from the list and consume any pending msg in
hvpipe to be cleared, after src_info became visible in the global list.
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
The hdr variable is allocated on the stack and only hdr.version and
hdr.flags are initialized explicitly. Because the struct papr_hvpipe_hdr
contains reserved padding bytes (reserved[3] and reserved2[40]), these
could leak the uninitialized bytes to userspace after copy_to_user().
This patch fixes that by initializing the whole struct to 0.
Athira Rajeev [Sat, 14 Mar 2026 13:29:53 +0000 (18:59 +0530)]
powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
H_HTM (Hardware Trace Macro) hypervisor call has capability
to capture SystemMemory Configuration. This information
helps to understand the address mapping for the partitions
in the system.
Support dumping system memory configuration from Hardware
Trace Macro (HTM) function via debugfs interface. Under
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add
file "htmsystem_mem".
The interface allows only read of this file which will present the
content of HTM buffer from the hcall. The 16th offset of HTM
buffer has value for the number of entries for array of processors.
Use this information to copy data to the debugfs file
Athira Rajeev [Sat, 14 Mar 2026 13:24:32 +0000 (18:54 +0530)]
powerpc/pseries/htmdump: Fix the offset value used in htm status dump
H_HTM call is invoked using three parameters specifying
the address of the buffer, size of the buffer and offset
where to read from. offset used was always zero.
"offset" is value from output buffer header that points
to next entry to dump. zero is the first entry to dump.
next entry is read from the output bufferbyte offset 0x8.
Update htmstatus_read() function to use right offset. Return
when offset points to -1
Athira Rajeev [Sat, 14 Mar 2026 13:24:31 +0000 (18:54 +0530)]
powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
H_HTM call is invoked using three parameters specifying
the address of the buffer, size of the buffer and offset
where to read from. offset used was always zero.
"offset" is value from output buffer header that points
to next entry to dump. zero is the first entry to dump.
next entry is read from the output bufferbyte offset 0x8.
Update htminfo_read() function to use right offset. Return
when offset points to -1
Athira Rajeev [Sat, 14 Mar 2026 13:24:30 +0000 (18:54 +0530)]
powerpc/pseries/htmdump: Free the global buffers in htmdump module exit
htmdump modules uses global memory buffers to capture
details like capabilities, status of specified HTM, read the
trace buffer. These are initialized during module init and
hence needs to be freed in module exit.
Patch adds freeing of the memory in module exit. The change
also includes minor clean up for the variable name. The
read call back for the debugfs interface file saves filp->private_data
to local variable name which is same as global variable
name for the memory buffers. Rename these local variable
names.
Eric Dumazet [Mon, 4 May 2026 16:38:42 +0000 (16:38 +0000)]
net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()
fq_codel_dump_class_stats() acquires qdisc spinlock only when requested
to follow flow->head chain.
As we did in sch_cake recently, add the missing READ_ONCE()/WRITE_ONCE()
annotations.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260504163842.1162001-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I added missing Fixes tag. The original description:
Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), the HK_TYPE_KTHREAD housekeeping cpumask may no
longer be correct in showing the actual CPU affinity of kthreads that
have no predefined CPU affinity. As the ipvs networking code is still
using HK_TYPE_KTHREAD, we need to make HK_TYPE_KTHREAD reflect the
reality.
This patch series makes HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
and uses RCU to protect access to the HK_TYPE_KTHREAD housekeeping
cpumask.
Julian plans to post a nf-next patch to limit the connections by using
"conn_max" sysctl. With Simon Horman, they agreed that this is an old
problem that we do not have a limit of connections and it is not a
stopper for this patchset.
* tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU
ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size
ipvs: fix races around est_mutex and est_cpulist
ipvs: do not leak dest after get from dest trash
ipvs: fix the spin_lock usage for RT build
ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars
ipvs: fixes for the new ip_vs_status info
====================
Pavan Chebbi [Mon, 4 May 2026 08:36:11 +0000 (14:06 +0530)]
bnxt_en: Use absolute target ns from ptp_clock_request
There is no need to calculate the target PHC cycles required
to make phase adjustment on the PPS OUT signal. This is because
the application supplies absolute n_sec value in the future and
is already the actual desired target value.
Remove the unnecessary code.
Fixes: 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260504083611.1383776-5-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kalesh AP [Mon, 4 May 2026 08:36:10 +0000 (14:06 +0530)]
bnxt_en: Check return value of bnxt_hwrm_vnic_cfg
When the bnxt RDMA driver is loaded, it calls bnxt_register_dev().
As part of this, driver sends HWRM_VNIC_CFG firmware command
to configure the VNIC to operate in dual VNIC mode. Currently
the driver ignores the result of this firmware command. The RDMA
driver must know the result since it affects its functioning.
Check return value of call to bnxt_hwrm_vnic_cfg() in
bnxt_register_dev() and return failure on error.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-4-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 4 May 2026 08:36:09 +0000 (14:06 +0530)]
bnxt_en: Set bp->max_tpa according to what the FW supports
Fix the logic to set bp->max_tpa no higher than what the FW supports.
On P5 chips, some older FW sets max_tpa very low so we override it to
prevent performance regressions with the older FW.
Fixes: 79632e9ba386 ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com> Reviewed-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-3-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 4 May 2026 08:36:08 +0000 (14:06 +0530)]
bnxt_en: Delay for 5 seconds after AER DPC for all chips
The FW on all chips is requiring a 5-second delay after Downstream
Port Containment (DPC) AER. The previously added 900 msec delay was
not long enough in all cases because the chip's CRS (Configuration
Request Retry Status) mechanism is not always reliable.
Fixes: d5ab32e9b02d ("bnxt_en: Add delay to handle Downstream Port Containment (DPC) AER") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260504083611.1383776-2-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alyssa Ross [Sun, 3 May 2026 19:25:16 +0000 (21:25 +0200)]
ipv6: default IPV6_SIT to m
This basically defaulted to m until recently, since IPV6 defaulted to
m. Since IPV6 was changed to a boolean with a default of y, IPV6_SIT
started defaulting to built-in as well. This results in a surprise
sit0 device by default for defconfig (and defconfig-derived config)
users at boot. For me, this broke an (admittedly non-robust) script.
Preserve the behaviour of most configs by avoiding building this
module, that's probably overall seldom used compared to IPv6 as a
whole, into the kernel.
Fixes: 309b905deee59 ("ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs") Signed-off-by: Alyssa Ross <hi@alyssa.is> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260503192515.290900-2-hi@alyssa.is Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drm/xe/guc: Exclude indirect ring state page from ADS engine state size
The engine state size reported to GuC via ADS should only include the
engine state portion and should not include the indirect ring state page
that comes after it in the context image. The GuC uses this size to
overwrite the engine state in the LRC on watchdog resets and we don't
want it to overwrite the indirect ring state as well.
Fixes: d6219e1cd5e3 ("drm/xe: Add Indirect Ring State support") Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260504094924.3760713-4-satyanarayana.k.v.p@intel.com
(cherry picked from commit 3ec5f003f6c377beda8bd5438941f5a7795e1848) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Shuicheng Lin [Wed, 29 Apr 2026 19:22:59 +0000 (19:22 +0000)]
drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
pf_migration_mmio_save() and pf_migration_mmio_restore() initialize a
local VF-specific MMIO view via xe_mmio_init_vf_view() but then pass
>->mmio (the PF base) to all xe_mmio_read32()/xe_mmio_write32()
calls instead of the local &mmio. This causes the PF own SW flag
registers to be saved/restored rather than the target VF registers,
silently corrupting migration state.
Use the VF MMIO view for all register accesses, matching the correct
pattern used in pf_clear_vf_scratch_regs().
Fixes: b7c1b990f719 ("drm/xe/pf: Handle MMIO migration data as part of PF control") Cc: Michał Winiarski <michal.winiarski@intel.com> Assisted-by: Claude:claude-opus-4.6 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260429192259.4009211-1-shuicheng.lin@intel.com Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
(cherry picked from commit 7d9c39cfb31ff389490ca1308767c2807a9829a6) Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Shuicheng Lin [Tue, 28 Apr 2026 20:14:48 +0000 (20:14 +0000)]
drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
PTR_ERR() returns a negative value, so comparing against the positive
EAGAIN is always true for ERR_PTR(-EAGAIN), causing pf_migration_consume()
to bail out instead of continuing to the remaining GTs. On multi-GT
platforms this can skip GTs that already have data ready.
Compare against -EAGAIN to match the intent (and the following line
that correctly uses -EAGAIN). While at it, gate PTR_ERR() with
IS_ERR().
v2: add IS_ERR() guard before PTR_ERR(). (Gustavo)
drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
When media GT is disabled via configfs, there is no allocation for
media_gt, which is kept as NULL. In such scenario,
intel_hdcp_gsc_check_status() results in a kernel pagefault error due to
>->uc.gsc being evaluated as an invalid memory address.
Fix that by introducing a NULL check on media_gt and bailing out early
if so.
While at it, also drop the NULL check for gsc, since it can't be NULL if
media_gt is not NULL.
v2:
- Get address for gsc only after checking that gt is not NULL.
(Shuicheng)
- Drop the NULL check for gsc. (Shuicheng)
v3:
- Add "Fixes" and "Cc: <stable...>" tags. (Matt)
Linus Torvalds [Tue, 5 May 2026 23:09:31 +0000 (16:09 -0700)]
Merge tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Fix devm_alloc_workqueue() passing a va_list as a positional arg to
the variadic alloc_workqueue() macro, which garbled wq->name and
skipped lockdep init on the devm path. Fold both noprof entry points
onto a va_list helper.
Also, annotate it using __printf(1, 0)
* tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
workqueue: fix devm_alloc_workqueue() va_list misuse
Linus Torvalds [Tue, 5 May 2026 22:43:32 +0000 (15:43 -0700)]
Merge tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- During v6.19, cgroup task unlink was moved from do_exit() to after the
final task switch to satisfy a controller invariant. That left the kernel
seeing tasks past exit_signals() longer than userspace expected, and
several v7.0 follow-ups tried to bridge the gap by making rmdir wait for
the kernel side. None held up.
The latest is an A-A deadlock when rmdir is invoked by the reaper of
zombies whose pidns teardown the rmdir itself is waiting on, which
points at the synchronizing approach being fundamentally wrong.
Take a different approach: drop the wait, leave rmdir's user-visible
side returning as soon as cgroup.procs is empty, and defer the css
percpu_ref kill that drives ->css_offline() until the cgroup is fully
depopulated.
Tagged for stable. Somewhat invasive but contained. The hope is that
fixing forward sticks. If not, the fallback is to revert the entire
chain and rework on the development branch.
Note that this doesn't plug a pre-existing analogous race in
cgroup_apply_control_disable() (controller disable via
subtree_control). Not a regression. The development branch will do
the more invasive restructuring needed for that.
- Documentation update for cgroup-v1 charge-commit section that still
referenced functions removed when the memcg hugetlb try-commit-cancel
protocol was retired.
* tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Update charge-commit section
cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
Linus Torvalds [Tue, 5 May 2026 22:22:04 +0000 (15:22 -0700)]
Merge tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix idle CPU selection returning prev_cpu outside the task's cpus_ptr
when the BPF caller's allowed mask was wider. Stable backport.
- Two opposite-direction gaps in scx_task_iter's cgroup-scoped mode
versus the global mode:
- Tasks past exit_signals() are filtered by the cgroup walk but kept
by global. Sub-scheduler enable abort leaked __scx_init_task()
state. Add a CSS_TASK_ITER_WITH_DEAD flag to cgroup's task
iterator (scx_task_iter is its only user) and use it.
- Tasks past sched_ext_dead() are still returned, tripping
WARN_ON_ONCE() in callers or making them touch torn-down state.
Mark and skip under the per-task rq lock.
* tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: idle: Recheck prev_cpu after narrowing allowed mask
sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
cgroup, sched_ext: Include exiting tasks in cgroup iter
Linus Torvalds [Tue, 5 May 2026 21:38:31 +0000 (14:38 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"All in drivers.
The largest change is the ufs one which has to introduce a new
function to check the power state before doing the update and the most
widely encountered one is the obvious change to sg to not use
GFP_ATOMIC"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: iscsi: reject invalid size Extended CDB AHS
scsi: ufs: core: Fix bRefClkFreq write failure in HS-LSS mode
scsi: hisi_sas: Fix sparse warnings in prep_ata_v3_hw()
scsi: pmcraid: Fix typo in comments
scsi: scsi_dh_alua: Increase default ALUA timeout to maximum spec value
scsi: smartpqi: Silence a recursive lock warning
scsi: mpt3sas: Limit NVMe request size to 2 MiB
scsi: sg: Don't use GFP_ATOMIC in sg_start_req()
scsi: target: configfs: Bound snprintf() return in tg_pt_gp_members_show()
Linus Torvalds [Tue, 5 May 2026 21:25:44 +0000 (14:25 -0700)]
Merge tag 'fbdev-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"Four small patches for fbdev, of which two are important: One fixes
the bitmap font generation and the other prevents a possible
use-after-free in udlfb:
- Fix rotating fonts by 180 degrees (Thomas Zimmermann)
- Drop duplicate include of linux/module.h in fb_defio (Chen Ni)
- Add vm_ops in udlfb to prevent use-after-free (Rajat Gupta)
- ipu-v3: clean up kernel-doc warnings (Randy Dunlap)"
* tag 'fbdev-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-free
lib/fonts: Fix bit position when rotating by 180 degrees
fbdev: defio: Remove duplicate include of linux/module.h
fbdev: ipu-v3: clean up kernel-doc warnings
Stephen Smalley [Thu, 30 Apr 2026 18:36:52 +0000 (14:36 -0400)]
selinux: shrink critical section in sel_write_load()
Currently sel_write_load() takes the policy mutex earlier than
necessary. Move the taking of the mutex later. This avoids
holding it unnecessarily across the vmalloc() and copy_from_user()
of the policy data.
Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Stephen Smalley [Tue, 5 May 2026 14:06:38 +0000 (10:06 -0400)]
selinux: allow multiple opens of /sys/fs/selinux/policy
Currently there can only be a single open of /sys/fs/selinux/policy at
any time. This allows any process to block any other process from
reading the kernel policy. The original motivation seems to have been
a mix of preventing an inconsistent view of the policy size and
preventing userspace from allocating kernel memory without bound, but
this is arguably equally bad. Eliminate the policy_opened flag and
shrink the critical section that the policy mutex is held. While we
are making changes here, drop a couple of extraneous BUG_ONs.
Stephen Smalley [Tue, 5 May 2026 12:49:50 +0000 (08:49 -0400)]
selinux: prune /sys/fs/selinux/user
Remove the previously deprecated /sys/fs/selinux/user interface aside
from a residual stub for userspace compatibility.
Commit d7b6918e22c7 ("selinux: Deprecate /sys/fs/selinux/user") started
the deprecation process for /sys/fs/selinux/user:
The selinuxfs "user" node allows userspace to request a list
of security contexts that can be reached for a given SELinux
user from a given starting context. This was used by libselinux
when various login-style programs requested contexts for
users, but libselinux stopped using it in 2020.
Kernel support will be removed no sooner than Dec 2025.
A pr_warn() message has been in place since Linux v6.13, and a 5
second sleep was introduced since Linux v6.17 to help make it more
noticeable.
We are now past the stated deadline of Dec 2025, so remove the
underlying functionality and replace it with a stub that returns a
'0\0' buffer to avoid breaking userspace. This also avoids a local DoS
from logspam and an uninterruptible sleep delay.
Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Stephen Smalley [Tue, 5 May 2026 12:49:49 +0000 (08:49 -0400)]
selinux: prune /sys/fs/selinux/disable
Commit f22f9aaf6c3d ("selinux: remove the runtime disable
functionality") removed the underlying SELinux runtime disable
functionality but left everything else intact and started logging an
error message to warn any residual users.
Prune it to just log an error message once and to return count
(i.e. all bytes written successfully) to avoid breaking
userspace. This also fixes a local DoS from logspam.
Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Stephen Smalley [Tue, 5 May 2026 12:49:48 +0000 (08:49 -0400)]
selinux: prune /sys/fs/selinux/checkreqprot
commit a7e4676e8e2cb ("selinux: remove the 'checkreqprot'
functionality") removed the ability to modify the checkreqprot setting
but left everything except the updating of the checkreqprot value
intact. Aside from unnecessary processing, this could produce a local
DoS from log spam and incorrectly calls selinux_ima_measure_state() on
each write even though no state has changed. Prune it to just log an
error message once and return count (i.e. all bytes written
successfully) so that userspace never breaks.
Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Tue, 5 May 2026 16:11:52 +0000 (09:11 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- Several error unwind misses on system calls in mlx5, mana, ocrdma,
vmw_pvrdma, mlx4, and hns
- More rxe bugs processing network packets
- User triggerable races in mlx5 when destroying and creating the same
same object when the FW returns the same object ID
- Incorrect passing of an IPv6 address through netlink
RDMA_NL_LS_OP_IP_RESOLVE
- Add memory ordering for mlx5's lock avoidance pattenr
- Protect mana from kernel memory overflow
- Use safe patterns for xarray/radix_tree look up in mlx5 and hns
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (24 commits)
RDMA/hns: Fix unlocked call to hns_roce_qp_remove()
RDMA/hns: Fix xarray race in hns_roce_create_qp_common()
RDMA/hns: Fix xarray race in hns_roce_create_srq()
RDMA/mlx4: Fix mis-use of RCU in mlx4_srq_event()
RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq()
RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error path
RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
RDMA/ocrdma: Clarify the mm_head searching
RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()
RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()
RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()
RDMA/mana: Validate rx_hash_key_len
RDMA/mlx5: Add missing store/release for lock elision pattern
RDMA/mlx5: Restore zero-init to mlx5_ib_modify_qp() ucmd
RDMA/ionic: Fix typo in format string
RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
RDMA/core: Fix rereg_mr use-after-free race
IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
RDMA/mlx5: Fix UAF in DCT destroy due to race with create
RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
...
Benjamin Berg [Tue, 5 May 2026 13:15:40 +0000 (15:15 +0200)]
wifi: mac80211: use safe list iteration in radar detect work
The call to ieee80211_dfs_cac_cancel can cause the iterated chanctx to
be freed and removed from the list. Guard against this to avoid a
slab-use-after-free error.
Johannes Berg [Tue, 5 May 2026 15:52:32 +0000 (17:52 +0200)]
Merge tag 'ath-current-20260505' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath
Jeff Johnson says:
==================
ath.git update for v7.1-rc3
Fix an ath5k potential stack buffer overwrite.
Fix several issues in ath12k:
- WMI buffer leaks on error conditions
- use of uninitialized stack data when processing RSSI events
- incorrect logic for determining the peer ID in the RX path
==================
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
ffa_sched_recv_cb_update() used list_for_each_entry_safe() to search for
a matching partition and then tested the iterator against NULL. That is
not a valid end-of-list check for circular lists and can fall through
with an invalid pointer. Use a normal iterator and detect the not-found
case correctly before touching the partition state.
firmware: arm_ffa: Snapshot notifier callbacks under lock
Both notification handlers currently look up a notifier callback under
notify_lock, drop the lock, and then dereference the returned
notifier entry. A concurrent unregister can delete and free that
entry in the gap, leaving the handler to dereference stale memory.
Copy the callback pointer and callback data while notify_lock is
still held and invoke the callback only after the lock is dropped.
This keeps the existing callback execution model while removing the
use-after-free window in both the framework and non-framework
notification paths.
firmware: arm_ffa: Align RxTx buffer size before mapping
Commit 83210251fd70 ("firmware: arm_ffa: Use the correct buffer size during
RXTX_MAP") advertises PAGE_ALIGN(rxtx_bufsz) to firmware when mapping the
buffers but the driver continues to stores the minimum FF-A buffer size
in drv_info->rxtx_bufsz which is used elsewhere in the driver.
Align the size before storing it so that the allocation, validation and
FFA_RXTX_MAP all use the same buffer size.
Framework notifications carry an indirect message in the shared RX
buffer. Validate the reported offset and size before using them, reject
zero-length payloads, and ensure that any non-header payload starts at
the UUID field rather than in the middle of the message header.
Use the validated offset and size values for both kmemdup() and the UUID
parsing path so malformed firmware data cannot drive an out-of-bounds
read or an oversized allocation.
firmware: arm_ffa: Keep framework RX release under lock
The framework notification handler drops rx_lock before issuing
FFA_RX_RELEASE, leaving a window where another RX-buffer user can
start a new FF-A transaction before ownership has actually been
returned to firmware.
Move the FFA_RX_RELEASE calls so they execute while rx_lock is still
held on both the kmemdup() failure path and the normal success path.
While doing that, switch the handler to scoped_guard() to keep the
critical section explicit.
The register-based PARTITION_INFO_GET path trusted the firmware-provided
indices when copying partition descriptors into the caller buffer.
Reject inconsistent counts or index progressions so the copy loop cannot
write past the allocated array.
Sunil Khatri [Mon, 4 May 2026 12:51:17 +0000 (18:21 +0530)]
drm/amdgpu/userq: fix access to stale wptr mapping
Use drm_exec to take both locks i.e vm root bo and
wptr_obj bo to access the mapping data properly.
This fixes the security issue of unmap the wptr_obj while
a queue creation is in progress and passing other
bo at same address.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1fc6c8ab45dbee096469c08c13f6099d57a52d6c) Cc: stable@vger.kernel.org
drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
During gpu hot-unplug need check if there are kfd porcesses still using the
being removed gpu before clean resources of the device. Current driver checks
if kfd_processes_table is empty. kfd processes are not terminated after
removed from kfd_processes_table immediately. They are still alive and may
access the device until kfd_process_wq work queue got ran.
Check kfd->kfd_processes_count value that is updated after kfd process got
uninitialized when its ref becomes zero.
Fixes: 6cca686dfce7 ("drm/amdkfd: kfd driver supports hot unplug/replug amdgpu devices") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d12d05c4bc4c15585130af43e897923ff292df7b)
Philip Yang [Mon, 27 Apr 2026 13:30:23 +0000 (09:30 -0400)]
drm/amdgpu: zero-initialize GART table on allocation
GART TLB is flushed after unmapping but not after mapping. Since
amdgpu_bo_create_kernel() does not zero-initialize the buffer, when a
single PTE is written the TLB may speculatively load other uninitialized
entries from the same cacheline. Those garbage entries can appear valid,
and a subsequent write to another PTE in the same cacheline may cause the
GPU to use a stale garbage PTE from the TLB.
Fix this by calling memset_io() to zero-initialize the GART table with
gart_pte_flags immediately after allocation.
Using AMDGPU_GEM_CREATE_VRAM_CLEARED, SDMA-based clear will not work
since SDMA needs GART to be initialized to work.
Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d9af8263b82b6eaa60c5718e0c6631c5037e4b24) Cc: stable@vger.kernel.org
John B. Moore [Mon, 27 Apr 2026 21:06:28 +0000 (16:06 -0500)]
drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
sdma_v4_0_ring_emit_fence() contains two BUG_ON(addr & 0x3) assertions
that verify fence writeback addresses are dword-aligned. These
assertions can be reached from unprivileged userspace via crafted
DRM_IOCTL_AMDGPU_CS submissions, causing a fatal kernel panic in a
scheduler worker thread.
Replace both BUG_ON() calls with WARN_ON() to log the condition without
crashing the kernel. A misaligned fence address at this point indicates
a driver bug, but crashing the kernel is never the correct response when
the assertion is reachable from userspace.
The CS IOCTL path is the correct place to filter invalid submissions;
the ring emission callback is too late to do anything about it.
Fixes: 2130f89ced2c ("drm/amdgpu: add SDMA v4.0 implementation (v2)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: John B. Moore <jbmoore61@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b90250bd933afd1ba94d86d6b13821997b22b18e) Cc: stable@vger.kernel.org
Alex Deucher [Mon, 27 Apr 2026 15:40:25 +0000 (11:40 -0400)]
drm/radeon: add missing revision check for CI
The memory level workarounds only apply to revision 0 SKUs.
Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/1816 Fixes: 127e056e2a82 ("drm/radeon: fix mclk vddc configuration for cards for hawaii") Fixes: 21b8a369046f ("drm/radeon: fix dram timing for certain hawaii boards") Fixes: 90b2fee35cb9 ("drm/radeon: fix dpm mc init for certain hawaii boards") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4d8dcc14311515077062b5740f39f427075de5c9) Cc: stable@vger.kernel.org
Alex Deucher [Tue, 28 Apr 2026 14:42:49 +0000 (10:42 -0400)]
drm/amdgpu/pm: align Hawaii mclk workaround with radeon
Align the hawaii mclk workaround with radeon and windows.
Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/1816 Fixes: 9f4b35411cfe ("drm/amd/powerplay: add CI asics support to smumgr (v3)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9649528b637f668c5af9f2b83ca4ad8576ae2121) Cc: stable@vger.kernel.org
Alex Deucher [Mon, 27 Apr 2026 15:38:58 +0000 (11:38 -0400)]
drm/amdgpu/pm: add missing revision check for CI
The ci_populate_all_memory_levels() workaround only
applies to revision 0 SKUs.
Link: https://gitlab.freedesktop.org/drm/amd/-/work_items/1816 Fixes: 9f4b35411cfe ("drm/amd/powerplay: add CI asics support to smumgr (v3)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1db15ba8f72f400bbad8ae0ce24fafc43429d4bd) Cc: stable@vger.kernel.org
John B. Moore [Tue, 28 Apr 2026 16:35:12 +0000 (11:35 -0500)]
drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
Remove the BUG_ON(flags & AMDGPU_FENCE_FLAG_64BIT) assertion from
gfx_v9_0_ring_emit_fence_kiq(). The KIQ hardware supports 64-bit
fence writes; the 32-bit writeback address constraint is an
upper-layer convention, not a hardware limitation. The check serves
no purpose and should not be present.
Found by code inspection while investigating related BUG_ON
assertions in the GFX and compute ring emission paths.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: John B. Moore <jbmoore61@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b1101a46a426bb4328116bb5273c326a2780389) Cc: stable@vger.kernel.org
Felix Kuehling [Mon, 20 Apr 2026 15:55:57 +0000 (11:55 -0400)]
drm/amdkfd: Make all TLB-flushes heavy-weight
With only one sequence number we cannot track the need for legacy vs
heavy-weight flushes reliably. Always use heavy-weight.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c1a3ff1d327820cd9a52bc1056b98681fc088949) Cc: stable@vger.kernel.org
Thomas Gleixner [Sun, 26 Apr 2026 16:13:54 +0000 (18:13 +0200)]
selftests/rseq: Make registration flexible for legacy and optimized mode
rseq_register_current_thread() either uses the glibc registered RSEQ region
or registers it's own region with the legacy size of 32 bytes.
That worked so far, but becomes a problem when the kernel implements a
distinction between legacy and performance optimized behavior based on the
registration size as that does not allow to test both modes with the self
test suite.
Add two arguments to the function. One to enforce that the registration is
not using libc provided mode and one to tell the registration to use the
legacy size and not the kernel advertised size.
Rename it and make the original one a inline wrapper which preserves the
existing behavior.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.677889423%40kernel.org Cc: stable@vger.kernel.org
Thomas Gleixner [Fri, 24 Apr 2026 22:47:54 +0000 (00:47 +0200)]
rseq: Revert to historical performance killing behaviour
The recent RSEQ optimization work broke the TCMalloc abuse of the RSEQ ABI
as it not longer unconditionally updates the CPU, node, mm_cid fields,
which are documented as read only for user space. Due to the observed
behavior of the kernel it was possible for TCMalloc to overwrite the
cpu_id_start field for their own purposes and rely on the kernel to update
it unconditionally after each context switch and before signal delivery.
The RSEQ ABI only guarantees that these fields are updated when the data
changes, i.e. the task is migrated or the MMCID of the task changes due to
switching from or to per CPU ownership mode.
The optimization work eliminated the unconditional updates and reduced them
to the documented ABI guarantees, which results in a massive performance
win for syscall, scheduling heavy work loads, which in turn breaks the
TCMalloc expectations.
There have been several options discussed to restore the TCMalloc
functionality while preserving the optimization benefits. They all end up
in a series of hard to maintain workarounds, which in the worst case
introduce overhead for everyone, e.g. in the scheduler.
The requirements of TCMalloc and the optimization work are diametral and
the required work arounds are a maintainence burden. They end up as fragile
constructs, which are blocking further optimization work and are pretty
much guaranteed to cause more subtle issues down the road.
The optimization work heavily depends on the generic entry code, which is
not used by all architectures yet. So the rework preserved the original
mechanism moslty unmodified to keep the support for architectures, which
handle rseq in their own exit to user space loop. That code is currently
optimized out by the compiler on architectures which use the generic entry
code.
This allows to revert back to the original behaviour by replacing the
compile time constant conditions with a runtime condition where required,
which disables the optimization and the dependend time slice extension
feature until the run-time condition can be enabled in the RSEQ
registration code on a per task basis again.
The following changes are required to restore the original behavior, which
makes TCMalloc work again:
1) Replace the compile time constant conditionals with runtime
conditionals where appropriate to prevent the compiler from optimizing
the legacy mode out
2) Enforce unconditional update of IDs on context switch for the
non-optimized v1 mode
3) Enforce update of IDs in the pre signal delivery path for the
non-optimized v1 mode
4) Enforce update of IDs in the membarrier(RSEQ) IPI for the
non-optimized v1 mode
5) Make time slice and future extensions depend on optimized v2 mode
This brings back the full performance problems, but preserves the v2
optimization code and for generic entry code using architectures also the
TIF_RSEQ optimization which avoids a full evaluation of the exit to user
mode loop in many cases.
Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Closes: https://lore.kernel.org/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com Link: https://patch.msgid.link/20260428224427.517051752%40kernel.org Cc: stable@vger.kernel.org
Dipayaan Roy [Fri, 1 May 2026 02:47:12 +0000 (19:47 -0700)]
net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR
During Function Level Reset recovery, the MANA driver reads
hardware BAR0 registers that may temporarily contain garbage values.
The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used
to compute gc->shm_base, which is later dereferenced via readl() in
mana_smc_poll_register(). If the hardware returns an unaligned or
out-of-range value, the driver must not blindly use it, as this would
propagate the hardware error into a kernel crash.
The following crash was observed on an arm64 Hyper-V guest running
kernel 6.17.0-3013-azure during VF reset recovery triggered by HWC
timeout.
From the crash signature x20 = ffff8000a200001b, this address
ends in 0x1b which is not 4-byte aligned, so the 'ldr w1, [x20]'
instruction (readl) triggers the arm64 alignment fault (FSC = 0x21).
The root cause is in mana_gd_init_vf_regs(), which computes:
The offset is used without any validation. The same problem exists
in mana_gd_init_pf_regs() for sriov_base_off and sriov_shm_off.
Fix this by validating all offsets before use:
- VF: check shm_off is within BAR0, properly aligned to 4 bytes
(readl requirement), and leaves room for the full 256-bit
(32-byte) SMC aperture.
- PF: check sriov_base_off is within BAR0, aligned to 8 bytes
(readq requirement), and leaves room to safely read the
sriov_shm_off register at sriov_base_off + GDMA_PF_REG_SHM_OFF.
Then check sriov_shm_off leaves room for the full SMC aperture.
All arithmetic uses subtraction rather than addition to avoid
integer overflow on garbage values.
Define SMC_APERTURE_SIZE (32 bytes, derived from the 256-bit aperture
width)
Return -EPROTO on invalid values. The existing recovery path in
mana_serv_reset() already handles -EPROTO by falling through to PCI
device rescan, giving the hardware another chance to present valid
register values after reset.
Nan Li [Fri, 1 May 2026 01:08:44 +0000 (09:08 +0800)]
net/rds: handle zerocopy send cleanup before the message is queued
A zerocopy send can fail after user pages have been pinned but before
the message is attached to the sending socket.
The purge path currently infers zerocopy state from rm->m_rs, so an
unqueued message can be cleaned up as if it owned normal payload pages.
However, zerocopy ownership is really determined by the presence of
op_mmp_znotifier, regardless of whether the message has reached the
socket queue.
Capture op_mmp_znotifier up front in rds_message_purge() and use it as
the cleanup discriminator. If the message is already associated with a
socket, keep the existing completion path. Otherwise, drop the pinned
page accounting directly and release the notifier before putting the
payload pages.
This keeps early send failure cleanup consistent with the zerocopy
lifetime rules without changing the normal queued completion path.
Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Co-developed-by: Xiao Liu <lx24@stu.ynu.edu.cn> Signed-off-by: Xiao Liu <lx24@stu.ynu.edu.cn> Signed-off-by: Nan Li <tonanli66@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/d2ea98a6313d5467bac00f7c9fef8c7acddb9258.1777550074.git.tonanli66@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
openvswitch: fix self-deadlock on release of tunnel vports
Two patches - the fix for the actual bug and the selftest that reproduces it.
I missed the self-deadlock in the original patch that introduced the issue,
because testing required code modification in the ovs-vswitchd to force it to
use legacy tunnel ports. I thought I made the change correctly, but apparently
something went wrong and the tests were run with the standard LWT infra instead.
The selftest added in this patch set will at least prevent this kind of mistakes
in the future.
I mentioned, however, that these tunnel vports are legacy and not actually used
by ovs-vswitchd. RTM_NEWLINK + COLLECT_METADATA is used in conjunction with the
standard OVS_VPORT_TYPE_NETDEV instead since 2017. The code to use the legacy
tunnels still exists in ovs-vswitchd however, but only as a fallback for older
kernels and we're planning to remove it in the next release. I'll be sending an
RFC to remove support for these legacy tunnel types from the kernel, as they
serve no real purpose today and only increase the uAPI surface for CVEs, but
we need to fix the known bugs for stable versions.
selftests: openvswitch: add tests for tunnel vport refcounting
There were a few issues found with the tunnel vport types around the
vport destruction code. Add some basic tests, so at least we know that
they can be properly added and removed without obvious issues.
The test creates OVS datapath, adds a non-LWT tunnel port, makes sure
they are created, and then removes the datapath and waits for all the
ports to be gone.
The dpctl script had a few bugs in the none-lwt tunnel creation code,
so fixing them as well to make the testing possible:
- The type of the --lwt option changed in order to properly disable it.
- Removed byte order conversion for the port numbers, as the value
supposed to be in the host order.
- Added missing 'gre' choice for the tunnel type.
openvswitch: vport: fix self-deadlock on release of tunnel ports
vports are used concurrently and protected by RCU, so netdev_put()
must happen after the RCU grace period. So, either in an RCU call or
after the synchronize_net(). The rtnl_delete_link() must happen under
RTNL and so can't be executed in RCU context. Calling synchronize_net()
while holding RTNL is not a good idea for performance and system
stability under load in general, so calling netdev_put() in RCU call
is the right solution here.
However,
when the device is deleted, rtnl_unlock() will call netdev_run_todo()
and block until all the references are gone. In the current code this
means that we never reach the call_rcu() and the vport is never freed
and the reference is never released, causing a self-deadlock on device
removal.
Fix that by moving the rcu_call() before the rtnl_unlock(), so the
scheduled RCU callback will be executed when synchronize_net() is
called from the rtnl_unlock()->netdev_run_todo() while the RTNL itself
is already released.
openvswitch: vport: fix race between tunnel creation and linking
When a tunnel vport is created it first creates the tunnel device, e.g.,
with geneve_dev_create_fb(), then it calls ovs_netdev_link() to take a
reference and link it to the device that represents openvswitch datapath.
The creation of the device is happening under RTNL, but then RTNL is
released and re-acquired to find the device by name. It is technically
possible for the tunnel device to be re-named or deleted within that
window while RTNL is not held, and some other device created in its
place. This will cause a non-tunnel device to be referenced in the
vport and tunnel-specific functions used on it, e.g. vxlan_get_options()
that directly casts the private netdev data into a struct vxlan_dev
causing an invalid memory access:
BUG: KASAN: slab-use-after-free in vxlan_get_options+0x323/0x3a0
vxlan_get_options+0x323/0x3a0
ovs_vport_cmd_new+0x6e3/0xd30
Fix that by taking a reference to the just created device before
releasing RTNL. This ensures that the device in the vport is always
the one that was just created. The search by name is only needed
for a standard vport-netdev that links pre-existing devices, so that
functionality and device type checks are moved to netdev_create().
It is also awkward that ovs_netdev_link() takes ownership of the vport
and destroys it on failure. It doesn't know the type of the port it is
dealing with, so we need to pass down the indicator that it's a tunnel,
so the link can be properly deleted on failure.
It's possible to refactor the logic to make the ovs_netdev_link() do
only the linking part and let the callers perform a proper destruction,
but it will be much more code for each legacy tunnel port type, so it
is not worth it for the bug fix.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Reported-by: Yuan Tan <tanyuan98@outlook.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reported-by: Yang Yang <n05ec@lzu.edu.cn> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://patch.msgid.link/20260430213349.407991-1-i.maximets@ovn.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The device name allocated via kzalloc() in init_one_mc() is assigned to
dev->init_name but never freed on the normal removal path. device_register()
copies init_name and then sets dev->init_name to NULL, so the name pointer
becomes unreachable from the device. Thus leaking memory.
Use a stack-local char array instead of using kzalloc() for name.
Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller") Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260401111856.2342975-1-ptsm@linux.microsoft.com
drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
When preparing the panel, it seems that it always expects commands to be
transferred in LP mode. However, the disable function removes the
MIPI_DSI_MODE_LPM flag, and no other function re-adds it.
As the unprepare function contains no DSI commands, re-adding the flag
just after disabling the panel should be safe. Add the code re-adding
the flag after the two commands for disabling the panel are sent.
This fixes screen unblanking (after blanking once) on
mt8188-geralt-ciri-sku1 device.
Cc: stable@vger.kernel.org # 6.11+ Fixes: 0ef94554dc40 ("drm/panel: himax-hx83102: Break out as separate driver") Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260425165751.1716569-1-zhengxingda@iscas.ac.cn
Icenowy Zheng [Sun, 3 May 2026 09:17:08 +0000 (17:17 +0800)]
drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
When preparing the panel, it seems that it always expects commands to be
transferred in LP mode. However, the disable function removes the
MIPI_DSI_MODE_LPM flag, and no other function re-adds it.
As the unprepare function contains no DSI commands, re-adding the flag
just after disabling the panel should be safe. Add the code re-adding
the flag after the two commands for disabling the panel are sent.
This fixes error messages shown in kernel log when unblanking on
mt8183-kukui-kodama-sku32 device.
Cc: stable@vger.kernel.org Fixes: a869b9db7adf ("drm/panel: support for boe tv101wum-nl6 wuxga dsi video mode panel") Signed-off-by: Icenowy Zheng <zhengxingda@iscas.ac.cn> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260503091708.1079962-1-zhengxingda@iscas.ac.cn
Like a number of other panel drivers, this newly merged driver
needs DRM_DISPLAY_DSC_HELPER to be enabled:
arm-linux-gnueabi-ld: drivers/gpu/drm/panel/panel-himax-hx83121a.o: in function `himax_prepare':
panel-himax-hx83121a.c:(.text+0x1024): undefined reference to `drm_dsc_pps_payload_pack'
Fixes: a7c61963b727 ("drm/panel: Add Himax HX83121A panel driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: David Heidelberg <david@ixit.cz> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260413071043.3829868-1-arnd@kernel.org
Chen Ni [Fri, 27 Mar 2026 02:17:28 +0000 (10:17 +0800)]
drm/panel: himax-hx83121a: Fix incorrect error check for devm_drm_panel_alloc()
Check devm_drm_panel_alloc() return value for ERR_PTR instead of NULL.
devm_drm_panel_alloc() returns an ERR_PTR on failure, never NULL. Using
a NULL check skips the error path and may cause a NULL pointer
dereference.
Fixes: a7c61963b727 ("drm/panel: Add Himax HX83121A panel driver") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Pengyu Luo <mitltlatltl@gmail.com> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260327021728.647182-1-nichen@iscas.ac.cn
ASoC: wm_adsp_fw_find_test: Clear searched_fw_files in find-by-index test
In wm_adsp_fw_find_test_find_firmware_byindex() the content of
priv->searched_fw_files must be cleared before starting the next iteration.
The files searched for are appended to priv->searched_fw_files, so if it is
not cleared on each iteration it will still contain the searches from the
previous iteration.
Redirect wm_adsp_release_firmware_files() to a replacement function that
handles the dummy firmware created by the tests. Use the same cleanup
function to cleanup in the test exit() function. Also call it on each
loop in wm_adsp_fw_find_test_find_firmware_byindex() to free the created
strings before reusing priv->found_fw on the next loop.
wm_adsp_release_firmware_files() will pass the struct firmware* pointers
to release_firmware(). But the pointers created by the tests are dummies
and must not be passed to release_firmware().
The test never invokes wm_adsp_release_firmware_files() so it wasn't
redirected. But the error handling in wm_adsp_request_firmware_files()
calls wm_adsp_release_firmware_files(). The redirected function makes
this safe.
Using the same cleanup function to perform cleanup from the test exit()
handler and wm_adsp_fw_find_test_find_firmware_byindex() avoids the risk
of duplicate cleanup code that all needs updating if there is any change
to the cleanup requirements.
This problem was found by https://sashiko.dev.
Fixes: bf2d44d07de7 ("ASoC: wm_adsp: Add kunit test for firmware file search") Closes: https://sashiko.dev/#/patchset/20260326100853.1582886-1-rf%40opensource.cirrus.com Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20260505105123.3539778-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
Dapeng Mi [Thu, 30 Apr 2026 00:25:57 +0000 (08:25 +0800)]
perf/x86/intel: Enable auto counter reload for DMR
Panther cove µarch starts to support auto counter reload (ACR), but the
static_call intel_pmu_enable_acr_event() is not updated for the Panther
Cove µarch used by DMR. It leads to the auto counter reload is not
really enabled on DMR.
Update static_call intel_pmu_enable_acr_event() in intel_pmu_init_pnc().
Dapeng Mi [Thu, 30 Apr 2026 00:25:56 +0000 (08:25 +0800)]
perf/x86/intel: Disable PMI for self-reloaded ACR events
On platforms with Auto Counter Reload (ACR) support, such as NVL, a
"NMI received for unknown reason 30" warning is observed when running
multiple events in a group with ACR enabled:
$ perf record -e '{instructions/period=20000,acr_mask=0x2/u,\
cycles/period=40000,acr_mask=0x3/u}' ./test
The warning occurs because the Performance Monitoring Interrupt (PMI)
is enabled for the self-reloaded event (the cycles event in this case).
According to the Intel SDM, the overflow bit
(IA32_PERF_GLOBAL_STATUS.PMCn_OVF) is never set for self-reloaded events.
Since the bit is not set, the perf NMI handler cannot identify the source
of the interrupt, leading to the "unknown reason" message.
Furthermore, enabling PMI for self-reloaded events is unnecessary and
can lead to extraneous records that pollute the user's requested data.
Disable the interrupt bit for all events configured with ACR self-reload.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260430002558.712334-4-dapeng1.mi@linux.intel.com
Dapeng Mi [Thu, 30 Apr 2026 00:25:55 +0000 (08:25 +0800)]
perf/x86/intel: Always reprogram ACR events to prevent stale masks
Members of an ACR group are logically linked via a bitmask of their
hardware counter indices. If some members of the group are assigned new
hardware counters during rescheduling, even events that keep their
original counter index must be updated with a new mask.
Without this, an event will continue to use a stale acr_mask that
references the old indices of its group peers. Ensure all ACR events are
reprogrammed during the scheduling path to maintain consistency across
the group.
Dapeng Mi [Thu, 30 Apr 2026 00:25:54 +0000 (08:25 +0800)]
perf/x86/intel: Improve validation and configuration of ACR masks
Currently there are several issues on the user space ACR mask validation
and configuration.
- The validation for user space ACR mask (attr.config2) is incomplete,
e.g., the ACR mask could include the index which belongs to another
ACR events group, but it's not validated.
- An early return on an invalid ACR mask caused all subsequent ACR groups
to be skipped.
- The stale hardware ACR mask (hw.config1) is not cleared before setting
new hardware ACR mask.
The following changes address all of the above issues.
- Figure out the event index group of an ACR group. Any bits in the
user-space mask not present in the index group are now dropped.
- Instead of an early return on invalid bits, drop only the invalid
portions and continue iterating through all ACR events to ensure full
configuration.
- Explicitly clear the stale hardware ACR mask for each event prior to
writing the new configuration.
Besides, a non-leader event member of ACR group could be disabled in
theory. This could cause bit-shifting errors in the acr_mask of remaining
group members. But since ACR sampling requires all events to be active,
this should not be a big concern in real use case. Add a "FIXME" comment
to notice this risk.
Peter Zijlstra [Thu, 26 Mar 2026 11:28:21 +0000 (12:28 +0100)]
perf/core: Fix deadlock in perf_mmap() failure path
Ian noted that commit 77de62ad3de3 ("perf/core: Fix refcount bug and
potential UAF in perf_mmap") would cause a deadlock due to
event->mmap_mutex recursion.
This happens because we're now calling perf_mmap_close() under
mmap_mutex, while that function itself can also take mmap_mutex.
Solve this by noting that perf_mmap_close() is far more complicated
than we need at this particular point, since it deals with scenarios
that cannot happen in this particular case.
Replace the call to perf_mmap_close() with a very narrow undo for the
case of first-exposure. If this is not the first mmap(), there is no
race and it is fine to drop the lock and call perf_mmap_close() to
handle to more complicated scenarios.
Note: move the rb->mmap_user (namespace) handling into the rb
init/free code such that it does not complicate the mmap handling.
Fixes: 77de62ad3de3 ("perf/core: Fix refcount bug and potential UAF in perf_mmap") Reported-by: Ian Rogers <irogers@google.com> Closes: https://patch.msgid.link/CAP-5%3DfVJyVMZw%3DDqP53Kxg58nUmJ_0bxoaeOKAbC03BVc11HaA%40mail.gmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260326112821.GK3738786@noisy.programming.kicks-ass.net
====================
net: mana: Fix mana_destroy_rxq() cleanup for partial RXQ init
When mana_create_rxq() fails partway through initialization (e.g. the
hardware rejects the WQ object creation), the error path calls
mana_destroy_rxq() to tear down a partially-initialized RXQ.
This exposed multiple issues in mana_destroy_rxq() path, as it assumed
the RXQ was always fully initialized, leading to multiple issues:
1. xdp_rxq_info_unreg() was called on an unregistered xdp_rxq,
triggering a WARN_ON ("Driver BUG") in net/core/xdp.c.
2. mana_destroy_wq_obj() was called with INVALID_MANA_HANDLE,
sending a bogus destroy command to the hardware.
3. mana_deinit_cq() was called twice — once inside mana_destroy_rxq()
and again in mana_create_rxq()'s error path — causing a
use-after-free since mana_destroy_rxq() frees the rxq first.
This was observed during ethtool ring parameter changes when the
hardware returned an error creating the RXQ. This series makes
mana_destroy_rxq() safe to call at any stage of RXQ initialization
by guarding each teardown step, and removes the redundant cleanup
in mana_create_rxq().
====================
Dipayaan Roy [Thu, 30 Apr 2026 03:57:54 +0000 (20:57 -0700)]
net: mana: remove double CQ cleanup in mana_create_rxq error path
In mana_create_rxq(), the error cleanup path calls mana_destroy_rxq()
followed by mana_deinit_cq(). This is incorrect for two reasons:
1. mana_destroy_rxq() already calls mana_deinit_cq() internally,
so the CQ's GDMA queue is destroyed twice.
2. mana_destroy_rxq() frees the rxq via kfree(rxq) before returning.
The subsequent mana_deinit_cq(apc, cq) then operates on freed memory
since cq points to &rxq->rx_cq, which is embedded in the
already-freed rxq structure — a use-after-free.
Remove the redundant mana_deinit_cq() call from the error path since
mana_destroy_rxq() already handles CQ cleanup. mana_deinit_cq() is
itself safe for an uninitialized CQ as it checks for a NULL gdma_cq
before proceeding.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Reviewed-by: Aditya Garg <gargaditya@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-4-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dipayaan Roy [Thu, 30 Apr 2026 03:57:53 +0000 (20:57 -0700)]
net: mana: Skip WQ object destruction for uninitialized RXQ
In mana_destroy_rxq(), mana_destroy_wq_obj() is called unconditionally
even when the WQ object was never created (rxobj is still
INVALID_MANA_HANDLE). When mana_create_rxq() fails before
mana_create_wq_obj() succeeds, the error path calls mana_destroy_rxq()
which sends a bogus destroy command to the hardware:
mana 7870:00:00.0: HWC: Failed hw_channel req: 0x1d
mana 7870:00:00.0: Failed to send mana message: -71, 0x1d
mana 7870:00:00.0 eth7: Failed to destroy WQ object: -71
Guard mana_destroy_wq_obj() with an INVALID_MANA_HANDLE check so that
mana_destroy_rxq() is safe to call at any stage of RXQ initialization.
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-3-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dipayaan Roy [Thu, 30 Apr 2026 03:57:52 +0000 (20:57 -0700)]
net: mana: check xdp_rxq registration before unreg in mana_destroy_rxq()
When mana_create_rxq() fails at mana_create_wq_obj() or any step before
xdp_rxq_info_reg() is called, the error path jumps to `out:` which calls
mana_destroy_rxq(). mana_destroy_rxq() unconditionally calls
xdp_rxq_info_unreg() on xilinx xdp_rxq that was never registered,
triggering a WARN_ON in net/core/xdp.c:
Guard the xdp_rxq_info_unreg() call with xdp_rxq_info_is_reg() so that
mana_destroy_rxq() is safe to call regardless of how far initialization
progressed.
Fixes: ed5356b53f07 ("net: mana: Add XDP support") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260430035935.1859220-2-dipayanroy@linux.microsoft.com Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakov Novak [Mon, 4 May 2026 16:23:57 +0000 (18:23 +0200)]
wifi: libertas: notify firmware load wait on disconnect
Currently, when the firmware is not fully loaded and if_usb_disconnect
is called, if_usb_prog_firmware gets stuck waiting for
cardp->surprise_removed or cardp->fwdnldover while lbs_remove_card
also waits for the firmware loading to be completed, which never happens.
This caused the reported syzbot bug. To address this, the wake_up
function call can be added in the if_usb_disconnect function which notifies
the if_usb_prog_firmware thread and resolves the firmware loading.
drm/etnaviv: Fix armed job not being pushed to the DRM scheduler
When xa_alloc_cyclic() failed in etnaviv_sched_push_job(), the error
path skipped drm_sched_entity_push_job(). This is a violation of the DRM
scheduler contract, as once a job has been armed with drm_sched_job_arm(),
it must be pushed with drm_sched_entity_push_job(). From the DRM
scheduler documentation,
"""
drm_sched_job_arm() is a point of no return since it initializes the
fences and their sequence number etc. Once that function has been called,
you *must* submit it with drm_sched_entity_push_job() and cannot simply
abort it by calling drm_sched_job_cleanup().
"""
Fix this by splitting the fence ID allocation into two phases: first,
alloc an xarray slot before arming the job (which can fail), then fill in
the actual fence with xa_store() after arming. This way, allocation
failures are handled before the job is armed, and once armed, the job is
always pushed to the scheduler.
This also fixes a double call to drm_sched_job_cleanup(), as both
etnaviv_sched_push_job() and its caller would call it on failure.
Lee Jones [Wed, 29 Apr 2026 13:40:42 +0000 (13:40 +0000)]
nfc: llcp: Fix use-after-free race in nfc_llcp_recv_cc()
A race condition exists in the NFC LLCP connection state machine where
the connection acceptance packet (CC) can be processed concurrently with
socket release. This can lead to a use-after-free of the socket object.
When nfc_llcp_recv_cc() moves the socket from the connecting_sockets
list to the sockets list, it does so without holding the socket lock.
If llcp_sock_release() is executing concurrently, it might have already
unlinked the socket and dropped its references, which can result in
nfc_llcp_recv_cc() linking a freed socket into the live list.
Fix this by holding lock_sock() during the state transition and list
movement in nfc_llcp_recv_cc(). After acquiring the lock, check if
the socket is still hashed to ensure it hasn't already been unlinked
and marked for destruction by the release path. This aligns the locking
pattern with recv_hdlc() and recv_disc().