Rob Clark [Sat, 11 Apr 2026 15:03:12 +0000 (08:03 -0700)]
drm/msm/a6xx: Restore sysprof_active
This got lost in the shuffle somehow when moving the vfunc table to
catalogue. Fixes inhibiting IFPC when userspace is collecting perfcntr
data.
Fixes: 491fadb2b818 ("drm/msm/adreno: Move adreno_gpu_func to catalogue") Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/717780/
Message-ID: <20260411150312.257937-1-robin.clark@oss.qualcomm.com>
drm/msm/adreno: fix userspace-triggered crash on a2xx-a4xx
Before a5xx Adreno driver will not try fetching UBWC params (because
those generations didn't support UBWC anyway), however it's still
possible to query UBWC-related params from the userspace, triggering
possible NULL pointer dereference. Check for UBWC config in
adreno_get_param() and return sane defaults if there is none.
Fixes: a452510aad53 ("drm/msm/adreno: Switch to the common UBWC config struct") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/717778/
Message-ID: <20260411-adreno-fix-ubwc-v3-1-4983156f3f80@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Jeremy Laratro [Tue, 12 May 2026 23:26:16 +0000 (08:26 +0900)]
ksmbd: fix null pointer dereference in compare_guid_key()
session_fd_check() walks the per-inode m_op_list during durable-handle
session teardown and sets op->conn = NULL for every opinfo whose conn
matched the closing session's connection. The matching opinfo, however,
stays linked in its per-ClientGuid lease_table_list entry's lb->lease_list
because destroy_lease_table() only runs on full TCP-connection teardown,
not on SESSION_LOGOFF.
If the same TCP connection then negotiates a fresh session with the
same ClientGuid (ClientGuid is bound to NEGOTIATE, not the session, and
is unchanged across LOGOFF + SETUP) and issues a SMB2 CREATE with a
lease context on a different inode, find_same_lease_key() walks
lb->lease_list, reaches the stale opinfo, and calls compare_guid_key(),
which unconditionally dereferences opinfo->conn->ClientGUID. The conn
pointer is NULL and the kernel panics.
Reproducer requires only a successful SMB2 SESSION_SETUP and a share
configured with 'durable handles = yes'. KASAN report on mainline 70390501d194:
general protection fault, probably for non-canonical address
0xdffffc0000000069: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000348-0x000000000000034f]
Workqueue: ksmbd-io handle_ksmbd_work
RIP: 0010:bcmp+0x5b/0x230
Call Trace:
compare_guid_key+0x4b/0xd0
find_same_lease_key+0x324/0x690
smb2_open+0x6aea/0x8e60
handle_ksmbd_work+0x796/0xee0
...
Faulting address 0x348 is the offset of ClientGUID within struct
ksmbd_conn, confirming opinfo->conn was NULL.
Read opinfo->conn once and bail out if it has been cleared by a
concurrent session_fd_check(). A half-detached opinfo cannot be the
owner of an active lease, so returning 0 is the correct match result.
Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Cc: stable@vger.kernel.org Signed-off-by: Jeremy Laratro <research@aradex.io> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Jeremy Laratro [Tue, 12 May 2026 23:23:26 +0000 (08:23 +0900)]
ksmbd: fix null pointer dereference in proc_show_files()
When a SMB2 client opens a file with a durable v2 handle and then issues
SMB2 SESSION_LOGOFF, session_fd_check() clears fp->tcon = NULL on the
reconnectable file pointer but leaves the fp registered in global_ft.idr
until the durable scavenger fires (up to fp->durable_timeout seconds
later).
During that window any read of /proc/fs/ksmbd/files (mode 0400) panics
the kernel because proc_show_files() walks global_ft.idr and
unconditionally dereferences fp->tcon->id with no NULL guard.
Reproducer requires only a successful SMB2 SESSION_SETUP and a share
configured with 'durable handles = yes'. KASAN report on mainline 70390501d194:
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:proc_show_files+0x118/0x740
Call Trace:
proc_show_files+0x118/0x740
seq_read_iter+0x4ef/0xe10
proc_reg_read_iter+0x1b7/0x280
...
Guard the dereference. A durable-disconnected fp legitimately has no
tcon; report its tree id as 0 rather than oopsing.
Fixes: b38f99c1217a ("ksmbd: add procfs interface for runtime monitoring and statistics") Cc: stable@vger.kernel.org Signed-off-by: Jeremy Laratro <research@aradex.io> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Ferry Meng [Mon, 11 May 2026 13:18:16 +0000 (21:18 +0800)]
ksmbd: fix SID memory leak in set_posix_acl_entries_dacl() on overflow
Commit 299f962c0b02 ("ksmbd: use check_add_overflow() to prevent u16
DACL size overflow") added check_add_overflow() guards that break out
of the ACE-building loops in set_posix_acl_entries_dacl() when the
accumulated DACL size would wrap past 65535.
However, each iteration allocates a struct smb_sid via kmalloc_obj()
at the top of the loop and relies on the kfree(sid) call at the end
of the loop body (the 'pass_same_sid' label in the first loop, and
the explicit kfree at the tail of the second loop) to release it.
The newly introduced 'break' statements bypass those kfree() calls,
leaking the sid buffer every time an overflow is detected.
A malicious or malformed file with enough POSIX ACL entries to trip
the overflow check will leak one or more struct smb_sid allocations
on every request that touches the file's DACL, providing a trivial
kernel memory exhaustion vector.
Free sid before breaking out of the loops to plug the leak.
Fixes: 299f962c0b02 ("ksmbd: use check_add_overflow() to prevent u16 DACL size overflow") Cc: stable@vger.kernel.org Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
Felix Gu [Fri, 23 Jan 2026 16:37:38 +0000 (00:37 +0800)]
drm/msm/adreno: Fix a reference leak in a6xx_gpu_init()
In a6xx_gpu_init(), node is obtained via of_parse_phandle().
While there was a manual of_node_put() at the end of the
common path, several early error returns would bypass this call,
resulting in a reference leak.
Fix this by using the __free(device_node) cleanup handler to
release the reference when the variable goes out of scope.
Fixes: 5a903a44a984 ("drm/msm/a6xx: Introduce GMU wrapper support") Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/700661/
Message-ID: <20260124-a6xx_gpu-v1-1-fa0c8b2dcfb1@gmail.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Commit dc220915ddb2 ("drm/msm: Fix GMEM_BASE for gen8") changed the
GMEM_BASE check from adreno_is_a650_family() & adreno_is_a740_family()
to family >= ADRENO_6XX_GEN4.
This inadvertently excluded A650 (ADRENO_6XX_GEN3), causing it to report
an incorrect GMEM_BASE which results in severe rendering corruption.
Update check to also include ADRENO_6XX_GEN3 to fix A650.
Fixes: dc220915ddb2 ("drm/msm: Fix GMEM_BASE for gen8") Signed-off-by: Alexander Koskovich <akoskovich@pm.me> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/711880/
Message-ID: <20260314-fix-gmem-base-a650-v1-1-3308f60cf74c@pm.me> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Matt Evans [Mon, 11 May 2026 14:46:42 +0000 (07:46 -0700)]
vfio/pci: Make VFIO_PCI_OFFSET_TO_INDEX() return unsigned
VFIO_PCI_OFFSET_TO_INDEX() is used in several places with a signed
parameter (e.g. loff_t). Because it makes no sense for a BAR/resource
index to be negative, enforce this in the macro.
This fixes at least one current issue, where vfio_pci_ioeventfd() uses
this macro with an unvalidated signed loff_t returned into a signed
type, leading to a possible negative array access. This instance does
test against an out-of-bounds positive value, so treating the index as
unsigned fixes this issue.
Andrea Righi [Wed, 13 May 2026 11:24:38 +0000 (13:24 +0200)]
sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
scx_enable() refuses to attach a BPF scheduler when isolcpus=domain is
in effect by comparing housekeeping_cpumask(HK_TYPE_DOMAIN) against
cpu_possible_mask.
Since commit 27c3a5967f05 ("sched/isolation: Convert housekeeping
cpumasks to rcu pointers"), HK_TYPE_DOMAIN's cpumask is RCU protected
and dereferencing it requires either RCU read lock, the cpu_hotplug
write lock, or the cpuset lock; scx_enable() holds none of these, so
booting with isolcpus=domain and attaching any BPF scheduler triggers
the following lockdep splat:
In addition, commit 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask
from cpuset") made HK_TYPE_DOMAIN include cpuset isolated partitions as
well, which means the current check also rejects BPF schedulers when a
cpuset partition is active. That contradicts the original intent of
commit 9f391f94a173 ("sched_ext: Disallow loading BPF scheduler if
isolcpus= domain isolation is in effect"), which explicitly noted that
cpuset partitions are honored through per-task cpumasks and should not
be rejected.
Switch to housekeeping_enabled(HK_TYPE_DOMAIN_BOOT), which reads only
the housekeeping flag bit (no RCU dereference) and reflects exactly the
boot-time isolcpus= configuration that the error message refers to.
Alex Williamson [Thu, 7 May 2026 14:35:46 +0000 (08:35 -0600)]
vfio/pci: fix dma-buf kref underflow after revoke
vfio_pci_dma_buf_move(revoked=true) and vfio_pci_dma_buf_cleanup()
ran the same drain sequence: set priv->revoked, invalidate mappings,
wait for fences, drop the registered kref, wait for completion.
When the VFIO device fd was closed after PCI_COMMAND_MEMORY had been
cleared, both ran in turn -- the second kref_put underflowed and the
subsequent wait_for_completion() blocked on a completion that the
first run had already consumed:
Collapse the duplication: vfio_pci_dma_buf_cleanup() now delegates
the drain to vfio_pci_dma_buf_move(true), which is idempotent for
already-revoked dma-bufs. cleanup retains only list removal and
the device registration drop; the dma_resv_lock that bracketed
those is dropped along with the in-line drain that required it,
memory_lock continues to protect them.
Re-arm the kref and the completion at the end of move()'s revoke
branch so post-revoke state matches post-creation (kref == 1,
completion ready). This keeps cleanup's call into move() a no-op
when revoke already ran, and replaces the explicit kref_init() that
the un-revoke branch used to perform for the un-revoke -> remap
path.
Fixes: 1a8a5227f229 ("vfio: Wait for dma-buf invalidation to complete") Reported-by: Joonas Kylmälä <joonas.kylmala@netum.fi> Closes: https://lore.kernel.org/all/GVXPR02MB12019AA6014F27EF5D773E89BFB372@GVXPR02MB12019.eurprd02.prod.outlook.com/ Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Reviewed-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260507143548.1018405-1-alex.williamson@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
block: pass a minsize argument to bio_iov_iter_bounce
When bouncing for block size > PAGE_SIZE file systems that require
file system block size alignment (e.g. zoned XFS), the bio needs to
be big enough to fit an entire block.
Fixes: 8dd5e7c75d7b ("block: add helpers to bounce buffer an iov_iter into bios") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260507050153.1298375-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Henry Tseng [Wed, 13 May 2026 10:28:46 +0000 (18:28 +0800)]
cpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake
Bartlett Lake P-core only SKUs (e.g. Intel Core 9 273PE)
do not report X86_FEATURE_HYBRID_CPU and are not in
intel_hybrid_scaling_factor[]. In hwp_get_cpu_scaling(), the
non-hybrid fallback then applies core_get_scaling() (100000),
producing cpuinfo_max_freq values that exceed the documented Max
Turbo Frequency:
Per the Intel datasheet [1], the Intel Core 9 273PE specifies:
Performance-cores: 12
Efficient-cores: 0
Max Turbo Frequency: 5.7 GHz
Intel Thermal Velocity Boost Frequency: 5.7 GHz
Intel Turbo Boost Max Technology 3.0 Frequency: 5.6 GHz
Performance-core Max Turbo Frequency: 5.4 GHz
Performance-core Base Frequency: 2.3 GHz
Bartlett Lake P-cores are Raptor Cove cores, per
commit d466304c4322 ("x86/cpu: Add CPU model number for Bartlett
Lake CPUs with Raptor Cove cores"). The Alder Lake and Raptor Lake
P-core entries in intel_hybrid_scaling_factor[] use
HYBRID_SCALING_FACTOR_ADL (78741). The same factor applies to
Bartlett Lake.
Add Bartlett Lake to intel_hybrid_scaling_factor[] with
HYBRID_SCALING_FACTOR_ADL so HWP performance levels map to the
correct CPU frequencies matching the datasheet's Max Turbo Frequency.
cpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E
Raptor Lake-E has the same processor ID as Raptor Lake-S, so there is
an entry in intel_hybrid_scaling_factor[] for it. It does not contain
E-cores though and hybrid_get_cpu_type() returns 0 for its P-cores, so
they get the default "core" scaling factor. However, the original
Raptor Lake scaling factor for P-cores still needs to be used for
mapping the HWP performance levels of the P-cores in Raptor Lake-E to
frequency, as though they were part of a real hybrid system.
To address this, update hwp_get_cpu_scaling() to return
hybrid_scaling_factor, which is the P-core scaling factor
retrieved from intel_hybrid_scaling_factor[], for all CPUs
that are not enumerated as E-cores.
Documentation: intel_pstate: Fix description of asymmetric packing with SMT
Patchset [1], including commits
046a5a95c3b0 ("x86/sched/itmt: Give all SMT siblings of a core the same priority") 995998ebdebd ("x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags")
overhauled asym_packing handling in the scheduler on x86 hybrid
processors with SMT. It removed SD_ASYM_PACKING from the x86 SMT
scheduling domain and made all SMT siblings of a core share the same
priority. As a result, asym_packing operates only across physical
cores, spreading tasks among them and only using idle SMT siblings
once all physical cores are busy.
Nicholas Carlini [Mon, 11 May 2026 18:02:16 +0000 (18:02 +0000)]
io-wq: check that the predecessor is hashed in io_wq_remove_pending()
io_wq_remove_pending() needs to fix up wq->hash_tail[] if the cancelled
work was the tail of its hash bucket. When doing this, it checks whether
the preceding entry in acct->work_list has the same hash value, but
never checks that the predecessor is hashed at all. io_get_work_hash()
is simply atomic_read(&work->flags) >> IO_WQ_HASH_SHIFT, and the hash
bits are never set for non-hashed work, so it returns 0. Thus, when a
hashed bucket-0 work is cancelled while a non-hashed work is its list
predecessor, the check spuriously passes and a pointer to the non-hashed
io_kiocb is stored in wq->hash_tail[0].
Because non-hashed work is dequeued via the fast path in
io_get_next_work(), which never touches hash_tail[], the stale pointer
is never cleared. Therefore, after the non-hashed io_kiocb completes and
is freed back to req_cachep, wq->hash_tail[0] is a dangling pointer. The
io_wq is per-task (tctx->io_wq) and survives ring open/close, so the
dangling pointer persists for the lifetime of the task; the next hashed
bucket-0 enqueue dereferences it in io_wq_insert_work() and
wq_list_add_after() writes through freed memory.
Add the missing io_wq_is_hashed() check so a non-hashed predecessor
never inherits a hash_tail[] slot.
Cc: stable@vger.kernel.org Fixes: 204361a77f40 ("io-wq: fix hang after cancelling pending hashed work") Signed-off-by: Nicholas Carlini <nicholas@carlini.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
where xcpus = user_xcpus(cs) which returns cs->exclusive_cpus (if set)
or cs->cpus_allowed. When exclusive_cpus is not set, user_xcpus(cs) can
contain CPUs that were never actually granted to the partition due to
sibling exclusion in compute_excpus(). Consequently, the invalidation
may return CPUs to the parent that remain in use by sibling partitions,
causing overlapping effective_cpus and triggering the
WARN_ON_ONCE(1) in generate_sched_domains().
Use cs->effective_xcpus instead, which reflects the CPUs actually
granted to this partition.
# b1 becomes partition root with CPUs 1-2, but sibling exclusion
# reduces its effective_xcpus to CPU 2 only
echo "1-2" > b1/cpuset.cpus
echo "root" > b1/cpuset.cpus.partition
Linus Torvalds [Wed, 13 May 2026 18:53:51 +0000 (11:53 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Add the pKVM side of the workaround for ARM's erratum 4193714,
provided that the EL3 firmware does its part of the job. KVM will
refuse to initialise otherwise
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0
- Correctly deal with permission faults in guest_memfd memslots
- Fix the steal-time selftest after the infrastructure was reworked
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events
- Handle the current vcpu being NULL on EL2 panic
- Fix the selftest_vcpu memcache being empty at the point of donation
or sharing
- Check that the memcache has enough capacity before engaging on the
share/donate path
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context
s390:
- Fix array overrun with large amounts of PCI devices
x86:
- Never use L0's PAUSE loop exiting while L2 is running, since it's
unlikely that a nested guest will help solving the hypervisor's
spinlock contention
- Fix emulation of MOVNTDQA
- Fix typo in Xen hypercall tracepoint
- Add back an optimization that was left behind when recently fixing
a bug
- Add module parameter to disable CET, whose implementation seems to
have issues. For now it remains enabled by default
Generic:
- Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn()
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: VMX: introduce module parameter to disable CET
KVM: x86: Swap the dst and src operand for MOVNTDQA
KVM: x86: use again the flush argument of __link_shadow_page()
KVM: selftests: Ensure gmem file sizes are multiple of host page size
Documentation: kvm: update links in the references section of AMD Memory Encryption
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
KVM: x86: Fix Xen hypercall tracepoint argument assignment
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
KVM: arm64: Pre-check vcpu memcache for host->guest donate
KVM: arm64: Pre-check vcpu memcache for host->guest share
KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
KVM: arm64: Fix __deactivate_fgt macro parameter typo
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
KVM: arm64: Make EL2 exception entry and exit context-synchronization events
MAINTAINERS: Add Steffen as reviewer for KVM/arm64
KVM: arm64: Remove potential UB on nvhe tracing clock update
KVM: selftests: arm64: Fix steal_time test after UAPI refactoring
KVM: arm64: Handle permission faults with guest_memfd
KVM: arm64: nv: Consider the DS bit when translating TCR_EL2
KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests
...
Yu Miao [Wed, 13 May 2026 02:39:07 +0000 (10:39 +0800)]
selftests/cgroup: Fix error path leaks in test_percpu_basic
When cg_name_indexed() returns NULL partway through the child creation
loop, the code returned -1 without running cleanup_children and cleanup.
That left the `parent` pathname allocation unreleased and did not remove
child cgroup directories already created under the parent. Fix by jumping
to cleanup_children instead of returning.
When cg_create() fails, `child` (the pathname from cg_name_indexed())
was not freed before cleanup_children. Fix by freeing `child` before
branching to cleanup_children.
RDMA/bnxt_re: zero shared page before exposing to userspace
bnxt_re_alloc_ucontext() allocates uctx->shpg via
__get_free_page(GFP_KERNEL). The buddy allocator does not zero pages
without __GFP_ZERO, so the page contains stale kernel data from
whatever object most recently freed it.
The page is then mapped into userspace via vm_insert_page() under
BNXT_RE_MMAP_SH_PAGE in bnxt_re_mmap(). The driver only ever writes
4 bytes (a u32 AVID) at offset BNXT_RE_AVID_OFFT (0x10) inside
bnxt_re_create_ah(); the remaining 4092 bytes of the page are exposed
to userspace unsanitised, leaking kernel memory contents.
Any user with access to /dev/infiniband/uverbsX on a host with a
bnxt_re device (typically rdma group membership) can read this data
via a single mmap() at pgoff 0 after IB_USER_VERBS_CMD_GET_CONTEXT.
Other shared pages in the same file already use get_zeroed_page()
correctly:
Yi Lai [Thu, 7 May 2026 12:51:06 +0000 (20:51 +0800)]
selftests/rdma: explicitly skip tests when required modules are missing
Currently, the rdma rxe selftests fail with an exit code of 1 when
required kernel modules are not present. This causes spurious failures
in environments where these modules might not be compiled or available.
Include the standard kselftest 'ktap_helpers.sh' and replace the
hardcoded error exits with '$KSFT_SKIP'. This ensures the tests are
properly marked as skipped rather than failed.
Fixes: e01027cab38a ("RDMA/rxe: Add testcase for net namespace rxe") Signed-off-by: Yi Lai <yi1.lai@intel.com> Link: https://patch.msgid.link/20260507125106.3114167-1-yi1.lai@intel.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
Johan Hovold [Fri, 8 May 2026 14:44:46 +0000 (16:44 +0200)]
drm/gma500/oaktrail_lvds: fix i2c adapter leaks on init
The LVDS init code looks up an I2C adapter using i2c_get_adapter() and
tries to read the EDID before falling back to allocating and registering
its own adapter.
Make sure to drop the references taken by i2c_get_adapter() when falling
back to allocating an adapter as well as on late errors to allow the
looked up adapter to be deregistered.
Johan Hovold [Fri, 8 May 2026 14:44:45 +0000 (16:44 +0200)]
drm/gma500/oaktrail_lvds: fix hang on init failure
The LVDS init code looks up an I2C adapter using i2c_get_adapter() and
tries to read the EDID before falling back to allocating and registering
its own adapter.
The error handling does not separate these cases so on a late init
failure it will try to deregister and free also an adapter that had
previously been registered. Since i2c_get_adapter() takes another
reference to the adapter, deregistration hangs indefinitely while
waiting for the reference to be released.
Fix this by only destroying adapters allocated during LVDS init on
errors.
Fixes: a57ebfc0b4da ("drm/gma500: Make oaktrail lvds use ddc adapter from drm_connector") Cc: stable@vger.kernel.org # 6.0 Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260508144446.59722-3-johan@kernel.org
KVM: selftests: Guard execinfo.h inclusion for non-glibc builds
The backtrace() function and execinfo.h are GNU extensions available
in glibc but not in non-glibc C libraries such as musl. Building KVM
selftests with musl-gcc fails with:
lib/assert.c:9:10: fatal error: execinfo.h: No such file or directory
Fix this by guarding the inclusion of execinfo.h and the stack dumping
logic under #ifdef __GLIBC__. For non-glibc builds, provide a local
stub for test_dump_stack().
Suggested-by: Aqib Faruqui <aqibaf@amazon.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Hisam Mehboob <hisamshar@gmail.com> Link: https://patch.msgid.link/20260409153846.1502656-2-hisamshar@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
Lei Chen [Thu, 9 Apr 2026 14:22:26 +0000 (22:22 +0800)]
KVM: x86: Rate-limit global clock updates on vCPU load
commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
As a result, kvm_arch_vcpu_load() can queue global clock update requests
every time a vCPU is scheduled when the master clock is disabled or when
the vCPU is loaded for the first time.
Restore the throttling with a per-VM ratelimit state and gate
KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
scheduling does not generate a steady stream of redundant clock update
requests.
Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/ Link: https://patch.msgid.link/20260409142226.2581-1-lei.chen@smartx.com Signed-off-by: Sean Christopherson <seanjc@google.com>
x86/virt: Silence RCU lockdep splat in emergency virt callback path
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:
A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.
However, the alternatives are worse:
- smp_store_release()/smp_load_acquire() handles ordering but not
liveness; the kernel still needs to keep the module text alive
while the callback is in flight.
- Taking a lock in the panic path is risky — any lock could be held
by a CPU that has already been NMI'd to a halt.
Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Suggested-by: Sean Christopherson <seanjc@google.com> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260504235435.90957-1-mikhail.v.gavrilov@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: selftests: Include sys/mman.h *and* linux/mman.h, via kvm_syscalls.h
Include both linux/mman.h (the kernel provided version) and sys/mman.h (the
libc provided version) throughout KVM selftests, by way of kvm_syscalls.h
(which should have been including sys/mman.h anyways). Pulling in the
kernel's version fixes compilation errors with the guest_memfd test on
older versions of libc due to a recent commit adding MADV_COLLAPSE testing.
In file included from include/kvm_util.h:8,
from guest_memfd_test.c:21:
guest_memfd_test.c: In function ‘test_collapse’:
guest_memfd_test.c:219:47: error: ‘MADV_COLLAPSE’ undeclared (first use in this function); did you mean ‘MADV_COLD’?
219 | TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
| ^~~~~~~~~~~~~
include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
62 | typeof(a) __a = (a); \
| ^
guest_memfd_test.c:219:47: note: each undeclared identifier is reported only once for each function it appears in
219 | TEST_ASSERT_EQ(madvise(mem, pmd_size, MADV_COLLAPSE), -1);
| ^~~~~~~~~~~~~
include/test_util.h:62:16: note: in definition of macro ‘TEST_ASSERT_EQ’
62 | typeof(a) __a = (a); \
| ^
Route the includes through kvm_syscalls.h to try and avoid a future game
of whack-a-mole, i.e. so that future expansion of test coverage doesn't run
into the same problem.
To discourage use of sys/mman.h, opportunistically include the kernel's
version of mman.h in test_util.h as it only needs MAP_SHARED, i.e. only
needs the full set of kernel defs, not the libc syscall wrappers.
Fixes: 9830209b4ae8 ("KVM: selftests: Test MADV_COLLAPSE on guest_memfd") Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Closes: https://lore.kernel.org/all/20260427204313.50741-1-rick.p.edgecombe@intel.com Link: https://patch.msgid.link/20260428012503.1213654-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
/*
* Alert userspace if needed. If we logged an MCE, reduce the polling
* interval, otherwise increase the polling interval.
*/
if (mce_notify_irq())
<--- here we haven't ran the notifier chain yet so mce_need_notify is
not set yet so this won't hit and we won't halve the interval iv.
Now the notifier chain runs. mce_early_notifier() sets the bit, does
mce_notify_irq(), that clears the bit and then the notifier chain
a little later logs the error.
So this is a silly timing issue.
But, that's all unnecessary.
All it needs to happen here is, the "should we notify of a logged MCE"
mce_notify_irq() asks, should be simply a question to the mce gen pool:
"Are you empty?"
And that then turns into a simple yes or no answer and it all
JustWorks(tm).
So do that and also distribute the functionality where it belongs:
- Print that MCE events have been logged in mce_log()
- Trigger the mcelog tool specific work in the first notifier
Ming Lei [Wed, 13 May 2026 10:19:40 +0000 (18:19 +0800)]
selftests: ublk: cap nthreads to kernel's actual nr_hw_queues
dev->nthreads is derived from the user-requested queue count before the
ADD command, but the kernel may reduce nr_hw_queues (capped to
nr_cpu_ids). When the VM has fewer CPUs than requested queues, the
daemon creates more handler threads than there are kernel queues.
In non-batch mode, the extra threads access uninitialized queues
(q_depth=0), submit zero io_uring SQEs, and block forever in
io_cqring_wait. In batch mode, the extra threads cause similar hangs
during device removal.
In both cases, the stuck threads prevent the daemon from closing the
char device, holding the last ublk_device reference and causing
ublk_ctrl_del_dev() to hang in wait_event_interruptible().
Fix by capping dev->nthreads to the kernel-returned nr_hw_queues after
the ADD command completes. per_io_tasks mode is excluded because threads
interleave across all queues, so nthreads > nr_hw_queues is valid.
Damien Le Moal [Wed, 13 May 2026 11:11:29 +0000 (20:11 +0900)]
block: fix handling of dead zone write plugs
Shin'ichiro reported hard to reproduce unaligned write errors with zoned
block devices. Under normal operation conditions (e.g. running XFS on an
SMR disk), these errors are nearly impossible to trigger. But using a
"slow" kernel with many debug options enables and some specific use
cases (e.g. fio zbd test case 46), the errors can be reproduced fairly
easily.
The unaligned write errors come from mishandling a valid reference
counting pattern of zone write plugs. Such pattern triggers for instance
if a process A writes a zone (not necessarilly to the full state),
another process B immediately resets the zone and immediately following
the completion of the zone reset, starts issuing writes to the zone.
With such pattern, in some cases, the zone write plugs worker thread of
the device may still be holding a reference to the zone write plug of
the zone taken when process A was writing to the zone. The following
zone reset from process B marks the zone as dead but does not remove the
zone write plug from the device hash table as a reference to the plug
still exist. Once process B starts issuing new writes, the zone write
plug is seen as dead and the writes from process B are immediately
failed, despite this write pattern being perfectly legal.
Fix this by allowing restoring a dead zone write plug to a live state if
a write is issued to the zone when the zone is: marked as dead, empty
and the write sector corresponds to the first sector of the zone (that
is, the write is aligned to the zone write pointer). This is done with
the new helper function disk_check_zone_wplug_dead(), which restores a
dead zone write plug to a live state by clearing the BLK_ZONE_WPLUG_DEAD
flag and restoring the initial reference to the zone write plug taken
when the plug was added to the device hash table.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: b7d4ffb51037 ("block: fix zone write plug removal") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://patch.msgid.link/20260513111129.108809-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Bonzini [Tue, 12 May 2026 14:58:48 +0000 (16:58 +0200)]
KVM: VMX: introduce module parameter to disable CET
There have been reports of host hangs caused by CET virtualization.
Until these are analyzed further, introduce a module parameter that
makes it possible to easily disable it.
Mixing devm and drmm functions will result in a use-after-free on msm
driver teardown if userspace keeps a reference on the drm device:
The WB connector data will be destroyed because of the use of
devm_kzalloc()), while the usersoace still can try interacting with the
WB connector (which uses drmm_ functions).
drm/msm/dsi: don't dump registers past the mapped region
On DSI 6G platforms the IO address space is internally adjusted by
io_offset. Later this adjusted address might be used for memory dumping.
However the size that is used for memory dumping isn't adjusted to
account for the io_offset, leading to the potential access to the
unmapped region. Lower ctrl_size by the io_offset value to prevent
access past the mapped area.
The Kaanapali DPU catalog defines kaanapali_cwb[] with the correct
CWB base addresses for this platform (0x169200, 0x169600, 0x16a200,
0x16a600), but the dpu_kaanapali_cfg struct was mistakenly pointing
to sm8650_cwb instead. The SM8650 CWB blocks sit at completely
different offsets (0x66200, 0x66600, 0x7E200, 0x7E600), so using
them on Kaanapali would program CWB registers at wrong addresses,
corrupting unrelated hardware blocks and breaking writeback capture.
Fix this by pointing .cwb to the correct kaanapali_cwb array.
Fixes: 83fe2cd56b1d ("drm/msm/dpu: Add support for Kaanapali DPU") Signed-off-by: Mahadevan P <mahadevan.p@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/721444/ Link: https://lore.kernel.org/r/20260428-kaanapali_cwb-v1-1-51fdb2c65498@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
dt-bindings: display/msm: qcom,eliza-mdss: Correct DPU and DP ranges in example
VBIF register range is 0x3000 long. DisplayPort block has few too short
ranges and misses four more address spaces. Similarly first part of DSI
space should be 0x300 long.
No practical impact, except when existing code is being re-used in new
contributions.
dt-bindings: display/msm: dp-controller: Allow DAI on SM8650 and others
DisplayPort on Qualcomm SoCs like SM8650 and compatible SM8750 supports
audio and there is already DTS having cells and sound-name-prefix. The
"else:" clause for non-EDP and non-aux-bus cases already requires
'#sound-dai-cells', so it should actually reference the dai-common.yaml
for other properties, as pointed out by dtbs_check warnings like:
sm8650-hdk-display-card-rear-camera-card.dtb:
displayport-controller@af54000 (qcom,sm8650-dp): Unevaluated properties are not allowed ('sound-name-prefix' was unexpected)
dt-bindings: display/msm: dp-controller: Correct SM8650 IO range
DP on Qualcomm SM8650 come with nine address ranges, so describe the
remaining ones as optional to keep ABI backwards compatible. Driver
also does not need them to operate correctly.
Yang Xiuwei [Wed, 13 May 2026 09:43:03 +0000 (17:43 +0800)]
io_uring/rw: drop unused attr_type_mask from io_prep_rw_pi()
io_prep_rw_pi() never used the attr_type_mask argument. Callers already
validate sqe->attr_type_mask before invoking the helper (only
IORING_RW_ATTR_FLAG_PI is supported today). Remove the dead parameter to
avoid implying further interpretation happens here.
Hardik Prakash [Tue, 12 May 2026 07:31:38 +0000 (13:01 +0530)]
pinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11
On Lenovo Yoga 7 14AGP11 (83TD), the WACF2200 touchscreen controller
is wired via I2C2 (AMDI0010:02) with its interrupt on GPIO pin 157
(confirmed via ACPI _CRS GpioInt decode). After amd_gpio_irq_init()
clears all GPIO interrupts at boot, pin 157 is never re-enabled,
preventing the touchscreen from signalling the driver.
Windows keeps GPIO 157 INTERRUPT_ENABLE (bit 11) and INTERRUPT_MASK
(bit 12) set after initialisation. Add a DMI quirk to restore these
bits after amd_gpio_irq_init() on this hardware.
Niklas Söderlund [Sun, 10 May 2026 10:30:17 +0000 (12:30 +0200)]
net: ethernet: ravb: Do not check URAM suspension when WoL is active
When updating the driver to match latest datasheet to suspend access to
URAM when suspending DMA transfers a corner-case was missed, URAM access
will not be suspended if WoL is enabled. This lead to the error message
(correctly) being triggered as URAM access is not suspended even tho
it's requested as part of stopping DMA.
Avoid checking if URAM access is suspended and printing the error
message if WoL is enabled when we suspend the system, as we know it will
not be.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdWnjV%3DHGE1o08zLhUfTgOSene5fYx1J5GG10mB%2BToq8qg@mail.gmail.com/ Fixes: 353d8e7989b6 ("net: ethernet: ravb: Suspend and resume the transmission flow") Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chenguang Zhao [Mon, 11 May 2026 01:43:43 +0000 (09:43 +0800)]
ethtool: fix ethnl_bitmap32_not_zero() bit interval semantics
ethnl_bitmap32_not_zero() should return true if some bit in [start, end)
is set:
- Fix inverted memchr_inv() sense: return true when the scan finds a
non-zero byte, not when the middle words are all zero.
- Return false for an empty interval (end <= start).
- When end is 32-bit aligned, indices in [start, end) do not include any
bits from map[end_word]; return false after earlier checks found no
non-zero data.
Xiang Mei [Sun, 10 May 2026 22:26:40 +0000 (15:26 -0700)]
net/smc: avoid NULL deref of conn->lnk in smc_msg_event tracepoint
The smc_msg_event tracepoint class, shared by smc_tx_sendmsg and
smc_rx_recvmsg, unconditionally dereferences smc->conn.lnk:
__string(name, smc->conn.lnk->ibname)
conn->lnk is only set for SMC-R; for SMC-D it is NULL. Other code on
these paths already handles this (e.g. !conn->lnk in
SMC_STAT_RMB_TX_SIZE_SMALL()). With the tracepoint enabled, the first
sendmsg()/recvmsg() on an SMC-D socket crashes:
Oops: general protection fault, probably for non-canonical address
KASAN: null-ptr-deref in range [...]
RIP: 0010:strlen+0x1e/0xa0
Call Trace:
trace_event_raw_event_smc_msg_event (net/smc/smc_tracepoint.h:44)
smc_rx_recvmsg (net/smc/smc_rx.c:515)
smc_recvmsg (net/smc/af_smc.c:2859)
__sys_recvfrom (net/socket.c:2315)
__x64_sys_recvfrom (net/socket.c:2326)
do_syscall_64
The faulting address 0x3e0 is offsetof(struct smc_link, ibname),
confirming the NULL ->lnk deref. Enabling the tracepoint requires
root, but the trigger itself is unprivileged: socket(AF_SMC, ...) has
no capability check, and SMC-D negotiation needs no admin step on
s390 or on x86 with the loopback ISM device loaded.
Log an empty device name for SMC-D instead of dereferencing NULL.
Fixes: aff3083f10bf ("net/smc: Introduce tracepoints for tx and rx msg") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nicolò Coccia [Sun, 10 May 2026 16:34:13 +0000 (12:34 -0400)]
net/smc: fix sleep-inside-lock in __smc_setsockopt() causing local DoS
A logic flaw in __smc_setsockopt() allows a local unprivileged user to
cause a Denial of Service (DoS) by holding the socket lock indefinitely.
The function __smc_setsockopt() calls copy_from_sockptr() while holding
lock_sock(sk). By passing a userfaultfd-monitored memory page (or
FUSE-backed memory on systems where unprivileged userfaultfd is disabled)
as the optval, an attacker can halt execution during the copy operation,
keeping the lock held.
Combined with asynchronous tear-down operations like shutdown(), this
exhausts the kernel wq (kworkers) and triggers the hung task watchdog.
[ 240.123456] INFO: task kworker/u8:2 blocked for more than 120 seconds.
[ 240.123489] Call Trace:
[ 240.123501] smc_shutdown+...
[ 240.123512] lock_sock_nested+...
This patch moves the user-space copy outside the lock_sock() critical
section to prevent the issue.
Fixes: a6a6fe27bab4 ("net/smc: Dynamic control handshake limitation by socket options") Signed-off-by: Nicolò Coccia <n.coccia96@gmail.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wei Yang [Sat, 9 May 2026 12:23:58 +0000 (20:23 +0800)]
net: atm: fix skb leak in sigd_send() default branch
The default branch in sigd_send() calls sock_put() and returns -EINVAL
without freeing the skb, while all other exit paths do so. Add the
missing dev_kfree_skb() before sock_put() to fix the leak.
phy_remove() clears phydev->drv but doesn't call phy_detach(), so the
phy_device stays in the link topology xarray and ethnl_req_get_phydev()
still hands it back. ETHTOOL_MSG_PHY_GET then oopses on:
drvname is already treated as optional by phy_reply_size(),
phy_fill_reply() and phy_cleanup_data(), so just skip the allocation
when there is no driver bound.
Fixes: 9dd2ad5e92b9 ("net: ethtool: phy: Convert the PHY_GET command to generic phy dump") Cc: stable@vger.kernel.org # 6.13.x Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260509215046.107157-1-devnexen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Zoran Ilievski [Mon, 11 May 2026 06:40:02 +0000 (08:40 +0200)]
net: atlantic: preserve PCI wake-from-D3 on shutdown when WOL enabled
The shutdown handler aq_pci_shutdown() unconditionally calls
pci_wake_from_d3(pdev, false), clearing the PCI PME_En bit even when
wake-on-LAN has been configured. While aq_nic_shutdown() correctly
programs the NIC firmware via aq_nic_set_power() to listen for magic
packets, the PCI subsystem will not propagate the resulting PME wake
event from D3, so the system never wakes after poweroff.
WOL from suspend (S3) is unaffected because aq_suspend_common() does
not touch pci_wake_from_d3() and relies on the PM core's wake
configuration via device_may_wakeup().
This affects all atlantic-supported NICs (AQC107/108/111/112/113);
users have reported that WOL works if the atlantic driver is never
loaded, but breaks once it has run its shutdown path.
Pass the configured WOL state to pci_wake_from_d3() instead of a
literal false, so the PCI PME_En bit is preserved when the user has
armed WOL via ethtool.
Tejun Heo [Mon, 11 May 2026 23:18:23 +0000 (13:18 -1000)]
sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
scx_sub_enable_workfn() pins parent->kobj before dropping scx_sched_lock,
but that does not pin parent->sub_kset. Concurrent disable can
kset_unregister and free sub_kset before scx_alloc_and_add_sched()
dereferences it.
Split sub_kset teardown: kobject_del() at disable keeps sysfs removal; defer
kobject_put() to scx_sched_free_rcu_work so the memory survives. A racing
child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.
Tejun Heo [Mon, 11 May 2026 23:18:19 +0000 (13:18 -1000)]
sched_ext: INIT_LIST_HEAD() &sch->all in scx_alloc_and_add_sched()
On scx_link_sched() error paths (parent disabled, hash insert failure),
&sch->all is never added to scx_sched_all. The cleanup path runs
scx_unlink_sched() unconditionally, which calls list_del_rcu(&sch->all) on a
list_head that was never initialized triggering a corruption warning.
Initialize &sch->all.
Fixes: 54be8de4236a ("sched_ext: Factor out scx_link_sched() and scx_unlink_sched()") Signed-off-by: Tejun Heo <tj@kernel.org>
Tejun Heo [Tue, 12 May 2026 20:30:00 +0000 (10:30 -1000)]
sched_ext: Drop NONE early return in scx_disable_and_exit_task()
d3e73a0808dd ("sched_ext: Handle SCX_TASK_NONE in disable/switched_from
paths") skipped the trailing scx_set_task_sched(p, NULL) on NONE tasks.
After scx_fail_parent() parks a task at NONE/sched=parent and the parent
is later freed via queue_rcu_work() during root_disable, the preserved
p->scx.sched dangles - print_scx_info() from sched_show_task() reads
sch->ops.name from freed memory.
Drop the early return. __scx_disable_and_exit_task() already short-
circuits on NONE and the SUB_INIT block was cleared by
scx_fail_parent()'s earlier call, so clearing p->scx.sched is the only
work left - and the one thing the path actually needs.
v2: Extend the SUB_INIT block comment to note that the flag is only
set on the sub-enable path, so it's always clear on the NONE
re-entry (Andrea).
Fixes: d3e73a0808dd ("sched_ext: Handle SCX_TASK_NONE in disable/switched_from paths") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
KVM: x86: Swap the dst and src operand for MOVNTDQA
Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same
characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores
to registers, while MOVNTDQ loads from registers and stores to memory.
Per the SDM:
MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal
hint.
MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint
if WC memory type.
Reported-by: Josh Eads <josheads@google.com> Fixes: c57d9bafbd0b ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 3 May 2026 21:09:17 +0000 (23:09 +0200)]
KVM: x86: use again the flush argument of __link_shadow_page()
Except in the case of parentless nested-TDP pages, mmu_page_zap_pte()
clears the SPTE but leaves the invalid_list empty. In this case, using
kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill.
Avoid flushing the entirety of the remote TLBs unless the invalid_list
was populated: instead, use a more efficient gfn-targeting flush (if
available) and skip it altogether if the caller guarantees that a TLB
flush is not necessary.
Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Val Packett [Thu, 12 Mar 2026 00:53:37 +0000 (21:53 -0300)]
arm64: dts: qcom: x1-dell-thena: remove i2c20 (battery SMBus) and reserve its pins
i2c20 is used by the battmgr service on the ADSP to communicate with the
SBS interface of the battery. Initializing it from Linux would break the
battmgr functionality when booted in EL2. Mark those pins as reserved.
Fixes: e7733b42111c ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Signed-off-by: Val Packett <val@packett.cool> Link: https://lore.kernel.org/r/20260312005731.12488-2-val@packett.cool Signed-off-by: Bjorn Andersson <andersson@kernel.org>
KVM: selftests: Ensure gmem file sizes are multiple of host page size
When creating a guest_memfd file and associated memslot to validate shared
guest memory, size the file+memslot to the maximum of the host or guest
page size. Attempting to allocate a single guest page will fail if the
host page size is greater than the guest page size, as KVM requires that
the size of memslots and guest_memfd files are a multiple of the host page
size.
For simplicity, verify the entire file can be shared between guest and host,
e.g. instead of trying to validate "partial" mappings.
Fixes: 42188667be38 ("KVM: selftests: Add guest_memfd testcase to fault-in on !mmap()'d memory") Reported-by: Zenghui Yu <zenghui.yu@linux.dev> Closes: https://lore.kernel.org/all/0064952b-048c-455d-ad89-e27e5cb82591@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260512155634.772602-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 12 May 2026 20:19:20 +0000 (22:19 +0200)]
Merge tag 'kvmarm-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.1, take #2
- Add the pKVM side of the workaround for ARM's erratum 4193714, provided
that the EL3 firmware does its part of the job. KVM will refuse to
initialise otherwise.
- Correctly handle 52bit VAs for guest EL2 stage-1 translations when
running under NV with E2H==0.
- Correctly deal with permission faults in guest_memfd memslots.
- Fix the steal-time selftest after the infrastructure was reworked.
- Make sure the host cannot pass a non-sensical clock update to the
EL2 tracing infrastructure.
- Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390
ability to run arm64 guests, which will inevitably lead to arm64
code being directly used on s390.
- Make sure that EL2 is configured with both exception entry and exit
being Context Synchronization Events.
- Handle the current vcpu being NULL on EL2 panic.
- Fix the selftest_vcpu memcache being empty at the point of donation or
sharing.
- Check that the memcache has enough capacity before engaging on the
share/donate path.
- Fix __deactivate_fgt() to use its parameter rather than a variable
in the macro context.
KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running
Never use L0's (KVM's) PAUSE loop exiting controls while L2 is running,
and instead always configure vmcb02 according to L1's exact capabilities
and desires.
The purpose of intercepting PAUSE after N attempts is to detect when the
vCPU may be stuck waiting on a lock, so that KVM can schedule in a
different vCPU that may be holding said lock. Barring a very interesting
setup, L1 and L2 do not share locks, and it's extremely unlikely that an
L1 vCPU would hold a spinlock while running L2. I.e. having a vCPU
executing in L1 yield to a vCPU running in L2 will not allow the L1 vCPU
to make forward progress, and vice versa.
While teaching KVM's "on spin" logic to only yield to other vCPUs in L2 is
doable, in all likelihood it would do more harm than good for most setups.
KVM has limited visibility into which L2 "vCPUs" belong to the same VM,
and thus share a locking domain. And even if L2 vCPUs are in the same
VM, KVM has no visilibity into L2 vCPU's that are scheduled out by the
L1 hypervisor.
Furthermore, KVM doesn't actually steal PAUSE exits from L1. If L1 is
intercepting PAUSE, KVM will route PAUSE exits to L1, not L0, as
nested_svm_intercept() gives priority to the vmcb12 intercept. As such,
overriding the count/threshold fields in vmcb02 with vmcb01's values is
nonsensical, as doing so clobbers all the training/learning that has been
done in L1.
Even worse, if L1 is not intercepting PAUSE, i.e. KVM is handling PAUSE
exits, then KVM will adjust the PLE knobs based on L2 behavior, which could
very well be detrimental to L1, e.g. due to essentially poisoning L1 PLE
training with bad data.
And copying the count from vmcb02 to vmcb01 on a nested VM-Exit makes even
less sense, because again, the purpose of PLE is to detect spinning vCPUs.
Whether or not a vCPU is spinning in L2 at the time of a nested VM-Exit
has no relevance as to the behavior of the vCPU when it executes in L1.
The only scenarios where any of this actually works is if at least one
of KVM or L1 is NOT intercepting PAUSE for the guest. Per the original
changelog, those were the only scenarios considered to be supported.
Disabling KVM's use of PLE makes it so the VM is always in a "supported"
mode.
Last, but certainly not least, using KVM's count/threshold instead of the
values provided by L1 is a blatant violation of the SVM architecture.
Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508213321.373309-1-seanjc@google.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Sacks [Tue, 12 May 2026 06:07:42 +0000 (02:07 -0400)]
KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
kvm_reset_dirty_gfn() guards the gfn range with
if (!memslot || (offset + __fls(mask)) >= memslot->npages)
return;
but offset is u64 and the addition is unchecked. The check can be
silently bypassed by a u64 wrap.
The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them. On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check; only the flags handshake is
treated as the handover, the slot/offset payload is taken on trust.
makes the coalescing loop in kvm_dirty_ring_reset() compute
delta = (s64)(0 - 0xffffffffffffffc1) = 63
which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63. The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.
That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged. On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot->arch.rmap[0][] with a near-U64_MAX gfn. That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.
Range-check offset on its own first, so the addition cannot wrap.
memslot->npages is bounded well below U64_MAX, so once offset <
npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
stays in range.
Sergio Correia [Tue, 12 May 2026 13:28:59 +0000 (14:28 +0100)]
audit: enforce AUDIT_LOCKED for AUDIT_TRIM and AUDIT_MAKE_EQUIV
AUDIT_ADD_RULE and AUDIT_DEL_RULE correctly check for AUDIT_LOCKED
and return -EPERM, but AUDIT_TRIM and AUDIT_MAKE_EQUIV do not. This
allows a process with CAP_AUDIT_CONTROL to modify directory tree
watches and equivalence mappings even when the audit configuration
has been locked, undermining the purpose of the lock.
Add AUDIT_LOCKED checks to both commands.
Cc: stable@vger.kernel.org Reviewed-by: Ricardo Robaina <rrobaina@redhat.com> Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sergio Correia <scorreia@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Sergio Correia [Tue, 12 May 2026 13:28:33 +0000 (14:28 +0100)]
audit: fix incorrect inheritable capability in CAPSET records
__audit_log_capset() records the effective capability set into the
inheritable field due to a copy-paste error. Every CAPSET audit
record therefore reports cap_pi (process inheritable) with the value
of cap_effective instead of cap_inheritable.
This silently corrupts audit data used for compliance and forensic
analysis: an attacker who modifies inheritable capabilities to
prepare for a privilege-escalating exec would have the change masked
in the audit trail.
The bug has been present since the original introduction of CAPSET
audit records in 2008.
Cc: stable@vger.kernel.org Fixes: e68b75a027bb ("When the capset syscall is used it is not possible for audit to record the actual capbilities being added/removed. This patch adds a new record type which emits the target pid and the eff, inh, and perm cap sets.") Reviewed-by: Ricardo Robaina <rrobaina@redhat.com> Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Sergio Correia <scorreia@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
Raphael Zimmer [Tue, 12 May 2026 16:16:40 +0000 (18:16 +0200)]
libceph: Fix potential null-ptr-deref in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. When decoding this CRUSH map in crush_decode(), an
array of max_buckets CRUSH buckets is decoded, where some indices may
not refer to actual buckets and are therefore set to NULL. The received
CRUSH map may optionally contain choose_args that get decoded in
decode_choose_args(). When decoding a crush_choose_arg_map, a series of
choose_args for different buckets is decoded, with the bucket_index
being read from the incoming message. It is only checked that the bucket
index does not exceed max_buckets, but not that it doesn't point to an
index with a NULL bucket. If a (potentially corrupted) message contains
a crush_choose_arg_map including such a bucket_index, a null pointer
dereference may occur in the subsequent processing when attempting to
access the bucket with the given index.
This patch fixes the issue by extending the affected check. Now, it is
only attempted to access the bucket if it is not NULL.
Raphael Zimmer [Tue, 12 May 2026 07:29:30 +0000 (09:29 +0200)]
libceph: handle rbtree insertion error in decode_choose_args()
A message of type CEPH_MSG_OSD_MAP contains an OSD map that itself
contains a CRUSH map. The received CRUSH map may optionally contain
choose_args that get decoded in decode_choose_args(). In this function,
num_choose_arg_maps is read from the message, and a corresponding number
of crush_choose_arg_maps gets decoded afterwards. Each
crush_choose_arg_map has a choose_args_index, which serves as the key
when inserting it into the choose_args rbtree of the decoded crush_map.
If a (potentially corrupted) message contains two crush_choose_arg_maps
with the same index, the assertion in insert_choose_arg_map() triggers a
kernel BUG when trying to insert the second crush_choose_arg_map.
This patch fixes the issue by switching to the non-asserting rbtree
insertion function and rejecting the message if the insertion fails.
hwmon: (asus_atk0110) Check ACPI_COMPANION() against NULL
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_HANDLE() check against NULL to the
asus_atk0110 hwmon driver.
Fixes: ee1752590733 ("hwmon: (asus_atk0110) Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/2261594.irdbgypaU6@rafael.j.wysocki Signed-off-by: Guenter Roeck <linux@roeck-us.net>
hwmon: (acpi_power_meter) Check ACPI_COMPANION() against NULL
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
acpi_power_meter hwmon driver.
Fixes: afc6c4aedea5 ("hwmon: (acpi_power_meter) Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/5068745.GXAFRqVoOG@rafael.j.wysocki Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Input: atlas - check ACPI_COMPANION() against NULL
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
atlas_btns driver.
Fixes: b8303880b641 ("Input: atlas - convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8696590.T7Z3S40VBb@rafael.j.wysocki Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Tue, 12 May 2026 17:18:02 +0000 (10:18 -0700)]
Merge tag 'probes-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
Since the ftrace adds its NOPs at .kprobes.text section (which stores
an array), a wrong entry is added when loading a module which uses
"__kprobes" attribute.
To solve this, add "notrace" to __kprobes functions
- test_kprobes: clear kprobes between test runs
Clear all kprobes in the test program after running a test set,
because Kunit test can run several times
- fprobe: Fix unregister_fprobe() to wait for RCU grace period
Since the fprobe data structure is removed with hlist_del_rcu(), it
should wait for the RCU grace period. If the caller waits for RCU, we
can use the async variant (e.g. eBPF)
* tag 'probes-fixes-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Fix unregister_fprobe() to wait for RCU grace period
test_kprobes: clear kprobes between test runs
kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()
Willy Tarreau [Sat, 9 May 2026 09:47:55 +0000 (11:47 +0200)]
Documentation: security-bugs: clarify requirements for AI-assisted reports
AI tools are increasingly used to assist in bug discovery. While these
tools can identify valid issues, reports that are submitted without
manual verification often lack context, contain speculative impact
assessments, or include unnecessary formatting. Such reports increase
triage effort, waste maintainers' time and may be ignored.
Reports where the reporter has verified the issue and the proposed fix
typically meet quality standards. This documentation outlines specific
requirements for length, formatting, and impact evaluation to reduce
the effort needed to deal with these reports.
Willy Tarreau [Sat, 9 May 2026 09:47:54 +0000 (11:47 +0200)]
Documentation: security-bugs: explain what is and is not a security bug
The use of automated tools to find bugs in random locations of the kernel
induces a raise of security reports even if most of them should just be
reported as regular bugs. This patch is an attempt at drawing a line
between what qualifies as a security bug and what does not, hoping to
improve the situation and ease decision on the reporter's side.
It defers the enumeration to a new file, threat-model.rst, that tries
to enumerate various classes of issues that are and are not security
bugs. This should permit to more easily update this file for various
subsystem-specific rules without having to revisit the security bug
reporting guide.
Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Leon Romanovsky <leon@kernel.org> Suggested-by: Leon Romanovsky <leon@kernel.org> Suggested-by: Greg KH <gregkh@linuxfoundation.org> Reviewed-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260509094755.2838-3-w@1wt.eu>
Willy Tarreau [Sat, 9 May 2026 09:47:53 +0000 (11:47 +0200)]
Documentation: security-bugs: do not systematically Cc the security team
With the increase of automated reports, the security team is dealing
with way more messages than really needed. The reporting process works
well with most teams so there is no need to systematically involve the
security team in reports.
Let's suggest to keep it for small lists of recipients and new reporters
only. This should continue to cover the risk of lost messages while
reducing the volume from prolific reporters.
Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Leon Romanovsky <leon@kernel.org> Reviewed-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <20260509094755.2838-2-w@1wt.eu>
ACPI: PAD: xen: Check ACPI_COMPANION() against NULL
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
Xen variant of the ACPI processor aggregator device (PAD) driver.
accel/qaic: Add overflow check to remap_pfn_range during mmap
The call to remap_pfn_range in qaic_gem_object_mmap is susceptible to
(re)mapping beyond the VMA if the BO is too large. This can cause use
after free issues when munmap() unmaps only the VMA region and not the
additional mappings. To prevent this, check the remaining size of the
VMA before remapping and truncate the remapped length if sg->length is
too large.
Reported-by: Lukas Maar <lukas.maar@tugraz.at> Fixes: ff13be830333 ("accel/qaic: Add datapath") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
[jhugo: fix braces from checkpatch --strict] Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260430193858.1178641-1-zachary.mckevitt@oss.qualcomm.com
Tomasz Pakuła [Sun, 10 May 2026 12:23:52 +0000 (14:23 +0200)]
HID: pidff: Fix integer overflow in pidff_rescale
Rescaling values close to the max (U16_MAX) temporarily creates values
that exceed the s32 range. This caused value overflow in case when, for
example, a periodic effect phase was higer than 180 degrees. In turn,
rescale function could return values outised of the logical range of the
HID field.
Fix by using 64 bit signed integer to store the value during calculation
but still return only 32 bit integer.
Closes: https://github.com/JacKeTUs/universal-pidff/issues/116 Fixes: 224ee88fe395 ("Input: add force feedback driver for PID devices") Cc: stable@vger.kernel.org Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Xu Rao [Sat, 9 May 2026 08:21:32 +0000 (16:21 +0800)]
HID: i2c-hid: add reset quirk for BLTP7853 touchpad
The BLTP7853 I2C HID touchpad may fail to probe after reboot or
reprobe because reset completion is not signalled to the host. The
driver then waits for the reset-complete interrupt until it times out
and the device probe fails:
i2c_hid i2c-BLTP7853:00: failed to reset device.
i2c_hid i2c-BLTP7853:00: can't add hid device: -61
i2c_hid: probe of i2c-BLTP7853:00 failed with error -61
Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for the device so i2c-hid does
not wait for a reset interrupt that may never arrive.
hid_input_report() is used in too many places to have a commit that
doesn't cross subsystem borders. Instead of changing the API, introduce
a new one when things matters in the transport layers:
- usbhid
- i2chid
This effectively revert to the old behavior for those two transport
layers.
commit 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing
bogus memset()") enforced the provided data to be at least the size of
the declared buffer in the report descriptor to prevent a buffer
overflow. However, we can try to be smarter by providing both the buffer
size and the data size, meaning that hid_report_raw_event() can make
better decision whether we should plaining reject the buffer (buffer
overflow attempt) or if we can safely memset it to 0 and pass it to the
rest of the stack.
Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Acked-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Myeonghun Pak [Fri, 24 Apr 2026 12:50:41 +0000 (21:50 +0900)]
HID: google: hammer: stop hardware on devres action failure
hammer_probe() starts the HID hardware before registering the devres
action that stops it. If devm_add_action() fails, probe returns an
error with the hardware still started because the cleanup action was
never registered and the driver's remove callback is not called after a
failed probe.
Use devm_add_action_or_reset() so the stop action runs immediately on
registration failure while preserving the existing devres-managed cleanup
path for later probe failures and remove.
Signed-off-by: Myeonghun Pak <mhun512@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Sangyun Kim [Mon, 20 Apr 2026 05:13:18 +0000 (14:13 +0900)]
HID: appletb-kbd: run inactivity autodim from workqueues
The autodim code in hid-appletb-kbd takes backlight_device->ops_lock
via backlight_device_set_brightness() -> mutex_lock() from two
different atomic contexts:
* appletb_inactivity_timer() is a struct timer_list callback, so it
runs in softirq context. Every expiry triggers
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
Call Trace:
<IRQ>
__might_resched
__mutex_lock
backlight_device_set_brightness
appletb_inactivity_timer
call_timer_fn
run_timer_softirq
* reset_inactivity_timer() is called from appletb_kbd_hid_event() and
appletb_kbd_inp_event(). On real USB hardware these run in
softirq/IRQ context (URB completion and input-event dispatch).
When the Touch Bar has already been dimmed or turned off, the
reset path calls backlight_device_set_brightness() directly to
restore brightness, producing the same warning.
Both call sites hit the same mutex_lock()-from-atomic bug. Fix them
together by moving the blocking work onto the system workqueue:
* Convert the inactivity timer from struct timer_list to
struct delayed_work; the callback (appletb_inactivity_work) now
runs in process context where mutex_lock() is legal.
* Add a dedicated struct work_struct restore_brightness_work and have
reset_inactivity_timer() schedule it instead of calling
backlight_device_set_brightness() directly.
Cancel both works synchronously during driver tear-down alongside the
existing backlight reference drop.
The semantics are unchanged (same delays, same state transitions on
dim, turn-off and user activity); only the execution context of the
sleeping call changes. The timer field and callback are renamed to
match their new type; reset_inactivity_timer() keeps its name because
it is invoked from input event paths that read naturally as "reset
the inactivity timer".
Fixes: 93a0fc489481 ("HID: hid-appletb-kbd: add support for automatic brightness control while using the touchbar") Cc: stable@vger.kernel.org Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Sangyun Kim [Mon, 20 Apr 2026 05:13:17 +0000 (14:13 +0900)]
HID: appletb-kbd: fix UAF in inactivity-timer cleanup path
Commit 38224c472a03 ("HID: appletb-kbd: fix slab use-after-free bug in
appletb_kbd_probe") added timer_delete_sync(&kbd->inactivity_timer) to
both the probe close_hw error path and appletb_kbd_remove(), but the
way it was wired in left the inactivity timer reachable during driver
tear-down via two distinct windows.
Window A -- put_device() before timer_delete_sync():
The inactivity_timer softirq reads kbd->backlight_dev and calls
backlight_device_set_brightness() -> mutex_lock(&ops_lock). If a
concurrent hid_appletb_bl unbind drops the last devm reference
between these two calls, the backlight_device is freed and the
mutex_lock() touches freed memory.
Window B -- backlight cleanup before hid_hw_stop():
if (kbd->backlight_dev) {
timer_delete_sync(...);
put_device(...);
}
hid_hw_close(hdev);
hid_hw_stop(hdev);
Even after Window A is closed, hid_hw_close()/hid_hw_stop() still run
afterwards, so a late ".event" callback from the HID core (USB URB
completion on real Apple hardware) can arrive after
timer_delete_sync() drained the softirq but before put_device() drops
the reference. That callback reaches reset_inactivity_timer(), which
calls mod_timer() and re-arms the timer. The freshly re-armed timer
can then fire on the about-to-be-freed backlight_device.
Both windows produce the same KASAN slab-use-after-free:
BUG: KASAN: slab-use-after-free in __mutex_lock+0x1aab/0x21c0
Read of size 8 at addr ffff88803ee9a108 by task swapper/0/0
Call Trace:
<IRQ>
__mutex_lock
backlight_device_set_brightness
appletb_inactivity_timer
call_timer_fn
run_timer_softirq
handle_softirqs
Allocated by task N:
devm_backlight_device_register
appletb_bl_probe
Freed by task M:
(concurrent hid_appletb_bl unbind path)
Close both windows at once by reworking the tear-down in
appletb_kbd_remove() and in the probe close_hw error path so that
1) hid_hw_close()/hid_hw_stop() run before the backlight cleanup,
guaranteeing no further .event callback can fire and re-arm the
timer, and
2) inside the "if (kbd->backlight_dev)" block, timer_delete_sync()
runs before put_device(), so the softirq is drained before the
final reference is dropped.
Fixes: 38224c472a03 ("HID: appletb-kbd: fix slab use-after-free bug in appletb_kbd_probe") Cc: stable@vger.kernel.org Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Signed-off-by: Jiri Kosina <jkosina@suse.com>
T.J. Mercier [Fri, 17 Apr 2026 15:47:02 +0000 (08:47 -0700)]
HID: playstation: Clamp num_touch_reports
A device would never lie about the number of touch reports would it?
If it does the loop in dualshock4_parse_report will read off the end of
the touch_reports array, up to about 2 KiB for the maximum number of 256
loop iteraions. The data that is read is emitted via evdev if the
DS4_TOUCH_POINT_INACTIVE bit happens to be set. Protect against this by
clamping the num_touch_reports value provided by the device to the
maximum size of the touch_reports array.
Lee Jones [Thu, 16 Apr 2026 13:16:54 +0000 (14:16 +0100)]
HID: magicmouse: Prevent out-of-bounds (OOB) read during DOUBLE_REPORT_ID
It is currently possible for a malicious or misconfigured USB device to
cause an out-of-bounds (OOB) read when submitting reports using
DOUBLE_REPORT_ID by specifying a large report length and providing a
smaller one.
Let's prevent that by comparing the specified report length with the
actual size of the data read in from userspace. If the actual data
length ends up being smaller than specified, we'll politely warn the
user and prevent any further processing.
Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Günther Noack <gnoack@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
HID: mcp2221: fix OOB write in mcp2221_raw_event()
mcp2221_raw_event() copies device-supplied data into mcp->rxbuf at
offset rxbuf_idx without checking that the copy fits within the
destination buffer. A device responding with up to 60 bytes to a
small I2C/SMBus read can overflow the buffer.
Add a rxbuf_size field to struct mcp2221, set it alongside rxbuf in
mcp_i2c_smbus_read(), and check rxbuf_idx + data[3] <= rxbuf_size
before the memcpy.
Thomas Huth [Tue, 21 Apr 2026 14:27:01 +0000 (16:27 +0200)]
xen/arm: Replace __ASSEMBLY__ with __ASSEMBLER__ in interface.h
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize now
on the __ASSEMBLER__ macro that is provided by the compilers.
Sungwoo Kim [Tue, 12 May 2026 05:09:29 +0000 (01:09 -0400)]
block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user()
pin_user_pages_fast() can partially succeed and return the number of
pages that were actually pinned. However, the bio_integrity_map_user()
does not handle this partial pinning. This leads to a general protection
fault since bvec_from_pages() dereferences an unpinned page address,
which is 0.
To fix this, add a check to verify that all requested memory is pinned.
If partial pinning occurs, unpin the memory and return -EFAULT.
Kernel Oops:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 UID: 0 PID: 1061 Comm: nvme-passthroug Not tainted 7.0.0-11783-g90957f9314e8-dirty #16 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
RIP: 0010:bio_integrity_map_user.cold+0x1b0/0x9d6
Fixes: 492c5d455969 ("block: bio-integrity: directly map user buffers") Acked-by: Chao Shi <cshi008@fiu.edu> Acked-by: Weidong Zhu <weizhu@fiu.edu> Acked-by: Dave Tian <daveti@purdue.edu> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://github.com/linux-blktests/blktests/pull/244 Link: https://patch.msgid.link/20260512050929.541397-2-iam@sung-woo.kim Signed-off-by: Jens Axboe <axboe@kernel.dk>
Lukas Bulwahn [Thu, 5 Feb 2026 08:11:31 +0000 (09:11 +0100)]
HID: quirks: really enable the intended work around for appledisplay
Commit c7fabe4ad921 ("HID: quirks: work around VID/PID conflict for
appledisplay") intends to add a quirk for kernels built with Apple Cinema
Display support, but it refers to the non-existing config option
CONFIG_APPLEDISPLAY, whereas the config option for Apple Cinema Display
support is named CONFIG_USB_APPLEDISPLAY.
Refer to the intended config option CONFIG_USB_APPLEDISPLAY in the ifdef
directive.
Fixes: c7fabe4ad921 ("HID: quirks: work around VID/PID conflict for appledisplay") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Casey Chen [Mon, 11 May 2026 21:22:30 +0000 (15:22 -0600)]
block: recompute nr_integrity_segments in blk_insert_cloned_request
blk_insert_cloned_request() already recomputes nr_phys_segments
against the bottom queue, because "the queue settings related to
segment counting may differ from the original queue." The exact same
reasoning applies to integrity segments: a stacked driver's underlying
queue can have tighter virt_boundary_mask, seg_boundary_mask, or
max_segment_size than the top queue, in which case
blk_rq_count_integrity_sg() against the bottom queue produces a
different count than the cached rq->nr_integrity_segments inherited
from the source request by blk_rq_prep_clone().
When the cached count is lower than the bottom queue's actual count,
blk_rq_map_integrity_sg() trips
BUG_ON(segments > rq->nr_integrity_segments);
on dispatch. The same families of stacked setups that motivated the
existing nr_phys_segments recompute -- dm-multipath fanning out to
nvme-rdma in particular -- can produce this.
Mirror the nr_phys_segments handling: when the request carries
integrity, recompute nr_integrity_segments against the bottom queue
and reject the request if it exceeds the bottom queue's
max_integrity_segments. blk_rq_count_integrity_sg() and
queue_max_integrity_segments() are both already available via
<linux/blk-integrity.h>, which blk-mq.c includes.
This closes a latent gap in the stacking contract and brings the
integrity-segment accounting in line with the existing
phys-segment accounting.
David Carlier [Mon, 11 May 2026 21:51:51 +0000 (22:51 +0100)]
block: don't overwrite bip_vcnt in bio_integrity_copy_user()
bio_integrity_add_page() already sets bip_vcnt to 1 for the bounce
segment. Overwriting it with nr_vecs breaks bip_vcnt <= bip_max_vcnt
on WRITE (bip_max_vcnt is 1), so the gap-merge checks in block/blk.h
read past the bip_vec[] flex array. On READ the read is in bounds
but lands on a saved user bvec instead of the bounce.
The line was added for split propagation, but bio_integrity_clone()
doesn't copy bip_vcnt and BIP_CLONE_FLAGS excludes BIP_COPY_USER.
Oliver Neukum [Tue, 3 Mar 2026 09:48:54 +0000 (10:48 +0100)]
HID: hid-sjoy: race between init and usage
The driver uses an initial IO to set the device to a default
state. That initialization is currently being done after the device
node has been created. That means that the single buffer used
for output can be altered while IO is in progress.
Move the intialization before announcement to user space.
Fixes: fac733f029251 ("HID: force feedback support for SmartJoy PLUS PS2/USB adapter") Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
Steve French [Tue, 12 May 2026 02:55:23 +0000 (21:55 -0500)]
SMB3.1.1: add missing QUERY_DIR info levels
New Infolevels for QUERY_DIR (and QUERY_INFO) levels 78 through 81 are
now being used by Windows clients and were added to the documentation.
Add defines for them (and correct some typos in documentation). See
MS-SMB2 2.2.33 and MS-FSCC 2.4
Signed-off-by: Steve French <stfrench@microsoft.com>
Paolo Abeni [Tue, 12 May 2026 14:15:03 +0000 (16:15 +0200)]
Merge branch 'net-shaper-fix-various-minor-bugs'
Jakub Kicinski says:
====================
net: shaper: fix various minor bugs
Fix various minor bugs in the net shaper API.
First 2 patches deal with ordering issues around inserting
and publishing new shapers. Shapers are inserted "tentatively"
and marked valid only after HW op succeeded, this used to
be slightly racy.
Only other patch of note is patch 8. We want to add a Netlink
policy check on the handle ID. This necessitates patch 7.
Jakub Kicinski [Sun, 10 May 2026 19:29:04 +0000 (12:29 -0700)]
net: shaper: reject QUEUE scope handle with missing id
net_shaper_parse_handle() does not enforce that the user provides
the handle ID. For NODE the ID defaults to UNSPEC for all other
cases it defaults to 0.
For NETDEV 0 is the only option. For QUEUE defaulting to 0 makes
less intuitive sense. Specifically because the behavior should
(IMHO) be the same for all cases where there may be more than
one ID (QUEUE and NODE).
We should either document this as intentional or reject.
I picked the latter with no strong conviction.