Linus Torvalds [Sun, 24 May 2026 17:48:55 +0000 (10:48 -0700)]
Merge tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects fix from Ingo Molnar:
- Fix debugobjects regression on -rt kernels: don't fill the pool
(which uses a coarse lock) if ->pi_blocked_on, because that messes up
the priority inheritance of callers
* tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Do not fill_pool() if pi_blocked_on
Linus Torvalds [Sun, 24 May 2026 17:37:55 +0000 (10:37 -0700)]
Merge tag 'hwmon-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- adm1266: Various fixes from Abdurrahman Hussain
The fixed issues were reported by Sashiko as part of a code review of
a functional change in the driver.
- lenovo-ec-sensors: Convert to devm_request_region() to fix
release_region cleanup, and fix EC "MCHP" signature validation logic,
from Kean Ren
* tag 'hwmon-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus/adm1266) serialize sequencer_state debugfs read with pmbus_lock
hwmon: (pmbus/adm1266) serialize NVMEM blackbox read with pmbus_lock
hwmon: (pmbus/adm1266) serialize GPIO PMBus accesses with pmbus_lock
hwmon: (pmbus/adm1266) register the nvmem device after pmbus_do_probe()
hwmon: (pmbus/adm1266) register the gpio_chip after pmbus_do_probe()
hwmon: (pmbus/adm1266) reject short block-read responses in the GPIO accessors
hwmon: (pmbus/adm1266) don't clobber GPIO bits before PDIO read in get_multiple
hwmon: (pmbus/adm1266) cap PDIO scan in get_multiple at ADM1266_PDIO_NR
hwmon: (pmbus/adm1266) bounce blackbox records through a protocol-sized buffer
hwmon: (pmbus/adm1266) include adapter number in GPIO line label
hwmon: (pmbus/adm1266) include PEC byte in pmbus_block_xfer read buffer
hwmon: (pmbus/adm1266) reject implausible blackbox record_count
hwmon: (pmbus/adm1266) widen blackbox-info buffer to I2C_SMBUS_BLOCK_MAX
hwmon: (pmbus/adm1266) seed timestamp from the real-time clock
hwmon: (lenovo-ec-sensors): Fix EC "MCHP" signature validation logic
hwmon: (lenovo-ec-sensors): Convert to devm_request_region()
drm/msm: Restore second parameter name in purge() and evict()
After commit 3392291fc509 ("drm/msm: Fix shrinker deadlock"), all
supported versions of clang warn (or error with CONFIG_WERROR=y):
drivers/gpu/drm/msm/msm_gem_shrinker.c:105:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]
105 | purge(struct drm_gem_object *obj, struct ww_acquire_ctx *)
| ^
drivers/gpu/drm/msm/msm_gem_shrinker.c:117:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]
117 | evict(struct drm_gem_object *obj, struct ww_acquire_ctx *)
| ^
2 errors generated.
With older but supported versions of GCC, this is an unconditional hard error:
drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'purge':
drivers/gpu/drm/msm/msm_gem_shrinker.c:105:35: error: parameter name omitted
purge(struct drm_gem_object *obj, struct ww_acquire_ctx *)
^~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'evict':
drivers/gpu/drm/msm/msm_gem_shrinker.c:117:35: error: parameter name omitted
evict(struct drm_gem_object *obj, struct ww_acquire_ctx *)
^~~~~~~~~~~~~~~~~~~~~~~
Restore the parameter name to clear up the warnings, renaming it
"unused" to make it clear it is only needed to satisfy the prototype of
drm_gem_lru_scan().
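To show why naming the parameter fixes the build, here is a minimal standalone C sketch. The types `gem_object` and `acquire_ctx`, the `lru_scan_fn` typedef and `scan()` are hypothetical stand-ins for the DRM structures and for the callback that drm_gem_lru_scan() expects. Only the pattern (a named but unused parameter, which is valid before C23) mirrors the commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real DRM types; only their shape matters. */
struct gem_object { bool purgeable; };
struct acquire_ctx { int state; };

/* Callback prototype every scan function must match (models the
 * drm_gem_lru_scan() callback; the real prototype may differ). */
typedef bool (*lru_scan_fn)(struct gem_object *obj, struct acquire_ctx *ticket);

/* Naming the second parameter keeps this a valid C11/C17 definition.
 * Leaving it as "struct acquire_ctx *" with no name is a C23 feature. */
static bool purge(struct gem_object *obj, struct acquire_ctx *unused)
{
	(void)unused; /* only present to satisfy the callback prototype */
	return obj->purgeable;
}

/* Hypothetical scanner: counts objects for which the callback succeeds. */
static size_t scan(struct gem_object *objs, size_t n, lru_scan_fn fn)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (fn(&objs[i], NULL))
			hits++;
	return hits;
}
```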
Linus Torvalds [Sun, 24 May 2026 16:53:17 +0000 (09:53 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix bpf_throw() and global subprog combination (Kumar Kartikeya
Dwivedi)
- Fix out of bounds access in BPF interpreter (Yazhou Tang)
- Fix potential out of bounds access in inner per-cpu array map
(Guannan Wang)
- Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
libbpf: fix off-by-one in emit_signature_match jump offset
bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature
selftests/bpf: Cover global subprog exception leaks
bpf: Check global subprog exception paths
bpf: make bpf_session_is_return() reference optional
bpf: Use array_map_meta_equal for percpu array inner map replacement
selftests/bpf: Add test for large offset bpf-to-bpf call
bpf: Fix s16 truncation for large bpf-to-bpf call offsets
bpf: Fix out-of-bounds read in bpf_patch_call_args()
The driver supports a new comp_mask: REQ_MASK_FIXED_QUE_ATTR.
The application sets this comp_mask bit in the CREATE_QP ureq
to indicate direct control of the QP. The driver goes through
the required processing for app allocated QPs (previous patches).
Only variable WQE mode is supported for these QPs.
This patch removes an unused comp_mask:
BNXT_RE_QP_REQ_MASK_VAR_WQE_SQ_SLOTS
RDMA/bnxt_re: Support doorbells for app allocated QPs
App allocated QPs can use a separate doorbell for each QP.
This doorbell region can be passed through a new driver specific
DBR_HANDLE attribute, during QP creation. When this attribute
is set, associate the QP with the given doorbell region.
While the QP holds a reference to the dbr, the dbr itself
cannot be destroyed and is rejected with EBUSY error.
RDMA/bnxt_re: Enhance dpi lifecycle logic in doorbell uapis
If the DPI is freed when the dbr object is freed but the process
has not yet unmapped the page, the DPI slot could get reallocated
to another process while the original process still has it mapped.
To prevent this, save the DPI info in the mmap entry during dbr
allocation and free the DPI slot from bnxt_re_mmap_free(), which
ensures that there are no remaining references to it.
This change is needed to support doorbell allocation to QPs
in the next patch.
RDMA/bnxt_re: Enhance dbr usecnt logic in doorbell uapis
The current logic in the doorbell cleanup function is not
sufficient for a change in a subsequent patch that fails the
doorbell remove operation under some conditions. The cleanup
must still free the dbr object when the caller will not retry
the teardown operation (implicit teardown: process exit or
driver removal).
Extend the usecnt counter to use the kref mechanism so that the
dbr object is freed (via the kref release callback) when there
are no more references to it, rather than being freed directly
in the cleanup uapi.
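The kref pattern described above can be modelled in userspace roughly as follows. This is a sketch, not the kernel's `struct kref` API: `kref_model_*`, `dbr_obj` and `dbr_release()` are hypothetical names, and the counter is a plain C11 atomic rather than the kernel's refcount_t.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model of a kref: a counter plus a release callback that runs
 * exactly once, when the last reference is dropped. */
struct kref_model { atomic_int refcount; };

static void kref_model_init(struct kref_model *k) { atomic_init(&k->refcount, 1); }
static void kref_model_get(struct kref_model *k) { atomic_fetch_add(&k->refcount, 1); }

/* Returns true if this put dropped the last reference and ran release(). */
static bool kref_model_put(struct kref_model *k, void (*release)(struct kref_model *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return true;
	}
	return false;
}

/* Hypothetical doorbell-region object embedding the reference count. */
struct dbr_obj {
	int dpi;                /* placeholder payload */
	struct kref_model ref;
};

static int dbr_released; /* counts release callbacks, for illustration */

static void dbr_release(struct kref_model *k)
{
	/* container_of()-style recovery of the enclosing object */
	struct dbr_obj *dbr = (struct dbr_obj *)((char *)k - offsetof(struct dbr_obj, ref));

	dbr_released++;
	free(dbr);
}
```

In this model the cleanup uapi drops the creator's reference and each QP drops its own; whichever put comes last frees the object, so an implicit teardown never has to retry.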
RDMA/bnxt_re: Update msn table size for app allocated QPs
For app allocated QPs, the driver shouldn't use slots/round-up logic
to compute the msn table size. The application handles this logic
and computes 'sq_npsn' and passes it to the driver using a new uapi
parameter.
The umem changes for CQ added a helper - bnxt_re_setup_sginfo().
Use the same helper for QP creation since we support only 4K
pages for QP ring memory too.
Add a new helper function bnxt_re_get_psn_bytes() to improve
readability as this code will be updated in subsequent patches.
For the above scenarios, if call_rcu_tasks() is not called again
afterward, rcu_tasks_kthread never gets a chance to be woken up,
test_rcu_tasks_callback() is never invoked, and the boot-time tests
can fail. This commit therefore checks the havekthread variable: if
it is false and rtpcp->cblist is empty, needwake is set to true, and
if rtp->kthread_ptr exists, rtpcp->rtp_irq_work can be queued to
wake up rcu_tasks_kthread.
Currently, rcu_normal_wake_from_gp is only enabled by default
on small systems (<= 16 CPUs) or when a user explicitly enables
it.
Introduce an adaptive latching mechanism:
* Track the number of in-flight synchronize_rcu() requests
using a new rcu_sr_normal_count counter;
* If the count reaches or exceeds RCU_SR_NORMAL_LATCH_THR (64),
set rcu_sr_normal_latched, which diverts new requests back
onto the scaled wait_rcu_gp() path;
* Clear the latch only when the pending requests are fully
drained (nr == 0);
* Enable rcu_normal_wake_from_gp by default on all systems,
relying on this dynamic throttling instead of static CPU
limits.
Testing (synthetic flood workload):
* Kernel version: 6.19.0-rc6
* Number of CPUs: 1536
* 60K concurrent synchronize_rcu() calls
Perf (cycles, system-wide):
total cycles: 932020263832
rcu_sr_normal_add_req(): 2650282811 cycles (~0.28%)
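A simplified model of the latch described in this commit might look like the following. Only the threshold (64) and the set/clear rules come from the text. The function names, the use of C11 atomics, and the choice to count only requests that take the wake-from-GP path are assumptions; the check-then-increment here is also not strictly race-free.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LATCH_THR 64 /* mirrors RCU_SR_NORMAL_LATCH_THR from the text */

static atomic_long sr_normal_count;    /* in-flight requests on the normal path */
static atomic_bool sr_normal_latched;  /* set: divert new requests */

/* Called when a request starts. Returns true if it may use the
 * wake-from-GP path, false if it should fall back to wait_rcu_gp(). */
static bool sr_normal_start(void)
{
	if (atomic_load(&sr_normal_latched))
		return false;

	long nr = atomic_fetch_add(&sr_normal_count, 1) + 1;

	if (nr >= LATCH_THR)
		atomic_store(&sr_normal_latched, true);
	return true;
}

/* Called when a request on the wake-from-GP path completes. */
static void sr_normal_done(void)
{
	long nr = atomic_fetch_sub(&sr_normal_count, 1) - 1;

	if (nr == 0) /* fully drained: lift the latch */
		atomic_store(&sr_normal_latched, false);
}
```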
rcu: Document rcu_access_pointer() feeding into cmpxchg()
This commit documents the rcu_access_pointer() use case for fetching the
old value of an RCU-protected pointer within a lockless updater for use
by an atomic cmpxchg() operation.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
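For illustration, the documented use case has a close userspace analogue in C11 atomics: the updater fetches the old pointer only to compare it, never to dereference it, which is the role rcu_access_pointer() plays in the kernel. The names below are hypothetical, and a relaxed atomic load stands in for rcu_access_pointer(); this is not the kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct config { int value; };

/* Shared, RCU-style protected pointer (hypothetical). */
static _Atomic(struct config *) global_cfg;

/* Lockless updater: fetch the current pointer without dereferencing it,
 * then install newp only if nobody changed the pointer in the meantime. */
static bool try_publish(struct config *newp, struct config **oldp)
{
	/* Stand-in for rcu_access_pointer(): value is only compared. */
	struct config *old = atomic_load_explicit(&global_cfg, memory_order_relaxed);

	/* Release ordering publishes newp's initialized contents to readers. */
	if (atomic_compare_exchange_strong_explicit(&global_cfg, &old, newp,
						    memory_order_release,
						    memory_order_relaxed)) {
		*oldp = old; /* free only after a grace period has elapsed */
		return true;
	}
	return false;
}
```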
rcu: Simplify param_set_next_fqs_jiffies() by applying clamp_val()
This commit replaces a nested ?: sequence with clamp_val(). This does
not reduce the number of lines of code, but it does simplify the line
that it modifies.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
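The kind of simplification described here can be sketched in plain C. Assuming the bounds are 1 and HZ (an assumption for illustration), the nested ?: and a clamp helper give identical results. `clamp_ul()` is a hypothetical function standing in for the kernel's clamp_val() macro.

```c
/* Hypothetical function standing in for the kernel's clamp_val() macro. */
static unsigned long clamp_ul(unsigned long val, unsigned long lo, unsigned long hi)
{
	return val < lo ? lo : (val > hi ? hi : val);
}

/* Before: nested ?: written at the call site (bounds 1..hz assumed). */
static unsigned long next_fqs_nested(unsigned long j, unsigned long hz)
{
	return j > hz ? hz : (j ? j : 1);
}

/* After: the same bounds expressed with a single clamp call. */
static unsigned long next_fqs_clamped(unsigned long j, unsigned long hz)
{
	return clamp_ul(j, 1, hz);
}
```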
This commit replaces a nested ?: sequence with clamp(). This does not
reduce the number of lines of code, but it does simplify the line that
it modifies.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
checkpatch: Undeprecate rcu_read_lock_trace() and rcu_read_unlock_trace()
It turns out that there are BPF use cases that rely on nesting RCU
Tasks Trace readers. These use cases are well-served by the old
rcu_read_lock_trace() and rcu_read_unlock_trace() functions that maintain
a nesting counter in the task_struct structure. But these use cases incur
a performance penalty when using the shiny new rcu_read_lock_tasks_trace()
and rcu_read_unlock_tasks_trace() functions, which nest in the same way
that SRCU does.
This means that rcu_read_lock_trace() and rcu_read_unlock_trace()
will be with us for some time. Therefore, remove the checkpatch.pl
deprecation.
Also, the rcu_read_lock_tasks_trace() and rcu_read_unlock_tasks_trace()
functions are intended for use only by BPF. Therefore, add them to
the list of functions that checkpatch complains about outside of BPF
(and of course, RCU).
Reported-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
torture: Add torture_sched_set_normal() for user-specified nice values
This new torture_sched_set_normal() function clamps the nice value at
the MIN_NICE..MAX_NICE limits, splatting if these limits are exceeded.
It then invokes sched_set_normal() to set the new value. This prevents
more difficult-to-debug failures within the scheduler.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
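A userspace sketch of the clamp-and-warn behaviour follows. MIN_NICE and MAX_NICE use the kernel's values (-20 and 19); `clamp_nice()` is a hypothetical helper, the warning is an fprintf() rather than a kernel splat, and the subsequent sched_set_normal() call is omitted.

```c
#include <stdio.h>

#define MIN_NICE (-20)
#define MAX_NICE 19

/* Clamp a requested nice value into MIN_NICE..MAX_NICE. *warned is set
 * when the input was out of range (the kernel version splats instead). */
static int clamp_nice(int val, int *warned)
{
	int clamped = val < MIN_NICE ? MIN_NICE : (val > MAX_NICE ? MAX_NICE : val);

	*warned = (clamped != val);
	if (*warned)
		fprintf(stderr, "nice %d outside [%d, %d], using %d\n",
			val, MIN_NICE, MAX_NICE, clamped);
	return clamped;
}
```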
Currently, rcutorture bypasses lazy RCU by using call_rcu_hurry().
This works, avoiding the dreaded rtort_pipe_count WARN(), but fails to
fully test lazy RCU. The rtort_pipe_count WARN() splats because lazy RCU
could delay the start of an RCU grace period for a full stutter period,
which defaults to only three seconds.
This commit therefore reverts the call_rcu_hurry() instances
back to call_rcu(), but, in kernels built with CONFIG_RCU_LAZY=y,
queues a workqueue handler just before the call to stutter_wait() in
rcu_torture_writer(). This workqueue handler invokes rcu_barrier(),
which motivates any lingering lazy callbacks, thus avoiding the splat.
Reported-by: Saravana Kannan <saravanak@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Nicholas Piggin [Thu, 21 May 2026 17:06:33 +0000 (10:06 -0700)]
dt-bindings: iommu: riscv: Add bindings for Tenstorrent RISC-V IOMMU
Extend the binding to cover details specific to the Tenstorrent RISC-V
IOMMU. In particular, a second register range is added which contains
M-privileged registers, e.g., PMAs and PMPs.
The RISC-V spec S-privileged registers remain in the first register
range and are compatible with "riscv,iommu" so the Linux driver does not
notice any difference, but the binding will be used by OpenSBI and
potentially other M-mode software.
Reviewed-by: Joel Stanley <joel@jms.id.au>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[fustini: fix dt_binding_check errors]
Signed-off-by: Drew Fustini <fustini@kernel.org>
Linus Torvalds [Sat, 23 May 2026 23:59:02 +0000 (16:59 -0700)]
Merge tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix for creating tmpfiles
- fix durable reconnect error path
- validate SID in security descriptor when inheriting DACL
* tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
smb/server: promote S_DEL_ON_CLS to S_DEL_PENDING when close
ksmbd: validate SID in parent security descriptor during ACL inheritance
ksmbd: fix durable reconnect error path file lifetime
Linus Torvalds [Sat, 23 May 2026 23:54:48 +0000 (16:54 -0700)]
Merge tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A batch of fixes to simple quotas:
- add conditional rescheduling point not dependent on the lock during
inode iterations to avoid delays with PREEMPT_NONE enabled
- fix subvolume deletion so it does not break the squota invariants
- properly handle enabling squota, tracking extents in the initial
transaction
- catch and warn about underflows, clamp to zero to avoid further
problems
And one fix to inode size handling:
- fix handling of preallocated extents beyond i_size when not using
the no-holes feature"
* tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: swallow btrfs_record_squota_delta() ENOENT
btrfs: clamp to avoid squota underflow
btrfs: fix squota accounting during enable generation
btrfs: check for subvolume before deleting squota qgroup
btrfs: always drop root->inodes lock before cond_resched()
btrfs: mark file extent range dirty after converting prealloc extents
Michael Neuling [Fri, 10 Apr 2026 02:49:59 +0000 (21:49 -0500)]
riscv: dts: tenstorrent: Add PMU node to blackhole for Linux perf support
Add a riscv,pmu device tree node with SBI PMU event mappings for the
SiFive X280 hardware performance counters. This enables OpenSBI to
expose the SBI PMU extension, allowing Linux perf to use the 4
programmable counters (mhpmcounter3-6) across 3 event classes:
instruction commit, microarchitectural, and memory system events.
Event encodings are derived from the SiFive Tenstorrent X280 MC Manual
(21G3.04.00) Table 13, section 3.10.5.
Linus Torvalds [Sat, 23 May 2026 23:51:22 +0000 (16:51 -0700)]
Merge tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Carlos Maiolino:
"A single fix for a race in xfs buffer cache which may lead to
filesystem shutdown due to inconsistent metadata if the buffer
lookup happens to find an old dead buffer still in the cache"
* tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix a buffer lookup against removal race
Li Guan [Wed, 13 May 2026 18:07:21 +0000 (02:07 +0800)]
perf riscv: Fix discarded const qualifier in _get_field()
The assignment of strrchr() return values to non-const char * variables
triggers a -Werror=discarded-qualifiers warning when building with GCC
14.
This happens because in newer glibc versions, strrchr() returns a 'const
char *' if the input string is const.
Properly declare 'line2' and 'nl' as const char * to match the glibc
function signature and ensure type safety. This avoids the need for
explicit type casting and aligns with the design pattern of not
modifying read-only memory in the perf tool.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Li Guan <guanli.oerv@isrc.iscas.ac.cn>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
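A minimal sketch of the pattern the fix adopts: keep the result of strrchr() on a read-only string in a `const char *`. The helper `field_after_colon()` is hypothetical and is not perf's `_get_field()`.

```c
#include <stddef.h>
#include <string.h>

/* Return the text after the last ':' in a read-only line, or NULL.
 * Declaring the result as const char * matches what a C23-aware glibc
 * returns for a const input, so no qualifier is silently discarded. */
static const char *field_after_colon(const char *line)
{
	const char *sep = strrchr(line, ':');

	return sep ? sep + 1 : NULL;
}
```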
Ronaldo Nunez [Fri, 22 May 2026 19:13:48 +0000 (16:13 -0300)]
pwm: imx27: Fix variable truncation in .apply()
Fix a variable truncation when calculating period in microseconds as
part of the solution for the ERR051198 in .apply() callback.
Example scenario:
- Period of 3us (PWMPR = 196 and prescaler = 1)
- Expected value in tmp: 198000000000 (NSEC_PER_SEC * (196 + 2) * 1)
- Actual value is 431504384 (truncation to u32)
Signed-off-by: Ronaldo Nunez <rnunez@baylibre.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260522191348.6227-1-rnunez@baylibre.com
Fixes: a25351e4c774 ("pwm: imx27: Workaround of the pwm output bug when decrease the duty cycle")
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
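The truncation in the example scenario can be reproduced with a few lines of C. The two helpers are hypothetical and only model the arithmetic; the actual change in pwm-imx27 may differ in detail.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Buggy pattern: the 64-bit product is stored in a 32-bit variable,
 * so everything above bit 31 is lost. */
static uint32_t tmp_u32(uint32_t pwmpr, uint32_t prescaler)
{
	uint32_t tmp = NSEC_PER_SEC * (pwmpr + 2) * prescaler;

	return tmp;
}

/* Fixed pattern: keep the intermediate value in a 64-bit variable. */
static uint64_t tmp_u64(uint32_t pwmpr, uint32_t prescaler)
{
	uint64_t tmp = NSEC_PER_SEC * (pwmpr + 2) * prescaler;

	return tmp;
}
```

With PWMPR = 196 and prescaler = 1, the 64-bit version yields 198000000000, while the 32-bit version keeps only 198000000000 mod 2^32 = 431504384, matching the values quoted above.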
Bit 3 of the PWMCON register configures whether the clock source is
pre-divided. Most revisions clear this bit to disable frequency
division. However, mt7628 needs this bit set. Hence, we introduce
a new clksel_fixup flag to correctly configure the clock source for
mt7628.
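A sketch of how a per-SoC flag such as clksel_fixup could select the value of bit 3. Only the bit position and the flag name come from the text; `struct soc_data`, `PWMCON_CLKSEL_BIT` and `pwmcon_apply_clksel()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PWMCON_CLKSEL_BIT (1u << 3) /* bit 3 of PWMCON, per the text */

/* Hypothetical per-SoC data; only the flag named in the text is modelled. */
struct soc_data { bool clksel_fixup; };

/* Most revisions clear bit 3; SoCs flagged with clksel_fixup (mt7628) set it. */
static uint32_t pwmcon_apply_clksel(uint32_t pwmcon, const struct soc_data *soc)
{
	if (soc->clksel_fixup)
		return pwmcon | PWMCON_CLKSEL_BIT;
	return pwmcon & ~PWMCON_CLKSEL_BIT;
}
```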
Linus Torvalds [Sat, 23 May 2026 16:21:08 +0000 (09:21 -0700)]
Merge tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fixes from Dinh Nguyen:
- Implement _THIS_IP_ for inline asm
- Add Simon Schuster as a maintainer and mark the NIOS2 as Supported
* tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Implement _THIS_IP_ using inline asm
MAINTAINERS: arch/nios2: Add Simon Schuster as co-maintainer
Linus Torvalds [Sat, 23 May 2026 16:13:00 +0000 (09:13 -0700)]
Merge tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Rework KASLR to avoid initrd overlap, remove some unused code to avoid
a build warning, fix some bugs in kprobes and KVM"
* tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Move some variable declarations to paravirt.h
LoongArch: kprobes: Fix handling of fatal unrecoverable recursions
LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions
LoongArch: Remove unused code to avoid build warning
LoongArch: Avoid initrd overlap during kernel relocation
LoongArch: Skip relocation-time KASLR if already applied
efi/loongarch: Randomize kernel preferred address for KASLR
KP Singh [Fri, 22 May 2026 21:53:36 +0000 (23:53 +0200)]
libbpf: fix off-by-one in emit_signature_match jump offset
The offset for the cleanup-label jump is computed before the MOV R7
instruction is emitted, but the JMP lands after it. Account for the
extra insn in the offset calculation (-2 instead of -1). Drop the
redundant self-loop in the else branch; gen->error = -ERANGE already
marks the generation as failed.
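The arithmetic behind the -2 can be checked with a small model. A BPF jump lands at its own index plus one plus the offset, so an offset computed before MOV R7 is emitted must skip both the MOV and the JMP itself. The helper names are hypothetical, and offsets are counted in instructions here.

```c
/* Offset computed while the instruction count is still cnt: MOV R7 will
 * sit at index cnt and the JMP at cnt + 1, so "next insn" is cnt + 2. */
static int jump_off_before_mov(int target_idx, int cnt)
{
	return target_idx - cnt - 2;
}

/* The off-by-one variant: only correct if nothing preceded the JMP. */
static int jump_off_buggy(int target_idx, int cnt)
{
	return target_idx - cnt - 1;
}

/* Where a jump at jmp_idx with offset off actually lands. */
static int landing(int jmp_idx, int off)
{
	return jmp_idx + 1 + off;
}
```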
Linus Torvalds [Sat, 23 May 2026 14:49:05 +0000 (07:49 -0700)]
Merge tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Remove the software node on platform device release(); without this,
the software node remains registered after the device is gone and a
subsequent platform_device_register_full() reusing the same node
fails with -EBUSY
- In sysfs_update_group(), do not remove a pre-existing directory when
create_files() fails; the previous code would silently destroy a
sysfs group that the caller did not create
- Set fwnode->secondary to NULL in fwnode_init() to avoid dereferencing
uninitialized memory (e.g. in dev_to_swnode()) when the firmware node
is allocated on the stack or via a non-zeroing allocator
* tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
device property: set fwnode->secondary to NULL in fwnode_init()
sysfs: don't remove existing directory on update failure
driver core: platform: remove software node on release()
Linus Torvalds [Sat, 23 May 2026 14:17:27 +0000 (07:17 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- syzbot-triggered crash in rxe due to concurrent plug/unplug
- Possible non-zero'd memory exposed to userspace in bnxt_re
- Malicious 'magic packet' with SIW causes a buffer overflow
- Tighten the new uAPI validation code to not crash in debugging prints
and have the right module dependencies in drivers
- mana was missing the max_msg_sz report to userspace
- UAF in rtrs on an error path
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rtrs: Fix use-after-free in path file creation cleanup
RDMA/mana_ib: Report max_msg_sz in mana_ib_query_port
RDMA/core: Do not read wild stack memory in uverbs_get_handler_fn()
RDMA/core: Move the _ib_copy_validate_udata* functions to ib_core_uverbs
RDMA/siw: Reject MPA FPDU length underflow before signed receive math
RDMA/bnxt_re: zero shared page before exposing to userspace
selftests/rdma: explicitly skip tests when required modules are missing
RDMA/nldev: Add mutual exclusion in nldev_dellink()
Sascha Bischoff [Wed, 20 May 2026 09:19:49 +0000 (10:19 +0100)]
KVM: arm64: Fix arch timer interrupts for GICv3-on-GICv5 guests
When running on a GICv5 host, we push an arch-timer-specific interrupt
domain for the timer interrupts. This interrupt domain is used to mask
the host interrupt when a GICv5 guest is running. However, this
interrupt domain is still in place when running with a GICv3 guest on
GICv5 hardware. The result is that some interrupt state changes are
not correctly propagated to the host irqchip driver for legacy
guests.
Explicitly pass irqchip state changes through to the host irqchip
driver when running a GICv3-based guest on a GICv5 host. This bypasses
all masking, and thereby operates just as a native GICv3 guest would,
with the exception of having an additional irq domain in the
hierarchy.
Fixes: 9491c63b6cd7 ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Sascha Bischoff [Wed, 20 May 2026 09:19:48 +0000 (10:19 +0100)]
irqchip/gic-v5: Immediately exec priority drop following activate
With GICv5 an interrupt of equal or lower priority cannot be signalled
until there has been a priority drop. This is done via the GIC CDEOI
system instruction. Once this has been executed, the hardware is able
to signal the next interrupt if there is one.
As all interrupts are programmed to have the same priority, no new
interrupts can be signalled until the priority drop has happened. This
can cause issues when, for example, an interrupt remains active while
a long running process takes place, such as when injecting a physical
interrupt into a guest VM in software.
The GICv5 driver has so far done the priority drop as part of
irq_eoi(), i.e., at the same time as deactivating the interrupt. This
means that any long running process (or VM) could block incoming
interrupts, effectively causing a denial of service for all other
interrupts.
Rather than doing the EOI as part of irq_eoi() (which the name would
suggest would be a good place for it), move it to happen immediately
after acknowledging an interrupt in the main GICv5 interrupt
handler. The deactivation of interrupts (GIC CDDI) remains implemented
as part of irq_eoi(), which means that the same interrupt cannot be
signalled a second time until deactivated by software.
Sascha Bischoff [Wed, 20 May 2026 09:19:47 +0000 (10:19 +0100)]
Documentation: KVM: Clarify that PMU_V3_IRQ IntID requirements for GICv5
When running a GICv5-based guest, the PMU must use PPI 23. This,
however, must be communicated via the
KVM_ARM_VCPU_PMU_V3_CTRL->KVM_ARM_VCPU_PMU_V3_IRQ ioctl as a full
GICv5-style Interrupt ID. That is, 0x20000017. Optionally, the whole
ioctl can be skipped for GICv5.
This was previously not clearly documented, so update the documentation
accordingly.
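As a worked example, assuming the GICv5 encoding places the interrupt type in bits [31:29] with PPI type 0b001, PPI 23 maps to 0x20000017. The macro and function names below are hypothetical.

```c
#include <stdint.h>

/* Assumed GICv5 INTID layout: type in bits [31:29], PPI type = 0b001. */
#define GICV5_INTID_TYPE_SHIFT 29
#define GICV5_INTID_TYPE_PPI   1u

/* Build the full GICv5-style interrupt ID for a PPI number. */
static uint32_t gicv5_ppi_intid(uint32_t ppi)
{
	return (GICV5_INTID_TYPE_PPI << GICV5_INTID_TYPE_SHIFT) | ppi;
}
```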
Sascha Bischoff [Wed, 20 May 2026 09:19:45 +0000 (10:19 +0100)]
KVM: arm64: selftests: Improve error handling for GICv5 PPI selftest
Cases where the KVM_RUN ioctl returned an error were wrongly reported
as incorrect ucalls. Furthermore, potential failures when calling
KVM_IRQ_LINE were being hidden.
Improve the error handling to correctly propagate the error in both
cases.
Sascha Bischoff [Wed, 20 May 2026 09:19:42 +0000 (10:19 +0100)]
KVM: arm64: vgic-v5: Atomically assign bits to PPI DVI bitmap
For GICv5 guests we make use of the DVI mechanism for PPIs where
possible. When mapping a virtual irq to a physical one for a GICv5
guest, the corresponding bit in the DVI bitmap is set. When unmapping,
said bit is cleared again. The key user of this mechanism is the arch
timer.
The existing code used the non-atomic __assign_bit() rather than doing
the update atomically. This could technically result in losing state
if a second PPI's DVI bit were being manipulated concurrently. Each
individual bit within the DVI bitmap is guarded using
vgic_irq->irq_lock, but there's no locking for the overall
bitmap. Therefore, switch to using the atomic assign_bit() function
instead.
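The difference between the two helpers can be sketched with C11 atomics. These are hypothetical userspace analogues, not the kernel's __assign_bit() and assign_bit(): the first does a plain load/modify/store that can overwrite a concurrent update to another bit in the same word, the second uses a single atomic OR or AND.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Non-atomic read-modify-write: two CPUs updating different bits of the
 * same word this way can lose one of the updates. */
static void assign_bit_nonatomic(unsigned int nr, unsigned long *word, bool set)
{
	unsigned long v = *word;

	v = set ? (v | (1UL << nr)) : (v & ~(1UL << nr));
	*word = v;
}

/* Atomic variant: one atomic OR/AND, so other bits in the word survive
 * concurrent updates. */
static void assign_bit_atomic(unsigned int nr, _Atomic unsigned long *word, bool set)
{
	if (set)
		atomic_fetch_or(word, 1UL << nr);
	else
		atomic_fetch_and(word, ~(1UL << nr));
}
```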
Sascha Bischoff [Wed, 20 May 2026 09:19:41 +0000 (10:19 +0100)]
KVM: arm64: vgic-v5: Add missing trap handing for NV triage
As things stand, there is no support for Nested Virt with GICv5 guests
yet. However, this is coming and therefore we need to be able to
correctly triage the traps when running with NV.
Add the missing fgtreg lookups required for that to
triage_sysreg_trap(). These are specific to the FGT regs added as part
of GICv5:
* ICH_HFGRTR_EL2
* ICH_HFGWTR_EL2
* ICH_HFGITR_EL2
Randy Dunlap [Thu, 21 May 2026 19:14:57 +0000 (12:14 -0700)]
ARM: zte: clean up zx297520v3 doc. warnings
Fix multiple documentation build warnings.
Improve punctuation and formatting of the rendered output.
Documentation/arch/arm/zte/zx297520v3.rst:66: WARNING: Title underline too short.
3. Building for built-in U-Boot
--------------------------- [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:90: WARNING: Enumerated list ends without a blank line; unexpected unindent. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:116: WARNING: Inline literal start-string without end-string. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:137: ERROR: Unexpected indentation. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:138: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:164: WARNING: Inline literal start-string without end-string. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:164: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]
Documentation/arch/arm/zte/zx297520v3.rst:7: WARNING: Document or section may not begin with a transition. [docutils]
Fixes: 220ae5d36dba ("ARM: zte: Add zx297520v3 platform support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Stefan Dösinger <stefandoesinger@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Stefan Dösinger <stefandoesinger@gmail.com>
Adam Crosser [Fri, 24 Apr 2026 12:37:47 +0000 (19:37 +0700)]
gpib: fix double decrement of descriptor_busy in command_ioctl()
commit d1857f8296dc ("gpib: fix use-after-free in IO ioctl handlers")
introduced a descriptor_busy reference counter to pin struct
gpib_descriptor across IO ioctl operations. In command_ioctl(), the
error path inside the loop decrements descriptor_busy and breaks, but
execution then falls through to the unconditional decrement after the
loop, underflowing the counter to -1.
This re-enables the use-after-free that the original fix was meant to
prevent: a concurrent close_dev_ioctl() sees descriptor_busy == 0 on
an actively-used descriptor and frees it.
Remove the early decrement from the error path. The post-loop
decrement already handles all exit paths, matching the correct pattern
used in read_ioctl() and write_ioctl().
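A hypothetical model of the corrected control flow: the counter is incremented once before the loop and decremented once after it, and the error path only breaks out. `struct descriptor`, `command_loop()` and the step callbacks are illustrative names, not the gpib driver's code.

```c
#include <stdatomic.h>

/* Hypothetical model of the pinning counter described above. */
struct descriptor { atomic_int descriptor_busy; };

/* Stand-in for one chunk of IO; returns < 0 on error. */
typedef int (*io_step_fn)(int chunk);

static int command_loop(struct descriptor *desc, int nchunks, io_step_fn step)
{
	int ret = 0;

	atomic_fetch_add(&desc->descriptor_busy, 1); /* pin the descriptor */
	for (int i = 0; i < nchunks; i++) {
		ret = step(i);
		if (ret < 0)
			break; /* no decrement here: the exit path handles it */
	}
	atomic_fetch_sub(&desc->descriptor_busy, 1); /* single unpin for all exits */
	return ret;
}

static int fail_at_two(int chunk) { return chunk == 2 ? -5 : 0; } /* -EIO on Linux */
static int always_ok(int chunk) { (void)chunk; return 0; }
```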
gpib: agilent_82357a: don't check a NULL serial string
The agilent_82357a driver uses the USB device serial string for device
matching but does not verify that the string exists before passing it
to strcmp().
Verify that the device has a serial number before accessing it to avoid
triggering a NULL-pointer dereference with devices that don't provide
a serial number (iSerialNumber = 0).
Similar to commit aa79f996eb41 ("i2c: cp2615: fix serial string
NULL-deref at probe").
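A minimal sketch of the check, using a hypothetical `usb_dev_model` in place of the kernel's USB device structure. Treating a device without a serial string as a non-match is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a USB device: serial is NULL when the device
 * reports iSerialNumber = 0 (no serial string). */
struct usb_dev_model { const char *serial; };

/* Check for NULL first: passing NULL to strcmp() is undefined behaviour. */
static bool serial_matches(const struct usb_dev_model *dev, const char *wanted)
{
	if (!dev->serial)
		return false;
	return strcmp(dev->serial, wanted) == 0;
}
```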
The applicom driver supports PCI Profibus cards from Applicom, later
acquired by Molex. It has severe coding style issues and has attracted
a number of bug and security fixes over the years, despite the fact
that no one appears to be using it. It was broken from at least the
beginning of Git history (Linux 2.6.12-rc2 in April 2005) until October
2008, when a fatal bug was fixed in commit bc20589bf1c6 ("applicom.c:
fix apparently-broken code in do_ac_read()"). In the commit message,
the author commented that no one they knew was able to test the change.
Since then, there have been no commits that indicate the driver is
being used. Later PCI and PCI-Express Applicom Profibus cards only
officially support Windows [1], and even the PCI-Express cards have
been discontinued [2]. Given all these factors, remove the driver to
reduce future maintenance workload.
char: dtlk: remove driver for ISA speech synthesizer card
The dtlk driver supports the RC Systems DoubleTalk PC ISA speech
synthesizer card. It has severe coding style issues and has only
received tree-wide fixes and drive-by cleanups in the entire Git
history (since Linux 2.6.12-rc2). The same hardware is supported by
drivers/accessibility/speakup for screen reader use, but that
implementation does not share any code with this driver. Given all of
these factors, it is likely the driver is entirely unused. Remove it to
reduce future maintenance workload.
The deassign path freed the irqfd while a shutdown work item was
already queued by EPOLLHUP (or vice versa), so the work item could
resurrect a dangling pointer through container_of().
Switch to the lifetime model used by KVM irqfds:
- Deassign/deinit only deactivate the irqfd: remove it from vm->irqfds
under irqfds_lock and queue the cleanup work.
- hsm_irqfd_shutdown_work() becomes the sole owner that unhooks the
eventfd waitqueue entry, drops the eventfd reference and frees the
irqfd.
- A new HSM_IRQFD_FLAG_SHUTDOWN bit guarded by test_and_set_bit()
ensures the cleanup work is queued at most once, no matter how many
of {EPOLLHUP, deassign, deinit} fire concurrently. This is safe to
call from the waitqueue callback, which runs with wqh->lock held and
IRQs disabled and therefore cannot take irqfds_lock.
- acrn_irqfd_deassign() flushes vm->irqfd_wq before returning so the
eventfd is fully detached on return. acrn_irqfd_deinit() deactivates
every irqfd, flushes the workqueue and only then destroys it, so no
path can queue_work() onto a torn-down workqueue.
- acrn_irqfd_assign() now installs the eventfd waitqueue entry and
publishes the irqfd to vm->irqfds under irqfds_lock, so the irqfd is
never visible to deassign/deinit before its waitqueue entry is in
place, and any EPOLLHUP that fires in the assign window queues
cleanup work that blocks on irqfds_lock until publication is done.
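The "queue at most once" rule can be modelled with an atomic fetch-or standing in for test_and_set_bit(). The struct, the bit number chosen for HSM_IRQFD_FLAG_SHUTDOWN, and the counter replacing queue_work() are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number is an assumption; only the flag's name comes from the text. */
#define HSM_IRQFD_FLAG_SHUTDOWN 0

struct irqfd_model {
	atomic_ulong flags;
	int queued; /* times cleanup work was queued (stands in for queue_work()) */
};

/* Analogue of test_and_set_bit(): atomically set bit nr, return its old value. */
static bool test_and_set_bit_model(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Shared by the EPOLLHUP callback, deassign and deinit: only the caller
 * that flips the bit from 0 to 1 queues cleanup; later callers do nothing. */
static void irqfd_deactivate(struct irqfd_model *irqfd)
{
	if (!test_and_set_bit_model(HSM_IRQFD_FLAG_SHUTDOWN, &irqfd->flags))
		irqfd->queued++;
}
```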
misc: pch_phub: Introduce an enum for device indentification
Instead of using magic constants, give them names that make the code more
idiomatic. While touching the pci_device_id array, use named
initializers to assign .driver_data.
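The pattern might look like the following userspace sketch, which uses a cut-down stand-in for struct pci_device_id. The enum names, the vendor/device values, and the lookup helper are all hypothetical placeholders, not the identifiers or IDs the pch_phub patch actually uses.

```c
#include <assert.h>

/* Simplified stand-in for struct pci_device_id: only the members this
 * sketch needs are modeled. */
struct sketch_pci_device_id {
	unsigned int vendor;
	unsigned int device;
	unsigned long driver_data;
};

/* Hypothetical enum: the real pch_phub identifiers may differ. */
enum sketch_phub_type {
	SKETCH_PHUB_TYPE_A,
	SKETCH_PHUB_TYPE_B,
	SKETCH_PHUB_TYPE_C,
};

/* Placeholder vendor/device values. Named initializers make it obvious
 * which member receives the enum value. */
static const struct sketch_pci_device_id sketch_phub_ids[] = {
	{ .vendor = 0x1111, .device = 0x0001, .driver_data = SKETCH_PHUB_TYPE_A },
	{ .vendor = 0x2222, .device = 0x0002, .driver_data = SKETCH_PHUB_TYPE_B },
	{ .vendor = 0x2222, .device = 0x0003, .driver_data = SKETCH_PHUB_TYPE_C },
	{ 0 } /* terminating entry */
};

/* Return the board type of the matching entry, or -1 if nothing matches. */
static int sketch_phub_lookup(unsigned int vendor, unsigned int device)
{
	for (const struct sketch_pci_device_id *id = sketch_phub_ids; id->vendor; id++)
		if (id->vendor == vendor && id->device == device)
			return (int)id->driver_data;
	return -1;
}
```

Probe-time code can then compare against SKETCH_PHUB_TYPE_B rather than a bare 1.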
The two functions are unused since commit 34afa1d657d4
("misc/pch_phub.c: use generic power management") but the compiler
didn't warn about it because the same commit marked the functions as
__maybe_unused.
The global nvram_mutex in drivers/char/nvram.c is redundant and unused,
and this triggers compiler warnings on some configurations.
All platform-specific nvram operations already provide their own internal
synchronization, meaning the wrapper-level mutex does not provide any
additional safety.
Remove the nvram_mutex definition along with all remaining lock/unlock
users across PPC32, x86, and m68k code paths, and rely entirely on the
per-architecture nvram implementations for locking.
sonypi: Check ACPI_COMPANION() against NULL at probe time
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
sonypi driver.
Fixes: 7e488b0af021 ("sonypi: Convert ACPI driver to a platform one")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5087721.GXAFRqVoOG@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hpet: Check ACPI_COMPANION() against NULL at probe time
Every platform driver can be forced to match a device that doesn't match
its list of device IDs because of device_match_driver_override(), so
platform drivers that rely on the existence of a device's ACPI companion
object need to verify its presence.
Accordingly, add a requisite ACPI_COMPANION() check against NULL to the
hpet driver.
Fixes: 71f0a267346b ("hpet: Convert ACPI driver to a platform one")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4750803.LvFx2qVVIh@rafael.j.wysocki
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
misc: tifm: Use PCI_VDEVICE to initialize pci_device_id array
The PCI_VDEVICE macro assigns the first four members of pci_device_id
in a more idiomatic and compact way.
Also drop the trailing zeros in the list initializer, since the compiler
zero-fills the remaining members anyway. The driver uses none of .class,
.class_mask or .driver_data, so it's fine not to assign them explicitly.
There are no changes to the compiled data; confirmed using an x86 and an
arm64 build.
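The "no changes to the compiled data" claim can be checked in a userspace sketch. The sketch uses a local model of struct pci_device_id's leading members (same order as the kernel's) and a copy of PCI_VDEVICE() modeled on include/linux/pci.h. The device ID 0xabcd is a placeholder, not a real tifm ID.

```c
#include <assert.h>

/* Local model of struct pci_device_id's leading members, in kernel order. */
struct sketch_pci_device_id {
	unsigned int vendor, device;
	unsigned int subvendor, subdevice;
	unsigned int class, class_mask;
	unsigned long driver_data;
};

#define PCI_ANY_ID (~0U)
#define PCI_VENDOR_ID_TI 0x104c

/* Modeled on the kernel's PCI_VDEVICE(): the trailing "0, 0" are positional
 * and fill .class and .class_mask. */
#define PCI_VDEVICE(vend, dev) \
	.vendor = PCI_VENDOR_ID_##vend, .device = (dev), \
	.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID, 0, 0

/* Old style: every member spelled out positionally. */
static const struct sketch_pci_device_id sketch_old[] = {
	{ PCI_VENDOR_ID_TI, 0xabcd, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
	{ 0 }
};

/* New style: .driver_data is left out and is implicitly zero. */
static const struct sketch_pci_device_id sketch_new[] = {
	{ PCI_VDEVICE(TI, 0xabcd) },
	{ 0 }
};

/* Member-by-member comparison (avoids comparing padding bytes). */
static int sketch_same_entry(const struct sketch_pci_device_id *a,
			     const struct sketch_pci_device_id *b)
{
	return a->vendor == b->vendor && a->device == b->device &&
	       a->subvendor == b->subvendor && a->subdevice == b->subdevice &&
	       a->class == b->class && a->class_mask == b->class_mask &&
	       a->driver_data == b->driver_data;
}
```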
misc: rtsx: Use named initializers for struct pci_device_id
Initializing structures using list initializers is harder to read than
using named initializers. Seeing the member names is more idiomatic and
easier to understand.
Use named initializers for the driver's pci_device_id array.
While at it also drop an explicit zero in the terminating array entry.
There are no changes to the compiled result of the array; verified with
builds for x86 and arm64.
James Kim [Sun, 3 May 2026 10:11:31 +0000 (19:11 +0900)]
char: tlclk: fix use-after-free in tlclk_cleanup()
This patch improves the module cleanup process in the tlclk driver to
prevent potential use-after-free and race conditions.
Currently, the file_operations structure does not specify the .owner
field, which could allow the module to be unloaded while user-space
processes are still interacting with the device. Additionally, the
tlclk_cleanup() function frees the alarm_events memory before ensuring
that blocked processes in the waitqueue are fully awakened and that the
switchover_timer has completed.
To address these cases, this patch:
- Sets '.owner = THIS_MODULE' in tlclk_fops to safely defer module
unloading while the device is in use.
- Updates tlclk_cleanup() to explicitly wake up all blocked readers
(wake_up_all), properly release hardware I/O regions, and safely
delete the timer (timer_delete_sync) prior to freeing memory.
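The teardown ordering (wake every waiter, make sure nothing can still touch the buffer, and only then free it) can be illustrated with a userspace analogue built on POSIX threads. In this sketch, pthread_cond_broadcast() plays the role of wake_up_all(). pthread_join() is a simplification standing in for the guarantees the kernel gets from timer_delete_sync() and the .owner module reference. All names are hypothetical and not taken from tlclk.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a driver's shared state. */
struct sketch_dev {
	pthread_mutex_t lock;
	pthread_cond_t wq;     /* stands in for the driver's waitqueue */
	bool shutting_down;
	int *alarm_events;     /* memory a woken reader may still touch */
	int readers_exited;
};

/* A blocked reader: sleeps until shutdown, then touches shared memory. */
static void *sketch_reader(void *arg)
{
	struct sketch_dev *d = arg;

	pthread_mutex_lock(&d->lock);
	while (!d->shutting_down)
		pthread_cond_wait(&d->wq, &d->lock);
	(void)d->alarm_events[0];  /* would be a use-after-free if freed early */
	d->readers_exited++;
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Teardown: mark shutdown, wake all waiters, wait for them to leave,
 * and only then free the shared buffer. */
static void sketch_cleanup(struct sketch_dev *d, pthread_t *readers, int n)
{
	pthread_mutex_lock(&d->lock);
	d->shutting_down = true;
	pthread_cond_broadcast(&d->wq);
	pthread_mutex_unlock(&d->lock);

	for (int i = 0; i < n; i++)
		pthread_join(readers[i], NULL);

	free(d->alarm_events);
	d->alarm_events = NULL;
}
```

Reversing the two halves of sketch_cleanup() (freeing before the broadcast and join) reproduces the bug class the patch fixes.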
Dave Penkler [Wed, 22 Apr 2026 07:48:07 +0000 (09:48 +0200)]
gpib: Suppress setting END on error from NI_USB dongle
The NI USB adapter sets the END bit in the status word when an error
occurs such as a read being interrupted by the setting of ATN. This
happens for example when a device clear is received from the
controller in charge during a read.
The common driver changes the error return to 0 whenever the END bit
is set, to avoid reporting errors such as a timeout or an interrupt
after the full message has actually been read. Because the NI USB
adapter also sets the END bit on errors, actual errors (-EINTR,
-ETIMEDOUT) were not being reported.
We avoid setting the END bit in the ni_usb_gpib driver when an error
is reported in the error_code of the status returned by the adapter.
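The interaction between the two layers can be sketched as two small functions. The END bit value and the function names are hypothetical, and the real ni_usb_gpib status layout differs. The first function applies the fix on the adapter side. The second models the common driver's "END wins" rule.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status-word bit; not the real ni_usb_gpib encoding. */
#define SKETCH_END_BIT 0x1u

/* Adapter-driver step (the fix): drop END when the adapter reports an error. */
static unsigned int sketch_adapter_status(unsigned int raw_status, int error_code)
{
	if (error_code != 0)
		raw_status &= ~SKETCH_END_BIT;
	return raw_status;
}

/* Common-driver step: END means the whole message arrived, so a late
 * error is discarded and 0 is returned. */
static int sketch_read_result(unsigned int status, int err)
{
	if (status & SKETCH_END_BIT)
		return 0;
	return err;
}
```

Without the adapter-side filter, a raw status with END set hides -EINTR. With the filter, the error reaches the caller.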
Dave Penkler [Sat, 11 Apr 2026 17:25:11 +0000 (19:25 +0200)]
gpib: Add register and unregister calls
Register the driver for the new 72130-based pci_xl board type with the
common driver on module initialisation.
Unregister the driver on registration error and on module exit.
Alice Ryhl [Thu, 7 May 2026 11:07:47 +0000 (11:07 +0000)]
rust_binder: use lock_vma_under_rcu() in shrinker
The shrinker callback currently uses the mmap read trylock operation to
attempt to access the vma, but it's generally better to only lock the
vma instead of the whole mmap when you can.
When lock_vma_under_rcu() fails, there is no need to fall back to the
mmap lock, because the shrinker already uses a trylock operation that
is allowed to fail.
Merge tag 'usb-serial-7.1-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB serial fixes for 7.1-rc5
Here are a number of fixes for memory corruption and information leaks
due to missing endpoint and transfer sanity checks dating back to
simpler times when we trusted our hardware.
Included are also a fix for a recently added modem device id entry and
some new modem device ids.
All but the last five commits have been in linux-next with no
reported issues.
* tag 'usb-serial-7.1-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: cypress_m8: validate interrupt packet headers
USB: serial: safe_serial: fix memory corruption with small endpoint
USB: serial: omninet: fix memory corruption with small endpoint
USB: serial: mxuport: fix memory corruption with small endpoint
USB: serial: cypress_m8: fix memory corruption with small endpoint
USB: serial: option: add missing RSVD(5) flag for Rolling RW135R-GL
USB: serial: option: add MeiG SRM813Q
USB: serial: mct_u232: fix missing interrupt-in transfer sanity check
USB: serial: mct_u232: fix memory corruption with small endpoint
USB: serial: keyspan: fix missing indat transfer sanity check
USB: serial: digi_acceleport: fix memory corruption with small endpoints
USB: serial: belkin_sa: validate interrupt status length
Abel Vesa [Fri, 15 May 2026 11:21:52 +0000 (14:21 +0300)]
pinctrl: qcom: eliza: Merge QUP1_SE4 lanes in groups
QUP1_SE4 uses GPIO36 and GPIO37 for two selectable lane pairs. The
current driver exposes lanes 0, 1, 2 and 3 as independent functions.
However, since these are usually configured in pairs in devicetree,
it makes more sense to merge them into groups.
So merge the per-lane functions into qup1_se4_01 and qup1_se4_23, and list
both GPIO36 and GPIO37 in each function group.
Abel Vesa [Fri, 15 May 2026 11:21:51 +0000 (14:21 +0300)]
dt-bindings: pinctrl: qcom,eliza-tlmm: Merge QUP1_SE4 lane functions
QUP1_SE4 uses GPIO36 and GPIO37 for two selectable lane pairs. The
previous split added one function name per lane. Since these are usually
configured in pairs in devicetree, it makes more sense to have them
grouped.
So replace the per-lane qup1_se4_l[0-3] names with names for the two
selectable pairs, qup1_se4_01 and qup1_se4_23.
Fixes: 1bd5c56253c5 ("dt-bindings: pinctrl: qcom,eliza-tlmm: Split QUP1_SE4 lanes")
Suggested-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Frank Li [Wed, 8 Apr 2026 05:07:01 +0000 (01:07 -0400)]
pinctrl: Add OF dependency for PINCTRL_GENERIC_MUX
Add an explicit OF dependency for PINCTRL_GENERIC_MUX to ensure the
generic mux support is only enabled when device tree is available.
Also fix the stub implementation of pinctrl_generic_to_map() by correcting
its last argument to match the non-stub prototype.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202604072013.aI84l57L-lkp@intel.com/
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Tejun Heo [Fri, 22 May 2026 17:22:16 +0000 (07:22 -1000)]
bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()
struct bpf_arena is opaque to callers outside arena.c. Add two helpers
for struct_ops subsystems that need to reach into an arena:
  bpf_arena_map_kern_vm_start(struct bpf_map *map)
      returns @map's kern_vm_start. A sched_ext follow-up needs this
      to translate kern_va <-> uaddr.
  bpf_prog_arena(struct bpf_prog *prog)
      returns the bpf_map of the arena referenced by @prog (NULL if
      @prog references no arena). The verifier enforces at most one
      arena per program. Used by struct_ops callers that auto-discover
      an arena from a member prog and need to take a map reference.
Tejun Heo [Fri, 22 May 2026 17:22:15 +0000 (07:22 -1000)]
bpf: Add bpf_struct_ops_for_each_prog()
Add a helper that walks the member progs of the struct_ops map
containing a given @kdata vmtable. struct_ops ->reg() callbacks (and
similar) sometimes need to inspect the loaded BPF programs, e.g. to
discover maps they reference via prog->aux->used_maps.
The implementation mirrors bpf_struct_ops_id(): container_of @kdata
to recover the bpf_struct_ops_map, then iterate st_map->links[i]->prog
for i in [0, funcs_cnt). Same access pattern, no new locking - by the
time ->reg() fires st_map is fully populated and stable.
A sched_ext follow-up walks the member progs of a cid-form scheduler's
struct_ops map, reads prog->aux->arena directly, and requires all member
progs to reference exactly one arena, without requiring the BPF program
to call a registration kfunc.
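Recovering the map with container_of() from @kdata and visiting each member prog can be sketched in userspace C. The struct layouts and names below are hypothetical simplifications of the BPF internals. The callback shows the sched_ext-style check that every member prog references the same single arena. The NULL-slot skip is a defensive choice made only in this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the BPF structures. */
struct sketch_map { int id; };
struct sketch_prog { struct sketch_map *arena; };  /* models prog->aux->arena */
struct sketch_link { struct sketch_prog *prog; };

struct sketch_st_map {
	int funcs_cnt;
	struct sketch_link *links[4];
	char kvalue_data[16];  /* the @kdata vmtable handed to ->reg() */
};

typedef int (*sketch_prog_fn)(struct sketch_prog *prog, void *ctx);

/* Recover the containing map from @kdata, then call @fn on links[i]->prog
 * for i in [0, funcs_cnt). Stops early on a nonzero return. */
static int sketch_for_each_prog(void *kdata, sketch_prog_fn fn, void *ctx)
{
	struct sketch_st_map *st_map =
		container_of(kdata, struct sketch_st_map, kvalue_data);

	for (int i = 0; i < st_map->funcs_cnt; i++) {
		if (!st_map->links[i])
			continue;
		int ret = fn(st_map->links[i]->prog, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: every member prog must reference the same non-NULL arena. */
static int sketch_check_arena(struct sketch_prog *prog, void *ctx)
{
	struct sketch_map **found = ctx;

	if (!prog->arena || (*found && *found != prog->arena))
		return -1;
	*found = prog->arena;
	return 0;
}
```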
Tejun Heo [Fri, 22 May 2026 17:22:14 +0000 (07:22 -1000)]
bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers
The existing kernel-side export of bpf_arena_alloc_pages is _non_sleepable
only - it's used by the verifier to inline the kfunc when the call site is
non-sleepable. There is no sleepable equivalent for kernel callers. The
kfunc bpf_arena_alloc_pages itself is BPF-only.
sched_ext needs sleepable kernel-side allocs for its arena pool init/grow
paths. Add bpf_arena_alloc_pages_sleepable() mirroring the _non_sleepable
wrapper but passing sleepable=true to arena_alloc_pages().
bpf: Recover arena kernel faults with scratch page
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication
over arena memory is awkward today. Data has to be staged through a trusted
kernel pointer with extra code and copying on the BPF side. While reads
through arena pointers can use a fault-safe helper, writes don't have a good
solution. The in-line alternative would need instruction emulation or asm
fixup labels.
Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any
handed-in arena pointer, without bounds checking. A per-arena scratch page
is installed by the arch fault path into empty arena kernel PTEs - x86 from
page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for
translation faults, both after the existing exception-table and KFENCE
handling. The faulting instruction retries and the access is also reported
through the program's BPF stream, preserving error reporting.
bpf_prog_find_from_stack() resolves the current BPF program (and its arena)
from the kernel stack - no new bpf_run_ctx state is added. Recovery covers
the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower
half-guard is excluded because well-behaved kfuncs only access forward from
arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past
a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst.
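The recoverable address window described above can be written as a small predicate. In this sketch, SKETCH_GUARD_SZ is an assumed value chosen for illustration (see kernel/bpf/arena.c for the real GUARD_SZ), and the function name is hypothetical. The window covers the 4 GiB arena plus the upper half-guard and deliberately excludes everything below kern_vm_start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ARENA_SZ (1ULL << 32)  /* the 4 GiB arena */
#define SKETCH_GUARD_SZ (1ULL << 16)  /* assumed guard size, illustration only */

/* Recoverable window: [kern_vm_start, kern_vm_start + 4 GiB + GUARD_SZ / 2).
 * The lower half-guard below kern_vm_start is not covered. */
static bool sketch_fault_recoverable(uint64_t kern_vm_start, uint64_t addr)
{
	if (addr < kern_vm_start)
		return false;
	return addr - kern_vm_start < SKETCH_ARENA_SZ + SKETCH_GUARD_SZ / 2;
}
```

Under this model, a kfunc that accesses at most GUARD_SZ / 2 bytes past any pointer inside the arena always lands in the recoverable window.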
The install is lock-free via ptep_try_set(). On race-loss the winning
installer's PTE is already valid, so the access retry succeeds. The arena
clear path uses ptep_get_and_clear() so installer and clearer race through
atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped"
entries just cause one extra re-fault, cheaper than a global IPI on every
install.
Scratch exists only to keep the kernel from oopsing on an in-line arena
access. Its presence at a PTE means the BPF program has already
malfunctioned, and the violation is reported through the program's BPF
stream. The only requirement for behavior on a scratched PTE is that the
kernel doesn't crash. In particular, any user-side access through such a PTE
may segfault. The shared scratch page is freed once during map destruction.
BPF instruction faults continue to use the existing JIT exception-table
path. This patch changes only the kernel-text fault path. No UAPI flag is
added. The new behavior is the default.
v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David)
v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp)
Tejun Heo [Fri, 22 May 2026 17:22:12 +0000 (07:22 -1000)]
mm: Add ptep_try_set() for lockless empty-slot installs
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
currently pte_none(). Returns true on success, false if the slot was already
populated or the arch has no implementation.
The intended caller is the upcoming bpf_arena kernel-side fault recovery
path. The install runs from a page fault that can be nested under locks
held by the faulting kernel caller (e.g. a BPF program holding
raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
would A-A deadlock. Lock-free cmpxchg is the only viable option, which
constrains this helper to special kernel page tables where concurrent
writers cooperate via atomic accessors.
The generic version in <linux/pgtable.h> returns false. x86 and arm64
override with try_cmpxchg-based implementations on the underlying pteval.
Other architectures get the false stub - the callers there already fall
through to oops.
v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)
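The install-iff-empty semantics can be modeled with C11 atomics on a plain 64-bit value. This is a userspace sketch, not the x86 or arm64 implementation: sketch_pteval_t and both function names are hypothetical. It also shows why a race loser is harmless, since the slot already holds the winner's valid entry. The compare is against strict zero, so a slot holding any bits at all counts as populated, even bits that pte_none() would ignore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_pteval_t;

/* Install @new_val only if the slot is exactly zero. Returns false if
 * another installer got there first or the slot was otherwise populated. */
static bool sketch_ptep_try_set(_Atomic sketch_pteval_t *ptep, sketch_pteval_t new_val)
{
	sketch_pteval_t expected = 0;

	return atomic_compare_exchange_strong(ptep, &expected, new_val);
}

/* Clearing side, in the spirit of ptep_get_and_clear(): atomically take
 * the old value so it cannot interleave badly with an installer. */
static sketch_pteval_t sketch_ptep_get_and_clear(_Atomic sketch_pteval_t *ptep)
{
	return atomic_exchange(ptep, 0);
}
```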
Linus Walleij [Sat, 23 May 2026 08:46:52 +0000 (10:46 +0200)]
Merge tag 'renesas-pinctrl-for-v7.2-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into devel
pinctrl: renesas: Updates for v7.2
- Save/restore more registers during suspend/resume on the RZ/G2L and
RZ/V2H SoC families,
- Add support for the RZ/G3L (R9A08G046) SoC,
- Add support for pinconf-groups in debugfs on EMMA Mobile,
SH/R-Mobile, R-Car, RZ/G1, and RZ/G2 SoCs,
- Miscellaneous fixes and improvements.
Tina Zhang [Fri, 22 May 2026 04:00:14 +0000 (12:00 +0800)]
KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
Hygon Family 18h CPUs are derived from AMD Family 17h (Zen1) silicon and
share the same erratum #1235: hardware may read a stale IsRunning=1 bit
during ICR write emulation and silently fail to generate an
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING VM-Exit on the sending vCPU.
The absence of the VM-Exit causes KVM to miss the required wakeup of
blocking target vCPUs, leading to hung vCPUs and unbounded delays in
guest execution.
Extend the existing AMD Family 17h erratum #1235 workaround to also cover
Hygon Family 18h. With IPI virtualization disabled, KVM never sets
IsRunning=1 in the Physical ID table, so every non-self IPI generates a
VM-Exit and is correctly emulated.
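The extended condition can be sketched as a predicate that models only the family-level description in this commit message. The vendor enum and the function names are hypothetical. KVM's real check uses its own CPU identification helpers and may be narrower than a bare family match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor enum; not KVM's CPU identification API. */
enum sketch_vendor { SKETCH_VENDOR_AMD, SKETCH_VENDOR_HYGON, SKETCH_VENDOR_OTHER };

/* Erratum #1235 coverage as described above: AMD Family 17h and, with
 * this fix, Hygon Family 18h. */
static bool sketch_has_erratum_1235(enum sketch_vendor vendor, unsigned int family)
{
	return (vendor == SKETCH_VENDOR_AMD && family == 0x17) ||
	       (vendor == SKETCH_VENDOR_HYGON && family == 0x18);
}

/* With the erratum present, IPI virtualization stays off, so every
 * non-self IPI takes a VM-Exit and is emulated by KVM. */
static bool sketch_enable_ipiv(enum sketch_vendor vendor, unsigned int family)
{
	return !sketch_has_erratum_1235(vendor, family);
}
```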
Fixes: 8de4a1c8164e ("KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tina Zhang <zhang_wei@open-hieco.net>
Message-ID: <20260522040014.3380201-1-zhang_wei@open-hieco.net>
KVM: selftests: Verify that KVM returns the configured APIC cycle length
Add checks in the APIC bus clock test to verify that querying
KVM_CAP_X86_APIC_BUS_CYCLES_NS on the VM after changing the frequency
returns the VM's actual APIC cycle length, not KVM's default. For
giggles, verify that KVM still returns its default frequency for the
system-scoped check.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>