Paolo Bonzini [Mon, 9 Feb 2026 18:35:16 +0000 (19:35 +0100)]
Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM mediated PMU support for 6.20
Add support for mediated PMUs, where KVM gives the guest full ownership of PMU
hardware (contexted switched around the fastpath run loop) and allows direct
access to data MSRs and PMCs (restricted by the vPMU model), but intercepts
access to control registers, e.g. to enforce event filtering and to prevent the
guest from profiling sensitive host state.
To keep overall complexity reasonable, mediated PMU usage is all or nothing
for a given instance of KVM (controlled via module param). The Mediated PMU
is disabled default, partly to maintain backwards compatilibity for existing
setup, partly because there are tradeoffs when running with a mediated PMU that
may be non-starters for some use cases, e.g. the host loses the ability to
profile guests with mediated PMUs, the fastpath run loop is also a blind spot,
entry/exit transitions are more expensive, etc.
Versus the emulated PMU, where KVM is "just another perf user", the mediated
PMU delivers more accurate profiling and monitoring (no risk of contention and
thus dropped events), with significantly less overhead (fewer exits and faster
emulation/programming of event selectors) E.g. when running Specint-2017 on
a single-socket Sapphire Rapids with 56 cores and no-SMT, and using perf from
within the guest:
Perf command:
a. basic-sampling: perf record -F 1000 -e 6-instructions -a --overwrite
b. multiplex-sampling: perf record -F 1000 -e 10-instructions -a --overwrite
Paolo Bonzini [Mon, 9 Feb 2026 18:33:30 +0000 (19:33 +0100)]
Merge tag 'kvm-x86-apic-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM x86 APIC-ish changes for 6.20
- Fix a benign bug where KVM could use the wrong memslots (ignored SMM) when
creating a vCPU-specific mapping of guest memory.
- Clean up KVM's handling of marking mapped vCPU pages dirty.
- Drop a pile of *ancient* sanity checks hidden behind in KVM's unused
ASSERT() macro, most of which could be trivially triggered by the guest
and/or user, and all of which were useless.
- Fold "struct dest_map" into its sole user, "struct rtc_status", to make it
more obvious what the weird parameter is used for, and to allow burying the
RTC shenanigans behind CONFIG_KVM_IOAPIC=y.
- Bury all of ioapic.h and KVM_IRQCHIP_KERNEL behind CONFIG_KVM_IOAPIC=y.
- Add a regression test for recent APICv update fixes.
- Rework KVM's handling of VMCS updates while L2 is active to temporarily
switch to vmcs01 instead of deferring the update until the next nested
VM-Exit. The deferred updates approach directly contributed to several
bugs, was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans.
- Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv()
to consolidate the updates, and to co-locate SVI updates with the updates
for KVM's own cache of ISR information.
Paolo Bonzini [Mon, 9 Feb 2026 18:08:17 +0000 (19:08 +0100)]
Merge tag 'kvm-x86-gmem-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM guest_memfd changes for 6.20
- Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage
handling, and instead rely on SNP (the only user of the tracking) to do its
own tracking via the RMP.
- Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE
and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to
avoid non-trivial complexity for a non-existent usecase (and because
in-place conversion simply can't support unaligned sources).
- When populating guest_memfd memory, GUP the source page in common code and
pass the refcounted page to the vendor callback, instead of letting vendor
code do the heavy lifting. Doing so avoids a looming deadlock bug with
in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap
invalidate lock.
Paolo Bonzini [Mon, 9 Feb 2026 18:05:42 +0000 (19:05 +0100)]
Merge tag 'kvm-riscv-6.20-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv changes for 6.20
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(),
and kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Add riscv vm satp modes in KVM selftests
- Transparent huge page support for G-stage
- Adjust the number of available guest irq files based on
MMIO register sizes in DeviceTree or ACPI
Paolo Bonzini [Mon, 9 Feb 2026 17:53:47 +0000 (18:53 +0100)]
Merge tag 'kvm-x86-misc-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.20
- Disallow changing the virtual CPU model if L2 is active, for all the same
reasons KVM disallows change the model after the first KVM_RUN.
- Fix a bug where KVM would incorrectly reject host accesses to PV MSRs that
were advertised as supported to userspace when running with
KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled.
- Fix a bug where KVM would attempt to read protect guest state (CR3) when
configuring an async #PF entry.
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86
only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Explicitly allow
the few exports that are intended for external usage.
- Ignore -EBUSY when checking nested events after a vCPU exits blocking as
the WARN is user-triggerable, and because exiting to userspace on -EBUSY
does more harm than good in pretty much every situation.
- Throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is
in Wait-For-SIPI, as playing whack-a-mole with syzkaller turned out to be an
unwinnable game.
- Add support for new Intel instructions that don't require anything beyond
enumerating feature flags to userspace.
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2.
- Add WARNs to guard against modifying KVM's CPU caps outside of the intended
setup flow, as nested VMX in particular is sensitive to unexpected changes
in KVM's golden configuration.
- Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts
when the suppression feature is enabled by the guest (currently limited to
split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor
Suppress EOI Broadcasts isn't an option as some userspaces have come to rely
on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective
of whether or not userspace I/O APIC supports Directed EOIs).
Paolo Bonzini [Mon, 9 Feb 2026 17:51:37 +0000 (18:51 +0100)]
Merge tag 'kvm-x86-svm-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM SVM changes for 6.20
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure.
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS
relies on an upcoming, publicly announced change in the APM to reduce the
set of conditions where hardware (i.e. KVM) *must* flush the RAP.
- Ignore nSVM intercepts for instructions that are not supported according to
L1's virtual CPU model.
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath
for EPT Misconfig.
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM,
and allow userspace to restore nested state with GIF=0.
- Treat exit_code as an unsigned 64-bit value through all of KVM.
- Add support for fetching SNP certificates from userspace.
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD
or VMSAVE on behalf of L2.
Paolo Bonzini [Mon, 9 Feb 2026 17:50:04 +0000 (18:50 +0100)]
Merge tag 'kvm-x86-vmx-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM VMX changes for 6.20
- Fix an SGX bug where KVM would incorrectly try to handle EPCM #PFs by always
relecting EPCM #PFs back into the guest. KVM doesn't shadow EPCM entries,
and so EPCM violations cannot be due to KVM interference, and can't be
resolved by KVM.
- Fix a bug where KVM would register its posted interrupt wakeup handler even
if loading kvm-intel.ko ultimately failed.
- Disallow access to vmcb12 fields that aren't fully supported, mostly to
avoid weirdness and complexity for FRED and other features, where KVM wants
enable VMCS shadowing for fields that conditionally exist.
- Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or
refuses to online a CPU) due to a VMCS config mismatch.
Paolo Bonzini [Mon, 9 Feb 2026 17:38:54 +0000 (18:38 +0100)]
Merge tag 'kvm-x86-selftests-6.20' of https://github.com/kvm-x86/linux into HEAD
KVM selftests changes for 6.20
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
Paolo Bonzini [Mon, 9 Feb 2026 17:17:01 +0000 (18:17 +0100)]
Merge tag 'loongarch-kvm-6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.20
1. Add more CPUCFG mask bits.
2. Improve feature detection.
3. Add FPU/LBT delay load support.
4. Set default return value in KVM IO bus ops.
5. Add paravirt preempt feature support.
6. Add KVM steal time test case for tools/selftests.
Linus Torvalds [Sat, 7 Feb 2026 17:37:34 +0000 (09:37 -0800)]
Merge tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"One final batch of fixes for the Tegra SPI drivers, the main one is a
batch of fixes for races with the interrupts in the Tegra210 QSPI
driver that Breno has been working on for a while"
* tag 'spi-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: tegra114: Preserve SPI mode bits in def_command1_reg
spi: tegra: Fix a memory leak in tegra_slink_probe()
spi: tegra210-quad: Protect curr_xfer check in IRQ handler
spi: tegra210-quad: Protect curr_xfer clearing in tegra_qspi_non_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer in tegra_qspi_combined_seq_xfer
spi: tegra210-quad: Protect curr_xfer assignment in tegra_qspi_setup_transfer_one
spi: tegra210-quad: Move curr_xfer read inside spinlock
spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed transfer
Linus Torvalds [Sat, 7 Feb 2026 17:34:49 +0000 (09:34 -0800)]
Merge tag 'regulator-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One last fix for v6.19: the voltages for the SpaceMIT P1 were not
described correctly"
* tag 'regulator-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: spacemit-p1: Fix n_voltages for BUCK and LDO regulators
Linus Torvalds [Sat, 7 Feb 2026 17:27:57 +0000 (09:27 -0800)]
Merge tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull binder fixes from Greg KH:
"Here are some small, last-minute binder C and Rust driver fixes for
reported issues. They include a number of fixes for reported crashes
and other problems.
All of these have been in linux-next this week, and longer"
* tag 'char-misc-6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binderfs: fix ida_alloc_max() upper bound
rust_binderfs: fix ida_alloc_max() upper bound
binder: fix BR_FROZEN_REPLY error log
rust_binder: add additional alignment checks
binder: fix UAF in binder_netlink_report()
rust_binder: correctly handle FDA objects of length zero
Linus Torvalds [Sat, 7 Feb 2026 17:10:42 +0000 (09:10 -0800)]
Merge tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Miscellaneous MMCID fixes to address bugs and performance regressions
in the recent rewrite of the SCHED_MM_CID management code:
- Fix livelock triggered by BPF CI testing
- Fix hard lockup on weakly ordered systems
- Simplify the dropping of CIDs in the exit path by removing an
unintended transition phase
- Fix performance/scalability regression on a thread-pool benchmark
by optimizing transitional CIDs when scheduling out"
* tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/mmcid: Optimize transitional CIDs when scheduling out
sched/mmcid: Drop per CPU CID immediately when switching to per task mode
sched/mmcid: Protect transition on weakly ordered systems
sched/mmcid: Prevent live lock on task to CPU mode transition
Linus Torvalds [Sat, 7 Feb 2026 16:21:21 +0000 (08:21 -0800)]
Merge tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar::
- Bump up the Clang minimum version requirements for livepatch
builds, due to Clang assembler section handling bugs causing
silent miscompilations
- Strip livepatching symbol artifacts from non-livepatch modules
- Fix livepatch build warnings when certain Clang LTO options
are enabled
- Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
* tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool/klp: Fix unexported static call key access for manually built livepatch modules
objtool/klp: Fix symbol correlation for orphaned local symbols
livepatch: Free klp_{object,func}_ext data after initialization
livepatch: Fix having __klp_objects relics in non-livepatch modules
livepatch/klp-build: Require Clang assembler >= 20
Josh Poimboeuf [Fri, 6 Feb 2026 22:24:55 +0000 (14:24 -0800)]
x86/vmware: Fix hypercall clobbers
Fedora QA reported the following panic:
BUG: unable to handle page fault for address: 0000000040003e54
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20251119-3.fc43 11/19/2025
RIP: 0010:vmware_hypercall4.constprop.0+0x52/0x90
..
Call Trace:
vmmouse_report_events+0x13e/0x1b0
psmouse_handle_byte+0x15/0x60
ps2_interrupt+0x8a/0xd0
...
because the QEMU VMware mouse emulation is buggy, and clears the top 32
bits of %rdi that the kernel kept a pointer in.
The QEMU vmmouse driver saves and restores the register state in a
"uint32_t data[6];" and as a result restores the state with the high
bits all cleared.
RDI originally contained the value of a valid kernel stack address
(0xff5eeb3240003e54). After the vmware hypercall it now contains
0x40003e54, and we get a page fault as a result when it is dereferenced.
The proper fix would be in QEMU, but this works around the issue in the
kernel to keep old setups working, when old kernels had not happened to
keep any state in %rdi over the hypercall.
In theory this same issue exists for all the hypercalls in the vmmouse
driver; in practice it has only been seen with vmware_hypercall3() and
vmware_hypercall4(). For now, just mark RDI/RSI as clobbered for those
two calls. This should have a minimal effect on code generation overall
as it should be rare for the compiler to want to make RDI/RSI live
across hypercalls.
Linus Torvalds [Fri, 6 Feb 2026 21:07:47 +0000 (13:07 -0800)]
Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"A couple of late-breaking MM fixes. One against a new-in-this-cycle
patch and the other addresses a locking issue which has been there for
over a year"
* tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/memory-failure: reject unsupported non-folio compound page
procfs: avoid fetching build ID while holding VMA lock
Linus Torvalds [Fri, 6 Feb 2026 20:37:28 +0000 (12:37 -0800)]
Merge tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Fix event format field alignments for 32 bit architectures
The fields in the event format files are used to parse the raw binary
buffer data by applications. If they are incorrect, then the
application produces garbage.
On 32 bit architectures, the function graph 64bit calltime and
rettime were off by 4bytes. That's because the actual fields are in a
packed structure but the macros used by the ftrace events did not
mark them as packed, and instead, gave them their natural alignment
which made their offsets off by 4 bytes.
There are macros to have a packed field within an embedded structure
of an event, but there's no macro for normal fields within a packed
structure of the event. The macro __field_packed() was used for the
packed embedded structure field. Rename that to __field_desc_packed()
(to match the non-packed embedded field macro __field_desc()), and
make __field_packed() for fields that are in a packed event structure
(which matches the unpacked __field() macro).
Switch the calltime and rettime fields of the function graph event to
use the new __field_packed() and this makes the offsets correct.
* tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix ftrace event field alignments
Linus Torvalds [Fri, 6 Feb 2026 18:34:17 +0000 (10:34 -0800)]
Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"One RBD and two CephFS fixes which address potential oopses.
The RBD thing is more of a rare edge case that pops up in our CI,
while the two CephFS scenarios are regressions that were reported by
users and can be triggered trivially in normal operation. All marked
for stable"
* tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client:
ceph: fix NULL pointer dereference in ceph_mds_auth_match()
ceph: fix oops due to invalid pointer for kfree() in parse_longname()
rbd: check for EOD after exclusive lock is ensured to be held
Linus Torvalds [Fri, 6 Feb 2026 18:27:42 +0000 (10:27 -0800)]
Merge tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"Two minor fixes for the DMA-mapping subsystem:
- check for the rare case of the allocation failure of the global CMA
pool (Shanker Donthineni)
- avoid perf buffer overflow when tracing large scatter-gather lists
(Deepanshu Kartikey)"
* tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma: contiguous: Check return value of dma_contiguous_reserve_area()
tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow
Linus Torvalds [Fri, 6 Feb 2026 18:10:39 +0000 (10:10 -0800)]
Merge tag 'pmdomain-v6.19-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- imx:
- Fix system wakeup support for imx8mp power domains
- Fix potential out-of-range access for imx8m power domains
- Fix the imx8mm gpu hang
- qcom: Fix off-by-one error for highest state in rpmpd
* tag 'pmdomain-v6.19-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: imx8mp-blk-ctrl: Keep usb phy power domain on for system wakeup
pmdomain: imx8mp-blk-ctrl: Keep gpc power domain on for system wakeup
pmdomain: imx8m-blk-ctrl: fix out-of-range access of bc->domains
pmdomain: imx: gpcv2: Fix the imx8mm gpu hang due to wrong adb400 reset
pmdomain: qcom: rpmpd: fix off-by-one error in clamping to the highest state
Linus Torvalds [Fri, 6 Feb 2026 17:59:40 +0000 (09:59 -0800)]
Merge tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. It became a bit larger than wished, but
all of them are device-specific small fixes, and it should be still
fairly safe to take at the last minute.
Included are a few quirks and fixes for Intel, AMD, HD-audio, and
USB-audio, as well as a race fix in aloop driver and corrections of
Cirrus firmware kunit test"
* tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable headset mic for Acer Nitro 5
ASoC: fsl_xcvr: fix missing lock in fsl_xcvr_mode_put()
ASoC: dt-bindings: ti,tlv320aic3x: Add compatible string ti,tlv320aic23
ASoC: amd: fix memory leak in acp3x pdm dma ops
ALSA: usb-audio: fix broken logic in snd_audigy2nx_led_update()
ALSA: aloop: Fix racy access at PCM trigger
ASoC: rt1320: fix intermittent no-sound issue
ASoC: SOF: Intel: use hdev->info.link_mask directly
firmware: cs_dsp: rate-limit log messages in KUnit builds
ASoC: amd: yc: Add quirk for HP 200 G2a 16
ASoC: cs42l43: Correct handling of 3-pole jack load detection
ASoC: Intel: sof_es8336: Add DMI quirk for Huawei BOD-WXX9
ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43
Linus Torvalds [Fri, 6 Feb 2026 17:56:03 +0000 (09:56 -0800)]
Merge tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"A stable fix for memory allocation profiling tag not being cleared
when aborting an allocation due to memcg charge failure (Hao Ge)"
* tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: Add alloc_tagging_slab_free_hook for memcg_alloc_abort_single
Xu Lu [Sun, 4 Jan 2026 13:34:57 +0000 (21:34 +0800)]
irqchip/riscv-imsic: Adjust the number of available guest irq files
Currently, KVM assumes the minimum of implemented HGEIE bits and
"BIT(gc->guest_index_bits) - 1" as the number of guest files available
across all CPUs. This will not work when CPUs have different number
of guest files because KVM may incorrectly allocate a guest file on a
CPU with fewer guest files.
To address above, during initialization, calculate the number of
available guest interrupt files according to MMIO resources and
constrain the number of guest interrupt files that can be allocated
by KVM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://lore.kernel.org/r/20260104133457.57742-1-luxu.kernel@bytedance.com Signed-off-by: Anup Patel <anup@brainfault.org>
Wu Fei [Wed, 5 Nov 2025 15:14:26 +0000 (23:14 +0800)]
KVM: riscv: selftests: Add riscv vm satp modes
Current vm modes cannot represent riscv guest modes precisely, here add
all 9 combinations of P(56,40,41) x V(57,48,39). Also the default vm
mode is detected on runtime instead of hardcoded one, which might not be
supported on specific machine.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251105151442.28767-1-wu.fei9@sanechips.com.cn Signed-off-by: Anup Patel <anup@brainfault.org>
Jiakai Xu [Tue, 27 Jan 2026 08:43:13 +0000 (08:43 +0000)]
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
kvm_riscv_vcpu_aia_imsic_update() assumes that the vCPU IMSIC state has
already been initialized and unconditionally accesses imsic->vsfile_lock.
However, in fuzzed ioctl sequences, the AIA device may be initialized at
the VM level while the per-vCPU IMSIC state is still NULL.
This leads to invalid access when entering the vCPU run loop before
IMSIC initialization has completed.
The crash manifests as:
Unable to handle kernel paging request at virtual address dfffffff00000006
...
kvm_riscv_vcpu_aia_imsic_update arch/riscv/kvm/aia_imsic.c:801
kvm_riscv_vcpu_aia_update arch/riscv/kvm/aia_device.c:493
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:927
...
Add a guard to skip the IMSIC update path when imsic_state is NULL. This
allows the vCPU run loop to continue safely.
This issue was discovered during fuzzing of RISC-V KVM code.
Jiakai Xu [Tue, 27 Jan 2026 07:22:19 +0000 (07:22 +0000)]
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_rw_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
The crash manifests as:
Unable to handle kernel paging request at virtual address dfffffff00000006
...
kvm_riscv_aia_imsic_rw_attr+0x2d8/0x854 arch/riscv/kvm/aia_imsic.c:958
aia_set_attr+0x2ee/0x1726 arch/riscv/kvm/aia_device.c:354
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4744 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4761
vfs_ioctl fs/ioctl.c:51 [inline]
...
The fix adds a check to return -ENODEV if imsic_state is NULL and moves
isel assignment after imsic_state NULL check.
Jiakai Xu [Sun, 25 Jan 2026 14:33:44 +0000 (14:33 +0000)]
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
Add a null pointer check for imsic_state before dereferencing it in
kvm_riscv_aia_imsic_has_attr(). While the function checks that the
vcpu exists, it doesn't verify that the vcpu's imsic_state has been
initialized, leading to a null pointer dereference when accessed.
This issue was discovered during fuzzing of RISC-V KVM code. The
crash occurs when userspace calls KVM_HAS_DEVICE_ATTR ioctl on an
AIA IMSIC device before the IMSIC state has been fully initialized
for a vcpu.
The crash manifests as:
Unable to handle kernel paging request at virtual address dfffffff00000001
...
epc : kvm_riscv_aia_imsic_has_attr+0x464/0x50e
arch/riscv/kvm/aia_imsic.c:998
...
kvm_riscv_aia_imsic_has_attr+0x464/0x50e arch/riscv/kvm/aia_imsic.c:998
aia_has_attr+0x128/0x2bc arch/riscv/kvm/aia_device.c:471
kvm_device_ioctl_attr virt/kvm/kvm_main.c:4722 [inline]
kvm_device_ioctl+0x296/0x374 virt/kvm/kvm_main.c:4739
...
The fix adds a check to return -ENODEV if imsic_state is NULL, which
is consistent with other error handling in the function and prevents
the null pointer dereference.
Qiang Ma [Mon, 29 Dec 2025 07:25:30 +0000 (15:25 +0800)]
RISC-V: KVM: Remove unnecessary 'ret' assignment
If execution reaches "ret = 0" assignment in
kvm_riscv_vcpu_pmu_event_info() then it means
kvm_vcpu_write_guest() returned 0 hence ret is
already zero and does not need to be assigned 0.
Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function") Signed-off-by: Qiang Ma <maqianga@uniontech.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20251229072530.3075496-1-maqianga@uniontech.com Signed-off-by: Anup Patel <anup@brainfault.org>
Viktor Kleen [Thu, 5 Feb 2026 08:49:41 +0000 (16:49 +0800)]
iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately
The PASID_FLAG_PAGE_SNOOP and PASID_FLAG_PWSNP constants are identical.
This will cause the pasid code to always set both or neither of the
PGSNP and PWSNP bits in PASID table entries. However, PWSNP is a
reserved bit if SMPWC is not set in the IOMMU's extended capability
register, even if SC is supported.
This has resulted in DMAR errors when testing the iommufd code on an
Arrow Lake platform. With this patch, those errors disappear and the
PASID table entries look correct.
Fixes: 101a2854110fa ("iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entry") Cc: stable@vger.kernel.org Signed-off-by: Viktor Kleen <viktor@kleen.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20260202192109.1665799-1-viktor@kleen.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
When __memcg_slab_post_alloc_hook() fails, there are two different
free paths depending on whether size == 1 or size != 1. In the
kmem_cache_free_bulk() path, we do call alloc_tagging_slab_free_hook().
However, in memcg_alloc_abort_single() we don't, the above warning will be
triggered on the next allocation.
Therefore, add alloc_tagging_slab_free_hook() to the
memcg_alloc_abort_single() path.
Fixes: 9f9796b413d3 ("mm, slab: move memcg charging to post-alloc hook") Cc: stable@vger.kernel.org Suggested-by: Hao Li <hao.li@linux.dev> Signed-off-by: Hao Ge <hao.ge@linux.dev> Reviewed-by: Hao Li <hao.li@linux.dev> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260204101401.202762-1-hao.ge@linux.dev Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Linus Torvalds [Fri, 6 Feb 2026 05:33:22 +0000 (21:33 -0800)]
Merge tag 'hwmon-for-v6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- occ: Mark occ_init_attribute() as __printf to avoid build failure due
to '-Werror=suggest-attribute=format'
- gpio-fan: Allow to stop fans when CONFIG_PM is disabled, and fix
set_rpm() return value
- acpi_power_meter: Fix deadlocks related to acpi_power_meter_notify()
- dell-smm: Add Dell G15 5510 to fan control whitelist
* tag 'hwmon-for-v6.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (occ) Mark occ_init_attribute() as __printf
hwmon: (gpio-fan) Allow to stop FANs when CONFIG_PM is disabled
hwmon: (gpio-fan) Fix set_rpm() return value
hwmon: (acpi_power_meter) Fix deadlocks related to acpi_power_meter_notify()
hwmon: (dell-smm) Add Dell G15 5510 to fan control whitelist
Linus Torvalds [Fri, 6 Feb 2026 03:56:47 +0000 (19:56 -0800)]
Merge tag 'drm-fixes-2026-02-06' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"The usual xe/amdgpu selection, and a couple of misc changes for
gma500, mgag200 and bridge. There is a nouveau revert, and also a set
of changes that fix a regression since we moved to 570 firmware.
Suspend/resume was broken on a bunch of GPUs. The fix looks big, but
it's mostly just refactoring to pass an extra bit down the nouveau
abstractions to the firmware command.
amdgpu:
- MES 11 old firmware compatibility fix
- ASPM fix
- DC LUT fixes
amdkfd:
- Fix possible double deletion of validate list
xe:
- Fix topology query pointer advance
- A couple of kerneldoc fixes
- Disable D3Cold for BMG only on specific platforms
- Fix CFI violation in debugfs access
nouveau:
- Revert adding atomic commit functions as it regresses pre-nv50
- Fix suspend/resume bugs exposed by enabling 570 firmware
gma500:
- Revert a regression caused by vblank changes
mgag200:
- Replace a busy loop with a polling loop to fix that blocking 1 cpu
for 300 ms roughly every 20 minutes
bridge:
- imx8mp-hdmi-pa: Use runtime pm to fix a bug in channel ordering"
* tag 'drm-fixes-2026-02-06' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/guc: Fix CFI violation in debugfs access.
drm/bridge: imx8mp-hdmi-pai: enable PM runtime
drm/xe/pm: Disable D3Cold for BMG only on specific platforms
drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep
drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early
drm/xe: Fix kerneldoc for xe_migrate_exec_queue
drm/xe/query: Fix topology query pointer advance
drm/mgag200: fix mgag200_bmc_stop_scanout()
nouveau/gsp: fix suspend/resume regression on r570 firmware
nouveau: add a third state to the fini handler.
nouveau/gsp: use rpc sequence numbers properly.
drm/amdgpu: Fix double deletion of validate_list
drm/amd/display: remove assert around dpp_base replacement
drm/amd/display: extend delta clamping logic to CM3 LUT helper
drm/amd/display: fix wrong color value mapping on MCM shaper LUT
Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem"
drm/amd: Set minimum version for set_hw_resource_1 on gfx11 to 0x52
Revert "drm/gma500: use drm_crtc_vblank_crtc()"
Revert "drm/nouveau/disp: Set drm_mode_config_funcs.atomic_(check|commit)"
Dave Airlie [Fri, 6 Feb 2026 02:41:35 +0000 (12:41 +1000)]
Merge tag 'drm-xe-fixes-2026-02-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix topology query pointer advance (Shuicheng)
- A couple of kerneldoc fixes (Shuicheng)
- Disable D3Cold for BMG only on specific platforms (Karthik)
- Fix CFI violation in debugfs access (Daniele)
Bibo Mao [Fri, 6 Feb 2026 01:28:01 +0000 (09:28 +0800)]
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
Function vcpu_is_preempted() is used to check whether vCPU is preempted
or not. Here add the implementation with vcpu_is_preempted() when option
CONFIG_PARAVIRT is enabled.
Bibo Mao [Fri, 6 Feb 2026 01:28:01 +0000 (09:28 +0800)]
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
Feature KVM_FEATURE_PREEMPT is added to show whether vCPU is preempted
or not. It is to help guest OS scheduling or lock checking etc. Here
add KVM_FEATURE_PREEMPT feature and use one byte as preempted flag in
the steal time structure.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:28:00 +0000 (09:28 +0800)]
LoongArch: KVM: Set default return value in KVM IO bus ops
When in-kernel irqchip is enabled, its register area is registered in
the KVM IO bus list with API kvm_io_bus_register_dev(). In MMIO/IOCSR
register access emulation, kvm_io_bus_read()/kvm_io_bus_write() is
called firstly. If it returns 0, it means that the in-kernel irqchip
handles the emulation already, else it returns to user-mode VMM and
lets VMM emulate the register access.
Once in-kernel irqchip is enabled, it should return 0 if the address
is within range of the registered KVM IO bus. It should not return to
user-mode VMM since VMM does not know how to handle it, and irqchip is
handled in kernel already.
Here set default return value with 0 in KVM IO bus operations.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:28:00 +0000 (09:28 +0800)]
LoongArch: KVM: Add FPU/LBT delay load support
FPU/LBT are lazy enabled with KVM hypervisor. After FPU/LBT enabled and
loaded, vCPU can be preempted and FPU/LBT will be lost again, there will
be unnecessary FPU/LBT exceptions, load and store stuff. Here delay the
FPU/LBT load until the guest entry.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:47 +0000 (09:27 +0800)]
LoongArch: KVM: Move LASX capability check in exception handler
Like FPU exception handler, check LASX capability in the LASX exception
handler rather than function kvm_own_lasx(). Since LASX capability in
the function kvm_guest_has_lasx() implies FPU and LSX capability, only
checking kvm_guest_has_lasx() is OK here.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:47 +0000 (09:27 +0800)]
LoongArch: KVM: Move LSX capability check in exception handler
Like FPU exception handler, check LSX capability in the LSX exception
handler rather than function kvm_own_lsx(). Since LSX capability in
the function kvm_guest_has_lsx() implies FPU capability, only checking
kvm_guest_has_lsx() is OK here.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:46 +0000 (09:27 +0800)]
LoongArch: KVM: Handle LOONGARCH_CSR_IPR during vCPU context switch
Register LOONGARCH_CSR_IPR is interrupt priority setting for nested
interrupt handling. Though LoongArch Linux AVEC driver does not use
this register, KVM hypervisor needs to save and restore this it during
vCPU context switch. Because Linux AVEC driver may use this register
in future, or other OS may use it.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:46 +0000 (09:27 +0800)]
LoongArch: KVM: Check VM msgint feature during interrupt handling
During message interrupt handling and relative CSR registers saving and
restore, it is better to check VM msgint feature rather than host msgint
feature, because VM may disable this feature even if host supports this.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:46 +0000 (09:27 +0800)]
LoongArch: KVM: Move feature detection in kvm_vm_init_features()
VM feature detection is sparsed in function kvm_vm_init_features() and
kvm_vm_feature_has_attr(). Here move all the features detection in
function kvm_vm_init_features(), and there is only feature checking in
function kvm_vm_feature_has_attr().
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Fri, 6 Feb 2026 01:27:46 +0000 (09:27 +0800)]
LoongArch: KVM: Add more CPUCFG mask bits
With new CPU cores there are more features supported which are indicated
in CPUCFG2 bits 24:30 and CPUCFG3 bits 17:23. The KVM hypervisor cannot
enable or disable (most of) these features and there is no KVM exception
when instructions of these features are executed in guest mode.
Here add more CPUCFG mask support with LA664 CPU type.
Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Thu, 5 Feb 2026 23:00:53 +0000 (15:00 -0800)]
Merge tag 'block-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Revert of a change for loop, which caused regressions for some users
(Actually revert of two commits, where one is just an existing fix
for the offending commit)
- NVMe pull via Keith:
- Fix NULL pointer access setting up dma mappings
- Fix invalid memory access from malformed TCP PDU
* tag 'block-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
loop: revert exclusive opener loop status change
nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec
nvme-pci: handle changing device dma map requirements
Linus Torvalds [Thu, 5 Feb 2026 22:40:06 +0000 (14:40 -0800)]
Merge tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Two small fixes for zcrx
- Two small fixes for fdinfo - one is just killing a superflous newline
* tag 'io_uring-6.19-20260205' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs
io_uring/fdinfo: kill unnecessary newline feed in CQE32 printing
io_uring/zcrx: fix rq flush locking
io_uring/zcrx: fix page array leak
When !CONFIG_TRANSPARENT_HUGEPAGE, a non-folio compound page can appear in
a userspace mapping via either vm_insert_*() functions or
vm_operatios_struct->fault(). They are not folios, thus should not be
considered for folio operations like split. To reject these pages, make
sure get_hwpoison_page() is always called as HWPoisonHandlable() will do
the right work.
[Some commit log borrowed from Zi Yan. Thanks.]
Link: https://lkml.kernel.org/r/20260205075328.523211-1-linmiaohe@huawei.com Fixes: 689b8986776c ("mm/memory-failure: improve large block size folio handling") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reported-by: 是参差 <shicenci@gmail.com> Closes: https://lore.kernel.org/all/PS1PPF7E1D7501F1E4F4441E7ECD056DEADAB98A@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/ Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrii Nakryiko [Thu, 29 Jan 2026 21:53:40 +0000 (13:53 -0800)]
procfs: avoid fetching build ID while holding VMA lock
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock
or per-VMA lock, whichever was used to lock VMA under question, to avoid
deadlock reported by syzbot:
This seems to be exacerbated (as we haven't seen these syzbot reports
before that) by the recent:
777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")
To make this safe, we need to grab file refcount while VMA is still locked, but
other than that everything is pretty straightforward. Internal build_id_parse()
API assumes VMA is passed, but it only needs the underlying file reference, so
just add another variant build_id_parse_file() that expects file passed
directly.
[akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vishwaroop A [Wed, 4 Feb 2026 14:12:12 +0000 (14:12 +0000)]
spi: tegra114: Preserve SPI mode bits in def_command1_reg
The COMMAND1 register bits [29:28] set the SPI mode, which controls
the clock idle level. When a transfer ends, tegra_spi_transfer_end()
writes def_command1_reg back to restore the default state, but this
register value currently lacks the mode bits. This results in the
clock always being configured as idle low, breaking devices that
need it high.
Fix this by storing the mode bits in def_command1_reg during setup,
to prevent this field from always being cleared.
Linus Torvalds [Thu, 5 Feb 2026 19:19:26 +0000 (11:19 -0800)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache fixes from Al Viro:
"A couple of regression fixes for the tree-in-dcache series this cycle"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
functionfs: use spinlock for FFS_DEACTIVATED/FFS_CLOSING transitions
rust_binderfs: fix a dentry leak
Al Viro [Sat, 31 Jan 2026 23:24:41 +0000 (18:24 -0500)]
functionfs: use spinlock for FFS_DEACTIVATED/FFS_CLOSING transitions
When all files are closed, functionfs needs ffs_data_reset() to be
done before any further opens are allowed.
During that time we have ffs->state set to FFS_CLOSING; that makes
->open() fail with -EBUSY. Once ffs_data_reset() is done, it
switches state (to FFS_READ_DESCRIPTORS) indicating that opening
that thing is allowed again. There's a couple of additional twists:
* mounting with -o no_disconnect delays ffs_data_reset()
from doing that at the final ->release() to the first subsequent
open(). That's indicated by ffs->state set to FFS_DEACTIVATED;
if open() sees that, it immediately switches to FFS_CLOSING and
proceeds with doing ffs_data_reset() before returning to userland.
* a couple of usb callbacks need to force the delayed
transition; unfortunately, they are done in locking environment
that does not allow blocking and ffs_data_reset() can block.
As the result, if these callbacks see FFS_DEACTIVATED, they change
state to FFS_CLOSING and use schedule_work() to get ffs_data_reset()
executed asynchronously.
Unfortunately, the locking is rather insufficient. A fix attempted
in e5bf5ee26663 ("functionfs: fix the open/removal races") had closed
a bunch of UAF, but it didn't do anything to the callbacks, lacked
barriers in transition from FFS_CLOSING to FFS_READ_DESCRIPTORS
_and_ it had been too heavy-handed in open()/open() serialization -
I've used ffs->mutex for that, and it's being held over actual IO on
ep0, complete with copy_from_user(), etc.
Even more unfortunately, the userland side is apparently racy enough
to have the resulting timing changes (no failures, just a delayed
return of open(2)) disrupt the things quite badly. Userland bugs
or not, it's a clear regression that needs to be dealt with.
Solution is to use a spinlock for serializing these state checks and
transitions - unlike ffs->mutex it can be taken in these callbacks
and it doesn't disrupt the timings in open().
We could introduce a new spinlock, but it's easier to use the one
that is already there (ffs->eps_lock) instead - the locking
environment is safe for it in all affected places.
Since now it is held over all places that alter or check the
open count (ffs->opened), there's no need to keep that atomic_t -
int would serve just fine and it's simpler that way.
Fixes: e5bf5ee26663 ("functionfs: fix the open/removal races") Fixes: 18d6b32fca38 ("usb: gadget: f_fs: add "no_disconnect" mode") # v4.0 Tested-by: Samuel Wu <wusamuel@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 26 Jan 2026 06:05:57 +0000 (01:05 -0500)]
rust_binderfs: fix a dentry leak
Parallel to binderfs patches - 02da8d2c0965 "binderfs_binder_ctl_create():
kill a bogus check" and the bit of b89aa544821d "convert binderfs" that
got lost when making 4433d8e25d73 "convert rust_binderfs"; the former is
a cleanup, the latter is about marking /binder-control persistent, so that
it would be taken out on umount.
Fixes: 4433d8e25d73 ("convert rust_binderfs") Acked-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Shigeru Yoshida [Wed, 4 Feb 2026 09:58:37 +0000 (18:58 +0900)]
ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF
syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6
route. [0]
Commit f72514b3c569 ("ipv6: clear RA flags when adding a static
route") introduced logic to clear RTF_ADDRCONF from existing routes
when a static route with the same nexthop is added. However, this
causes a problem when the existing route has a gateway.
When RTF_ADDRCONF is cleared from a route that has a gateway, that
route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns
true. The issue is that this route was never added to the
fib6_siblings list.
This leads to a mismatch between the following counts:
- The sibling count computed by iterating fib6_next chain, which
includes the newly ECMP-eligible route
- The actual siblings in fib6_siblings list, which does not include
that route
When a subsequent ECMP route is added, fib6_add_rt2node() hits
BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the
counts don't match.
Fix this by only clearing RTF_ADDRCONF when the existing route does
not have a gateway. Routes without a gateway cannot qualify for ECMP
anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing
RTF_ADDRCONF on them is safe and matches the original intent of the
commit.
Jakub Kicinski [Thu, 5 Feb 2026 16:38:02 +0000 (08:38 -0800)]
Merge tag 'nf-26-02-05' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter: update for net
This is one last-minute crash fix for nf_tables, from Andrew Fasano:
Logical check is inverted, this makes kernel fail to correctly undo
the transaction, leading to a use-after-free.
* tag 'nf-26-02-05' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()
====================
Jens Axboe [Thu, 5 Feb 2026 16:26:53 +0000 (09:26 -0700)]
loop: revert exclusive opener loop status change
This commit effectively reverts the following two commits:
2704024d83fa ("loop: add missing bd_abort_claiming in loop_set_status") 08e136ebd193 ("loop: don't change loop device under exclusive opener in loop_set_status")
as there are reports of them causing issues with unmounting. As we're
close to the 6.19 kernel release and the original author hasn't taken a
closer look at this yet, revert them for release.
This is caused an extra file->klp sanity check which was added by commit 164c9201e1da ("objtool: Add base objtool support for livepatch
modules"). That check was intended to ensure that livepatch modules
built with klp-build always have full access to their static call keys.
However, it failed to account for the fact that manually built livepatch
modules (i.e., not built with klp-build) might need access to unexported
static call keys, for which read-only access is typically allowed for
modules.
While the livepatch-shadow-fix1 module doesn't explicitly use any static
calls, it does have a memory allocation, which can cause
CONFIG_MEM_ALLOC_PROFILING_DEBUG to insert a WARN() call. And WARN() is
now an unexported static call as of commit 860238af7a33 ("x86_64/bug:
Inline the UD1").
Fix it by removing the overzealous file->klp check, restoring the
original behavior for manually built livepatch modules.
Josh Poimboeuf [Mon, 2 Feb 2026 18:01:08 +0000 (10:01 -0800)]
objtool/klp: Fix symbol correlation for orphaned local symbols
When compiling with CONFIG_LTO_CLANG_THIN, vmlinux.o has
__irf_[start|end] before the first FILE entry:
$ readelf -sW vmlinux.o
Symbol table '.symtab' contains 597706 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE LOCAL DEFAULT 18 __irf_start
2: 0000000000000200 0 NOTYPE LOCAL DEFAULT 18 __irf_end
3: 0000000000000000 0 SECTION LOCAL DEFAULT 17 .text
4: 0000000000000000 0 SECTION LOCAL DEFAULT 18 .init.ramfs
This causes klp-build warnings like:
vmlinux.o: warning: objtool: no correlation: __irf_start
vmlinux.o: warning: objtool: no correlation: __irf_end
The problem is that Clang LTO is stripping the initramfs_data.o FILE
symbol, causing those two symbols to be orphaned and not noticed by
klp-diff's correlation logic. Add a loop to correlate any symbols found
before the first FILE symbol.
Petr Pavlu [Fri, 23 Jan 2026 10:26:57 +0000 (11:26 +0100)]
livepatch: Free klp_{object,func}_ext data after initialization
The klp_object_ext and klp_func_ext data, which are stored in the
__klp_objects and __klp_funcs sections, respectively, are not needed
after they are used to create the actual klp_object and klp_func
instances. This operation is implemented by the init function in
scripts/livepatch/init.c.
Prefix the two sections with ".init" so they are freed after the module
is initializated.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-3-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Petr Pavlu [Fri, 23 Jan 2026 10:26:56 +0000 (11:26 +0100)]
livepatch: Fix having __klp_objects relics in non-livepatch modules
The linker script scripts/module.lds.S specifies that all input
__klp_objects sections should be consolidated into an output section of
the same name, and start/stop symbols should be created to enable
scripts/livepatch/init.c to locate this data.
This start/stop pattern is not ideal for modules because the symbols are
created even if no __klp_objects input sections are present.
Consequently, a dummy __klp_objects section also appears in the
resulting module. This unnecessarily pollutes non-livepatch modules.
Instead, since modules are relocatable files, the usual method for
locating consolidated data in a module is to read its section table.
This approach avoids the aforementioned problem.
The klp_modinfo already stores a copy of the entire section table with
the final addresses. Introduce a helper function that
scripts/livepatch/init.c can call to obtain the location of the
__klp_objects section from this data.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
YunJe Shin [Wed, 28 Jan 2026 00:41:07 +0000 (09:41 +0900)]
nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec
nvmet_tcp_build_pdu_iovec() could walk past cmd->req.sg when a PDU
length or offset exceeds sg_cnt and then use bogus sg->length/offset
values, leading to _copy_to_iter() GPF/KASAN. Guard sg_idx, remaining
entries, and sg->length/offset before building the bvec.
The initial state of dma_needs_unmap may be false, but change to true
while mapping the data iterator. Enabling swiotlb is one such case that
can change the result. The nvme driver needs to save the mapped dma
vectors to be unmapped later, so allocate as needed during iteration
rather than assume it was always allocated at the beginning. This fixes
a NULL dereference from accessing an uninitialized dma_vecs when the
device dma unmapping requirements change mid-iteration.
Steven Rostedt [Wed, 4 Feb 2026 16:36:28 +0000 (11:36 -0500)]
tracing: Fix ftrace event field alignments
The fields of ftrace specific events (events used to save ftrace internal
events like function traces and trace_printk) are generated similarly to
how normal trace event fields are generated. That is, the fields are added
to a trace_events_fields array that saves the name, offset, size,
alignment and signness of the field. It is used to produce the output in
the format file in tracefs so that tooling knows how to parse the binary
data of the trace events.
The issue is that some of the ftrace event structures are packed. The
function graph exit event structures are one of them. The 64 bit calltime
and rettime fields end up 4 byte aligned, but the algorithm to show to
userspace shows them as 8 byte aligned.
The macros that create the ftrace events has one for embedded structure
fields. There's two macros for theses fields:
__field_desc() and __field_packed()
The difference of the latter macro is that it treats the field as packed.
Rename that field to __field_desc_packed() and create replace the
__field_packed() to be a normal field that is packed and have the calltime
and rettime use those.
This showed up on 32bit architectures for function graph time fields. It
had:
~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format
[..]
field:unsigned long func; offset:8; size:4; signed:0;
field:unsigned int depth; offset:12; size:4; signed:0;
field:unsigned int overrun; offset:16; size:4; signed:0;
field:unsigned long long calltime; offset:24; size:8; signed:0;
field:unsigned long long rettime; offset:32; size:8; signed:0;
Notice that overrun is at offset 16 with size 4, where in the structure
calltime is at offset 20 (16 + 4), but it shows the offset at 24. That's
because it used the alignment of unsigned long long when used as a
declaration and not as a member of a structure where it would be aligned
by word size (in this case 4).
By using the proper structure alignment, the format has it at the correct
offset:
~# cat /sys/kernel/tracing/events/ftrace/funcgraph_exit/format
[..]
field:unsigned long func; offset:8; size:4; signed:0;
field:unsigned int depth; offset:12; size:4; signed:0;
field:unsigned int overrun; offset:16; size:4; signed:0;
field:unsigned long long calltime; offset:20; size:8; signed:0;
field:unsigned long long rettime; offset:28; size:8; signed:0;
Xu Yang [Wed, 4 Feb 2026 11:11:41 +0000 (19:11 +0800)]
pmdomain: imx8mp-blk-ctrl: Keep gpc power domain on for system wakeup
Current design will power off all dependent GPC power domains in
imx8mp_blk_ctrl_suspend(), even though the user device has enabled
wakeup capability. The result is that wakeup function never works
for such device.
An example will be USB wakeup on i.MX8MP. PHY device '382f0040.usb-phy'
is attached to power domain 'hsioblk-usb-phy2' which is spawned by hsio
block control. A virtual power domain device 'genpd:3:32f10000.blk-ctrl'
is created to build connection with 'hsioblk-usb-phy2' and it depends on
GPC power domain 'usb-otg2'. If device '382f0040.usb-phy' enable wakeup,
only power domain 'hsioblk-usb-phy2' keeps on during system suspend,
power domain 'usb-otg2' is off all the time. So the wakeup event can't
happen.
In order to further establish a connection between the power domains
related to GPC and block control during system suspend, register a genpd
power on/off notifier for the power_dev. This allows us to prevent the GPC
power domain from being powered off, in case the block control power
domain is kept on to serve system wakeup.
Marc Zyngier [Thu, 5 Feb 2026 09:17:58 +0000 (09:17 +0000)]
Merge branch kvm-arm64/misc-6.20 into kvmarm-master/next
* kvm-arm64/misc-6.20:
: .
: Misc KVM/arm64 changes for 6.20
:
: - Trivial FPSIMD cleanups
:
: - Calculate hyp VA size only once, avoiding potential mapping issues when
: VA bits is smaller than expected
:
: - Silence sparse warning for the HYP stack base
:
: - Fix error checking when handling FFA_VERSION
:
: - Add missing trap configuration for DBGWCR15_EL1
:
: - Don't try to deal with nested S2 when NV isn't enabled for a guest
:
: - Various spelling fixes
: .
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
KVM: arm64: Fix various comments
KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
KVM: arm64: Fix error checking for FFA_VERSION
KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
KVM: arm64: Calculate hyp VA size only once
KVM: arm64: Remove ISB after writing FPEXC32_EL2
KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()
Marc Zyngier [Thu, 5 Feb 2026 09:17:48 +0000 (09:17 +0000)]
Merge branch kvm-arm64/resx into kvmarm-master/next
* kvm-arm64/resx:
: .
: Add infrastructure to deal with the full gamut of RESx bits
: for NV. As a result, it is now possible to have the expected
: semantics for some bits such as SCTLR_EL2.SPAN.
: .
KVM: arm64: Add debugfs file dumping computed RESx values
KVM: arm64: Add sanitisation to SCTLR_EL2
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
KVM: arm64: Remove all traces of FEAT_TME
KVM: arm64: Simplify handling of full register invalid constraint
KVM: arm64: Get rid of FIXED_VALUE altogether
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
KVM: arm64: Move RESx into individual register descriptors
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
KVM: arm64: Simplify FIXED_VALUE handling
KVM: arm64: Convert HCR_EL2.RW to AS_RES1
KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features
KVM: arm64: Allow RES1 bits to be inferred from configuration
KVM: arm64: Inherit RESx bits from FGT register descriptors
KVM: arm64: Extend unified RESx handling to runtime sanitisation
KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits
KVM: arm64: Introduce standalone FGU computing primitive
KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}
arm64: Convert SCTLR_EL2 to sysreg infrastructure
Marc Zyngier [Thu, 5 Feb 2026 09:17:44 +0000 (09:17 +0000)]
Merge branch kvm-arm64/debugfs-fixes into kvmarm-master/next
* kvm-arm64/debugfs-fixes:
: .
: Cleanup of the debugfs iterator, which are way more complicated
: than they ought to be, courtesy of Fuad Tabba. From the cover letter:
:
: "This series refactors the debugfs implementations for `idregs` and
: `vgic-state` to use standard `seq_file` iterator patterns.
:
: The existing implementations relied on storing iterator state within
: global VM structures (`kvm_arch` and `vgic_dist`). This approach
: prevented concurrent reads of the debugfs files (returning -EBUSY) and
: created improper dependencies between transient file operations and
: long-lived VM state."
: .
KVM: arm64: Use standard seq_file iterator for vgic-debug debugfs
KVM: arm64: Reimplement vgic-debug XArray iteration
KVM: arm64: Use standard seq_file iterator for idregs debugfs
Marc Zyngier [Thu, 5 Feb 2026 09:17:30 +0000 (09:17 +0000)]
Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next
* kvm-arm64/gicv5-prologue:
: .
: Prologue to GICv5 support, courtesy of Sascha Bischoff.
:
: This is preliminary work that sets the scene for the full-blow
: support.
: .
irqchip/gic-v5: Check if impl is virt capable
KVM: arm64: gic: Set vgic_model before initing private IRQs
arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2
Marc Zyngier [Thu, 5 Feb 2026 09:17:04 +0000 (09:17 +0000)]
Merge branch kvm-arm64/gicv3-tdir-fixes into kvmarm-master/next
* kvm-arm64/gicv3-tdir-fixes:
: .
: Address two trapping-related issues when running legacy (i.e. GICv3)
: guests on GICv5 hosts, courtesy of Sascha Bischoff.
: .
KVM: arm64: Correct test for ICH_HCR_EL2_TDIR cap for GICv5 hosts
KVM: arm64: gic: Enable GICv3 CPUIF trapping on GICv5 hosts if required
Marc Zyngier [Thu, 5 Feb 2026 09:16:42 +0000 (09:16 +0000)]
Merge branch kvm-arm64/fwb-for-all into kvmarm-master/next
* kvm-arm64/fwb-for-all:
: .
: Allow pKVM's host stage-2 mappings to use the Force Write Back version
: of the memory attributes by using the "pass-through' encoding.
:
: This avoids having two separate encodings for S2 on a given platform.
: .
KVM: arm64: Simplify PAGE_S2_MEMATTR
KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
arm64: Add MT_S2{,_FWB}_AS_S1 encodings
Marc Zyngier [Thu, 5 Feb 2026 09:16:31 +0000 (09:16 +0000)]
Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next
* kvm-arm64/pkvm-no-mte:
: .
: pKVM updates preventing the host from using MTE-related system
: sysrem registers when the feature is disabled from the kernel
: command-line (arm64.nomte), courtesy of Fuad Taba.
:
: From the cover letter:
:
: "If MTE is supported by the hardware (and is enabled at EL3), it remains
: available to lower exception levels by default. Disabling it in the host
: kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
: the feature; it does not physically disable MTE in the hardware.
:
: The ability to disable MTE in the host kernel is used by some systems,
: such as Android, so that the physical memory otherwise used as tag
: storage can be used for other things (i.e. treated just like the rest of
: memory). In this scenario, a malicious host could still access tags in
: pages donated to a guest using MTE instructions (e.g., STG and LDG),
: bypassing the kernel's configuration."
: .
KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
KVM: arm64: Trap MTE access and discovery when MTE is disabled
KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM
Computing RESx values is hard. Verifying that they are correct is
harder. Add a debugfs file called "resx" that will dump all the RESx
values for a given VM.
Marc Zyngier [Mon, 2 Feb 2026 18:43:27 +0000 (18:43 +0000)]
KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE
MIOCNCE had the potential to eat your data, and also was never
implemented by anyone. It's been retrospectively removed from
the architecture, and we're happy to follow that lead.
Marc Zyngier [Mon, 2 Feb 2026 18:43:25 +0000 (18:43 +0000)]
KVM: arm64: Simplify handling of full register invalid constraint
Now that we embed the RESx bits in the register description, it becomes
easier to deal with registers that are simply not valid, as their
existence is not satisfied by the configuration (SCTLR2_ELx without
FEAT_SCTLR2, for example). Such registers essentially become RES0 for
any bit that wasn't already advertised as RESx.
Marc Zyngier [Mon, 2 Feb 2026 18:43:23 +0000 (18:43 +0000)]
KVM: arm64: Simplify handling of HCR_EL2.E2H RESx
Now that we can link the RESx behaviour with the value of HCR_EL2.E2H,
we can trivially express the tautological constraint that makes E2H
a reserved value at all times.
Marc Zyngier [Mon, 2 Feb 2026 18:43:21 +0000 (18:43 +0000)]
KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags
"Thanks" to VHE, SCTLR_EL2 radically changes shape depending on the
value of HCR_EL2.E2H, as a lot of the bits that didn't have much
meaning with E2H=0 start impacting EL0 with E2H=1.
This has a direct impact on the RESx behaviour of these bits, and
we need a way to express them.
For this purpose, introduce two new constaints that, when the
controlling feature is not present, force the field to RES1 depending
on the value of E2H. Note that RES0 is still implicit,
This allows diverging RESx values depending on the value of E2H,
something that is required by a bunch of SCTLR_EL2 bits.
Marc Zyngier [Mon, 2 Feb 2026 18:43:20 +0000 (18:43 +0000)]
KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags
A bunch of EL2 configuration are very similar to their EL1 counterpart,
with the added constraint that HCR_EL2.E2H being 1.
For us, this means HCR_EL2.E2H being RES1, which is something we can
statically evaluate.
Add a REQUIRES_E2H1 constraint, which allows us to express conditions
in a much simpler way (without extra code). Existing occurrences are
converted, before we add a lot more.
Marc Zyngier [Mon, 2 Feb 2026 18:43:19 +0000 (18:43 +0000)]
KVM: arm64: Simplify FIXED_VALUE handling
The FIXED_VALUE qualifier (mostly used for HCR_EL2) is pointlessly
complicated, as it tries to piggy-back on the previous RES0 handling
while being done in a different phase, on different data.
Instead, make it an integral part of the RESx computation, and allow
it to directly set RESx bits. This is much easier to understand.
It also paves the way for some additional changes to that will allow
the full removal of the FIXED_VALUE handling.