Paolo Bonzini [Wed, 13 Nov 2024 11:24:19 +0000 (06:24 -0500)]
Merge tag 'kvm-x86-generic-6.13' of https://github.com/kvm-x86/linux into HEAD
KVM generic changes for 6.13
- Rework kvm_vcpu_on_spin() to use a single for-loop instead of making two
partial poasses over "all" vCPUs. Opportunistically expand the comment
to better explain the motivation and logic.
- Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
of RCU, so that running a vCPU on a different task doesn't encounter
long stalls due to having to wait for all CPUs become quiescent.
KVM: s390: selftests: Add regression tests for PFCR subfunctions
Check if the PFCR query reported in userspace coincides with the
kernel reported function list. Right now we don't mask the functions
in the kernel so they have to be the same.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Hariharan Mari <hari55@linux.ibm.com> Link: https://lore.kernel.org/r/20241107152319.77816-5-brueckner@linux.ibm.com
[frankja@linux.ibm.com: Added commit description] Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-5-brueckner@linux.ibm.com>
Message-security-assist 11 introduces pckmo subfunctions to encrypt
hmac keys.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241107152319.77816-3-brueckner@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-3-brueckner@linux.ibm.com>
KVM: s390: selftests: Verify reject memory region operations for ucontrol VMs
Add a test case verifying KVM_SET_USER_MEMORY_REGION and
KVM_SET_USER_MEMORY_REGION2 cannot be executed on ucontrol VMs.
Executing this test case on not patched kernels will cause a null
pointer dereference in the host kernel.
This is fixed with commit:
commit 7816e58967d0 ("kvm: s390: Reject memory region operations for ucontrol VMs")
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20241107141024.238916-4-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix] Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-4-schlameuss@linux.ibm.com>
Add a test case manipulating s390 storage keys from within the ucontrol
VM.
Storage key instruction (ISKE, SSKE and RRBE) intercepts and
Keyless-subset facility are disabled on first use, where the skeys are
setup by KVM in non ucontrol VMs.
KVM: s390: selftests: Add uc_map_unmap VM test case
Add a test case verifying basic running and interaction of ucontrol VMs.
Fill the segment and page tables for allocated memory and map memory on
first access.
* uc_map_unmap
Store and load data to mapped and unmapped memory and use pic segment
translation handling to map memory on access.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link:
https://lore.kernel.org/r/20241107141024.238916-2-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix] Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-2-schlameuss@linux.ibm.com>
Björn Töpel [Mon, 4 Nov 2024 19:15:01 +0000 (20:15 +0100)]
riscv: kvm: Fix out-of-bounds array access
In kvm_riscv_vcpu_sbi_init() the entry->ext_idx can contain an
out-of-bound index. This is used as a special marker for the base
extensions, that cannot be disabled. However, when traversing the
extensions, that special marker is not checked prior indexing the
array.
Add an out-of-bounds check to the function.
Fixes: 56d8a385b605 ("RISC-V: KVM: Allow some SBI extensions to be disabled by default") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241104191503.74725-1-bjorn@kernel.org Signed-off-by: Anup Patel <anup@brainfault.org>
Yong-Xuan Wang [Tue, 29 Oct 2024 08:55:39 +0000 (16:55 +0800)]
RISC-V: KVM: Fix APLIC in_clrip and clripnum write emulation
In the section "4.7 Precise effects on interrupt-pending bits"
of the RISC-V AIA specification defines that:
"If the source mode is Level1 or Level0 and the interrupt domain
is configured in MSI delivery mode (domaincfg.DM = 1):
The pending bit is cleared whenever the rectified input value is
low, when the interrupt is forwarded by MSI, or by a relevant
write to an in_clrip register or to clripnum."
Update the aplic_write_pending() to match the spec.
KVM: Protect vCPU's "last run PID" with rwlock, not RCU
To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to a
vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
needed to change the PID associated with the vCPU can stall for hundreds
of milliseconds, which is problematic for latency sensitive post-copy
operations.
In the directed yield path, do not acquire the lock if it's contended,
i.e. if the associated PID is changing, as that means the vCPU's task is
already running.
Reported-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240802200136.329973-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
KVM: Return '0' directly when there's no task to yield to
Do "return 0" instead of initializing and returning a local variable in
kvm_vcpu_yield_to(), e.g. so that it's more obvious what the function
returns if there is no task.
KVM: Rework core loop of kvm_vcpu_on_spin() to use a single for-loop
Rework kvm_vcpu_on_spin() to use a single for-loop instead of making "two"
passes over all vCPUs. Given N=kvm->last_boosted_vcpu, the logic is to
iterate from vCPU[N+1]..vcpu[N-1], i.e. using two loops is just a kludgy
way of handling the wrap from the last vCPU to vCPU0 when a boostable vCPU
isn't found in vcpu[N+1]..vcpu[MAX].
Open code the xa_load() instead of using kvm_get_vcpu() to avoid reading
online_vcpus in every loop, as well as the accompanying smp_rmb(), i.e.
make it a custom kvm_for_each_vcpu(), for all intents and purposes.
Oppurtunistically clean up the comment explaining the logic.
Anup Patel [Sun, 20 Oct 2024 19:47:34 +0000 (01:17 +0530)]
RISC-V: KVM: Use NACL HFENCEs for KVM request based HFENCEs
When running under some other hypervisor, use SBI NACL based HFENCEs
for TLB shoot-down via KVM requests. This makes HFENCEs faster whenever
SBI nested acceleration is available.
Anup Patel [Sun, 20 Oct 2024 19:47:33 +0000 (01:17 +0530)]
RISC-V: KVM: Save trap CSRs in kvm_riscv_vcpu_enter_exit()
Save trap CSRs in the kvm_riscv_vcpu_enter_exit() function instead of
the kvm_arch_vcpu_ioctl_run() function so that HTVAL and HTINST CSRs
are accessed in more optimized manner while running under some other
hypervisor.
Anup Patel [Sun, 20 Oct 2024 19:47:32 +0000 (01:17 +0530)]
RISC-V: KVM: Use SBI sync SRET call when available
Implement an optimized KVM world-switch using SBI sync SRET call
when SBI nested acceleration extension is available. This improves
KVM world-switch when KVM RISC-V is running as a Guest under some
other hypervisor.
Anup Patel [Sun, 20 Oct 2024 19:47:31 +0000 (01:17 +0530)]
RISC-V: KVM: Use nacl_csr_xyz() for accessing AIA CSRs
When running under some other hypervisor, prefer nacl_csr_xyz()
for accessing AIA CSRs in the run-loop. This makes CSR access
faster whenever SBI nested acceleration is available.
Anup Patel [Sun, 20 Oct 2024 19:47:30 +0000 (01:17 +0530)]
RISC-V: KVM: Use nacl_csr_xyz() for accessing H-extension CSRs
When running under some other hypervisor, prefer nacl_csr_xyz()
for accessing H-extension CSRs in the run-loop. This makes CSR
access faster whenever SBI nested acceleration is available.
Anup Patel [Sun, 20 Oct 2024 19:47:29 +0000 (01:17 +0530)]
RISC-V: KVM: Add common nested acceleration support
Add a common nested acceleration support which will be shared by
all parts of KVM RISC-V. This nested acceleration support detects
and enables SBI NACL extension usage based on static keys which
ensures minimum impact on the non-nested scenario.
Anup Patel [Sun, 20 Oct 2024 19:47:26 +0000 (01:17 +0530)]
RISC-V: KVM: Replace aia_set_hvictl() with aia_hvictl_value()
The aia_set_hvictl() internally writes the HVICTL CSR which makes
it difficult to optimize the CSR write using SBI NACL extension for
kvm_riscv_vcpu_aia_update_hvip() function so replace aia_set_hvictl()
with new aia_hvictl_value() which only computes the HVICTL value.
Anup Patel [Sun, 20 Oct 2024 19:47:25 +0000 (01:17 +0530)]
RISC-V: KVM: Break down the __kvm_riscv_switch_to() into macros
Break down the __kvm_riscv_switch_to() function into macros so that
these macros can be later re-used by SBI NACL extension based low-level
switch function.
Anup Patel [Sun, 20 Oct 2024 19:47:24 +0000 (01:17 +0530)]
RISC-V: KVM: Save/restore SCOUNTEREN in C source
The SCOUNTEREN CSR need not be saved/restored in the low-level
__kvm_riscv_switch_to() function hence move the SCOUNTEREN CSR
save/restore to the kvm_riscv_vcpu_swap_in_guest_state() and
kvm_riscv_vcpu_swap_in_host_state() functions in C sources.
Also, re-arrange the CSR save/restore and related GPR usage in
the low-level __kvm_riscv_switch_to() low-level function.
Anup Patel [Sun, 20 Oct 2024 19:47:23 +0000 (01:17 +0530)]
RISC-V: KVM: Save/restore HSTATUS in C source
We will be optimizing HSTATUS CSR access via shared memory setup
using the SBI nested acceleration extension. To facilitate this,
we first move HSTATUS save/restore in kvm_riscv_vcpu_enter_exit().
Quan Zhou [Tue, 15 Oct 2024 02:58:37 +0000 (10:58 +0800)]
riscv: KVM: add basic support for host vs guest profiling
For the information collected on the host side, we need to
identify which data originates from the guest and record
these events separately, this can be achieved by having
KVM register perf callbacks.
Linus Torvalds [Sun, 27 Oct 2024 19:01:36 +0000 (09:01 -1000)]
Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent a certain range of pages which get marked as hypervisor-only,
to get allocated to a CoCo (SNP) guest which cannot use them and thus
fail booting
- Fix the microcode loader on AMD to pay attention to the stepping of a
patch and to handle the case where a BIOS config option splits the
machine into logical NUMA nodes per L3 cache slice
- Disable LAM from being built by default due to security concerns
* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Ensure that RMP table fixups are reserved
x86/microcode/AMD: Split load_microcode_amd()
x86/microcode/AMD: Pay attention to the stepping dynamically
x86/lam: Disable ADDRESS_MASKING in most cases
Linus Torvalds [Sun, 27 Oct 2024 18:56:22 +0000 (08:56 -1000)]
Merge tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Fix missing mutex unlock in error path of register_ftrace_graph()
A previous fix added a return on an error path and forgot to unlock
the mutex. Instead of dealing with error paths, use guard(mutex) as
the mutex is just released at the exit of the function anyway. Other
functions in this file should be updated with this, but that's a
cleanup and not a fix.
- Change cpuhp setup name to be consistent with other cpuhp states
The same fix that the above patch fixes added a cpuhp_setup_state()
call with the name of "fgraph_idle_init". I was informed that it
should instead be something like: "fgraph:online". Update that too.
* tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Change the name of cpuhp state to "fgraph:online"
fgraph: Fix missing unlock in register_ftrace_graph()
Linus Torvalds [Sun, 27 Oct 2024 18:40:33 +0000 (08:40 -1000)]
Merge tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
- Asus thermal profile fix, fixing performance issues on Lunar Lake
- Intel PMC: one revert for a lockdep issue and one bugfix
- Dell WMI: Ignore some WMI events on suspend/resume to silence warnings
* tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Fix thermal profile initialization
platform/x86: dell-wmi: Ignore suspend notifications
platform/x86/intel/pmc: Fix pmc_core_iounmap to call iounmap for valid addresses
platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"
Linus Torvalds [Sun, 27 Oct 2024 18:36:01 +0000 (08:36 -1000)]
Merge tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A single commit to resolve a regression existing in v6.11 or later.
The change in 1394 OHCI driver in v6.11 kernel could cause general
protection faults when rediscovering nodes in IEEE 1394 bus while
holding a spin lock. Consequently, watchdog checks can report a hard
lockup.
Currently, this issue is observed primarily during the system resume
phase when using an extra node with three ports or more is used.
However, it could potentially occur in the other cases as well"
* tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix invalid port index for parent device
Linus Torvalds [Sun, 27 Oct 2024 18:29:36 +0000 (08:29 -1000)]
Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Pull request for MD via Song fixing a few issues
- Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on
passthrough IO (Xinyu)
* tag 'block-6.12-20241026' of git://git.kernel.dk/linux:
block: fix sanity checks in blk_rq_map_user_bvec
md/raid10: fix null ptr dereference in raid10_size()
md: ensure child flush IO does not affect origin bio->bi_status
Linus Torvalds [Sun, 27 Oct 2024 18:23:49 +0000 (08:23 -1000)]
Merge tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- Fix recovery of allocator ops after a growfs
- Do not fail repairs on metadata files with no attr fork
* tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: update the pag for the last AG at recovery time
xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
xfs: error out when a superblock buffer update reduces the agcount
xfs: update the file system geometry after recoverying superblock buffers
xfs: merge the perag freeing helpers
xfs: pass the exact range to initialize to xfs_initialize_perag
xfs: don't fail repairs on metadata files with no attr fork
Takashi Sakamoto [Fri, 25 Oct 2024 03:41:37 +0000 (12:41 +0900)]
firewire: core: fix invalid port index for parent device
In a commit 24b7f8e5cd65 ("firewire: core: use helper functions for self
ID sequence"), the enumeration over self ID sequence was refactored with
some helper functions with KUnit tests. These helper functions are
guaranteed to work expectedly by the KUnit tests, however their application
includes a mistake to assign invalid value to the index of port connected
to parent device.
This bug affects the case that any extra node devices which has three or
more ports are connected to 1394 OHCI controller. In the case, the path
to update the tree cache could hits WARN_ON(), and gets general protection
fault due to the access to invalid address computed by the invalid value.
This commit fixes the bug to assign correct port index.
Cc: stable@vger.kernel.org Reported-by: Edmund Raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/lkml/8a9902a4ece9329af1e1e42f5fea76861f0bf0e8.camel@proton.me/ Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence") Link: https://lore.kernel.org/r/20241025034137.99317-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
When support for vivobook fan profiles was added, the initial
call to throttle_thermal_policy_set_default() was removed, which
however is necessary for full initialization.
Fix this by calling throttle_thermal_policy_set_default() again
when setting up the platform profile.
Fixes: bcbfcebda2cb ("platform/x86: asus-wmi: add support for vivobook fan profiles") Reported-by: Michael Larabel <Michael@phoronix.com> Closes: https://www.phoronix.com/review/lunar-lake-xe2/5 Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241025191514.15032-2-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Linus Torvalds [Fri, 25 Oct 2024 22:25:02 +0000 (15:25 -0700)]
Merge tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux
Pull more 9p reverts from Dominique Martinet:
"Revert patches causing inode collision problems.
The code simplification introduced significant regressions on servers
that do not remap inode numbers when exporting multiple underlying
filesystems with colliding inodes. See the top-most revert (commit be2ca3825372) for details.
This problem had been ignored for too long and the reverts will also
head to stable (6.9+).
I'm confident this set of patches gets us back to previous behaviour
(another related patch had already been reverted back in April and
we're almost back to square 1, and the rest didn't touch inode
lifecycle)"
* tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux:
Revert "fs/9p: simplify iget to remove unnecessary paths"
Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"
Revert "fs/9p: remove redundant pointer v9ses"
Revert " fs/9p: mitigate inode collisions"
Linus Torvalds [Fri, 25 Oct 2024 18:45:22 +0000 (11:45 -0700)]
Merge tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix init module error caseb
- Fix memory allocation error path (for passwords) in mount
* tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix warning when destroy 'cifs_io_request_pool'
smb: client: Handle kstrdup failures for passwords
Linus Torvalds [Fri, 25 Oct 2024 18:41:18 +0000 (11:41 -0700)]
Merge tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix cached size after passthrough writes
This fix needed a trivial change in the backing-file API, which
resulted in some non-fuse files being touched.
- Revert a commit meant as a cleanup but which triggered a WARNING
- Remove a stray debug line left-over
* tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: remove stray debug line
Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback"
fuse: update inode size after extending passthrough write
fs: pass offset and result to backing_file end_write() callback
Linus Torvalds [Fri, 25 Oct 2024 18:38:15 +0000 (11:38 -0700)]
Merge tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a couple of use-after-free bugs
* tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: cancel nfsd_shrinker_work using sync mode in nfs4_state_shutdown_net
nfsd: fix race between laundromat and free_stateid
Linus Torvalds [Fri, 25 Oct 2024 18:04:34 +0000 (11:04 -0700)]
Merge tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI PRM (Platform Runtime Mechanism) issue and add two
new DMI quirks, one for an ACPI IRQ override and one for lid switch
detection:
- Make acpi_parse_prmt() look for EFI_MEMORY_RUNTIME memory regions
only to comply with the UEFI specification and make PRM use
efi_guid_t instead of guid_t to avoid a compiler warning triggered
by that change (Koba Ko, Dan Carpenter)
- Add an ACPI IRQ override quirk for LG 16T90SP (Christian Heusel)
- Add a lid switch detection quirk for Samsung Galaxy Book2 (Shubham
Panwar)"
* tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PRM: Clean up guid type in struct prm_handler_info
ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue
ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[]
ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context
Linus Torvalds [Fri, 25 Oct 2024 18:00:50 +0000 (11:00 -0700)]
Merge tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Update cpufreq documentation to match the code after recent changes
(Christian Loehle), fix a units conversion issue in the CPPC cpufreq
driver (liwei), and fix an error check in the dtpm_devfreq power
capping driver (Yuan Can)"
* tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: CPPC: fix perf_to_khz/khz_to_perf conversion exception
powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request()
cpufreq: docs: Reflect latency changes in docs
Linus Torvalds [Fri, 25 Oct 2024 17:56:06 +0000 (10:56 -0700)]
Merge tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Hold the rescan lock while adding devices to avoid race with
concurrent pwrctl rescan that can lead to a crash (Bartosz
Golaszewski)
- Avoid binding pwrctl driver to QCom WCN wifi if the DT lacks the
necessary PMU regulator descriptions (Bartosz Golaszewski)
* tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/pwrctl: Abandon QCom WCN probe on pre-pwrseq device-trees
PCI: Hold rescan lock while adding devices during host probe
Linus Torvalds [Fri, 25 Oct 2024 17:42:29 +0000 (10:42 -0700)]
Merge tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Fix the handling of ATA commands that timeout (command that did not
receive a completion interrupt within the configured timeout time).
Commands that timeout, while also having either the FAILFAST flag
set, or the command being a passthrough command, should never be
retried. Restore this behavior (as it was before v6.12-rc1).
* tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata: Set DID_TIME_OUT for commands that actually timed out
Paolo Bonzini [Fri, 25 Oct 2024 17:38:16 +0000 (13:38 -0400)]
Merge branch 'kvm-no-struct-page' into HEAD
TL;DR: Eliminate KVM's long-standing (and heinous) behavior of essentially
guessing which pfns are refcounted pages (see kvm_pfn_to_refcounted_page()).
Getting there requires "fixing" arch code that isn't obviously broken.
Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to
stop marking pages/folios dirty/accessed based solely on the pfn that's
stored in KVM's stage-2 page tables.
Instead of tracking which SPTEs correspond to refcounted pages, simply
remove all of the code that operates on "struct page" based ona the pfn
in stage-2 PTEs. This is the back ~40-50% of the series.
For x86 in particular, which sets accessed/dirty status when that info
would be "lost", e.g. when SPTEs are zapped or KVM clears the dirty flag
in a SPTE, foregoing the updates provides very measurable performance
improvements for related operations. E.g. when clearing dirty bits as
part of dirty logging, and zapping SPTEs to reconstitue huge pages when
disabling dirty logging.
The front ~40% of the series is cleanups and prep work, and most of it is
x86 focused (purely because x86 added the most special cases, *sigh*).
E.g. several of the inputs to hva_to_pfn() (and it's myriad wrappers),
can be removed by cleaning up and deduplicating x86 code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 25 Oct 2024 17:35:29 +0000 (10:35 -0700)]
Merge tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The majority of changes here are about ASoC.
There are two core changes in ASoC (the bump of minimal topology ABI
version and the fix for references of components in DAPM code), and
others are mostly various device-specific fixes for SoundWire, AMD,
Intel, SOF, Qualcomm and FSL, in addition to a few usual HD-audio
quirks and fixes"
* tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
ALSA: hda/realtek: Update default depop procedure
ASoC: qcom: sc7280: Fix missing Soundwire runtime stream alloc
ASoC: fsl_micfil: Add sample rate constraint
ASoC: rt722-sdca: increase clk_stop_timeout to fix clock stop issue
ALSA: hda/tas2781: select CRC32 instead of CRC32_SARWATE
ALSA: hda/realtek: Add subwoofer quirk for Acer Predator G9-593
ALSA: firewire-lib: Avoid division by zero in apply_constraint_to_size()
ASoC: fsl_micfil: Add a flag to distinguish with different volume control types
ASoC: codecs: lpass-rx-macro: fix RXn(rx,n) macro for DSM_CTL and SEC7 regs
ASoC: Change my e-mail to gmail
ASoC: Intel: soc-acpi: lnl: Add match entry for TM2 laptops
ASoC: amd: yc: Fix non-functional mic on ASUS E1404FA
ASoC: SOF: Intel: hda: Always clean up link DMA during stop
soundwire: intel_ace2x: Send PDI stream number during prepare
ASoC: SOF: Intel: hda: Handle prepare without close for non-HDA DAI's
ASoC: SOF: ipc4-topology: Do not set ALH node_id for aggregated DAIs
MAINTAINERS: Update maintainer list for MICROCHIP ASOC, SSC and MCP16502 drivers
ASoC: qcom: Select missing common Soundwire module code on SDM845
ASoC: fsl_esai: change dev_warn to dev_dbg in irq handler
ASoC: rsnd: Fix probe failure on HiHope boards due to endpoint parsing
...
Linus Torvalds [Fri, 25 Oct 2024 17:29:51 +0000 (10:29 -0700)]
Merge tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu and xe, with minor bridge and an i915
Kconfig fix. Nothing too scary and it seems to be pretty quiet.
i915:
- Fix DRM_I915_GVT_KVMGT dependencies in Kconfig
xe:
- Increase invalidation timeout to avoid errors in some hosts
- Flush worker on timeout
- Better handling for force wake failure
- Improve argument check on user fence creation
- Don't restart parallel queues multiple times on GT reset
bridge:
- aux: Fix assignment of OF node
- tc358767: Add missing of_node_put() in error path"
* tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe: Don't restart parallel queues multiple times on GT reset
drm/xe/ufence: Prefetch ufence addr to catch bogus address
drm/xe: Handle unreliable MMIO reads during forcewake
drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout
drm/xe: Enlarge the invalidation timeout from 150 to 500
drm/amdgpu: handle default profile on on devices without fullscreen 3D
drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too
drm/amdgpu: fix random data corruption for sdma 7
drm/amd/display: temp w/a for DP Link Layer compliance
drm/amd/display: temp w/a for dGPU to enter idle optimizations
drm/amd/pm: update deep sleep status on smu v14.0.2/3
drm/amd/pm: update overdrive function on smu v14.0.2/3
drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3
drm/amd: Guard against bad data for ATIF ACPI method
drm/bridge: tc358767: fix missing of_node_put() in for_each_endpoint_of_node()
drm/bridge: Fix assignment of the of_node of the parent to aux bridge
i915: fix DRM_I915_GVT_KVMGT dependencies
KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"
Now that KVM no longer relies on an ugly heuristic to find its struct page
references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
pfns, remove KVM's hack to elevate the refcount for pfns that happen to
have a valid struct page. In addition to removing a long-standing wart
in KVM, this allows KVM to map non-refcounted struct page memory into the
guest, e.g. for exposing GPU TTM buffers to KVM guests.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-86-seanjc@google.com>
KVM: arm64: Don't mark "struct page" accessed when making SPTE young
Don't mark pages/folios as accessed in the primary MMU when making a SPTE
young in KVM's secondary MMU, as doing so relies on
kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and
wasteful. KVM participates in page aging via mmu_notifiers, so there's no
need to push "accessed" updates to the primary MMU.
Dropping use of kvm_set_pfn_accessed() also paves the way for removing
kvm_pfn_to_refcounted_page() and all its users.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-84-seanjc@google.com>
KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs
Don't mark pages/folios as accessed in the primary MMU when zapping SPTEs,
as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking
is unnecessary and wasteful. KVM participates in page aging via
mmu_notifiers, so there's no need to push "accessed" updates to the
primary MMU.
And if KVM zaps a SPTe in response to an mmu_notifier, marking it accessed
_after_ the primary MMU has decided to zap the page is likely to go
unnoticed, i.e. odds are good that, if the page is being zapped for
reclaim, the page will be swapped out regardless of whether or not KVM
marks the page accessed.
Dropping x86's use of kvm_set_pfn_accessed() also paves the way for
removing kvm_pfn_to_refcounted_page() and all its users.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-83-seanjc@google.com>
KVM: Make kvm_follow_pfn.refcounted_page a required field
Now that the legacy gfn_to_pfn() APIs are gone, and all callers of
hva_to_pfn() pass in a refcounted_page pointer, make it a required field
to ensure all future usage in KVM plays nice.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-82-seanjc@google.com>
KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory
Use kvm_release_page_dirty() when unpinning guest pages, as the pfn was
retrieved via pin_guest_page(), i.e. is guaranteed to be backed by struct
page memory. This will allow dropping kvm_release_pfn_dirty() and
friends.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-81-seanjc@google.com>
KVM: PPC: Explicitly require struct page memory for Ultravisor sharing
Explicitly require "struct page" memory when sharing memory between
guest and host via an Ultravisor. Given the number of pfn_to_page()
calls in the code, it's safe to assume that KVM already requires that the
pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is
likely a bug fix, not a reduction in KVM capabilities.
Switching to gfn_to_page() will eventually allow removing gfn_to_pfn()
and kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-79-seanjc@google.com>
KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace
Use __gfn_to_page() instead when copying MTE tags between guest and
userspace. This will eventually allow removing gfn_to_pfn_prot(),
gfn_to_pfn(), kvm_pfn_to_refcounted_page(), and related APIs.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-78-seanjc@google.com>
KVM: Add support for read-only usage of gfn_to_page()
Rework gfn_to_page() to support read-only accesses so that it can be used
by arm64 to get MTE tags out of guest memory.
Opportunistically rewrite the comment to be even more stern about using
gfn_to_page(), as there are very few scenarios where requiring a struct
page is actually the right thing to do (though there are such scenarios).
Add a FIXME to call out that KVM probably should be pinning pages, not
just getting pages.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-77-seanjc@google.com>
KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions
Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP
isn't technically sufficient when writing to data in the target pages.
As per Documentation/core-api/pin_user_pages.rst:
Correct (uses FOLL_PIN calls):
pin_user_pages()
write to the data within the pages
unpin_user_pages()
INCORRECT (uses FOLL_GET calls):
get_user_pages()
write to the data within the pages
put_page()
As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a
mapping and marking the page dirty.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-75-seanjc@google.com>
KVM: PPC: Remove extra get_page() to fix page refcount leak
Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts
the caller a reference. I.e. doing get_page() will leak the page due to
not putting all references.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-74-seanjc@google.com>
KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest
Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-73-seanjc@google.com>
KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that MIPS can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-72-seanjc@google.com>
KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault path
Mark pages accessed only in the slow page fault path in order to remove
an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages
accessed in the primary MMU during KVM page fault handling isn't harmful,
but it's largely pointless and likely a waste of a cycles since the
primary MMU will call into KVM via mmu_notifiers when aging pages. I.e.
KVM participates in a "pull" model, so there's no need to also "push"
updates.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-71-seanjc@google.com>
KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path
Mark pages/folios dirty only the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest
Convert LoongArch to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-69-seanjc@google.com>
KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that LoongArch can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held.
Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-68-seanjc@google.com>
KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path
Mark pages accessed only in the slow path, before dropping mmu_lock when
faulting in guest memory so that LoongArch can convert to
kvm_release_faultin_page() without tripping its lockdep assertion on
mmu_lock being held.
Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-67-seanjc@google.com>
KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path
Mark pages/folios dirty only the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR
Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-65-seanjc@google.com>
KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE
Mark pages/folios dirty/accessed after installing a PTE, and more
specifically after acquiring mmu_lock and checking for an mmu_notifier
invalidation. Marking a page/folio dirty after it has been written back
can make some filesystems unhappy (backing KVM guests will such filesystem
files is uncommon, and the race is minuscule, hence the lack of complaints).
See the link below for details.
This will also allow converting Book3S to kvm_release_faultin_page(),
which requires that mmu_lock be held (for the aforementioned reason).
KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix
Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with
__kvm_faultin_pfn(), which functionally does pretty much the exact same
thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-62-seanjc@google.com>
KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV
Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(),
which functionally does pretty much the exact same thing.
Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-61-seanjc@google.com>
KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest
Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.
Opportunisticaly fix a s/priort/prior typo in the related comment.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-60-seanjc@google.com>
KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that RISC-V can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held. Marking pages accessed
outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
outside of mmu_lock can make filesystems unhappy (see the link below).
Do both under mmu_lock to minimize the chances of doing the wrong thing in
the future.
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-59-seanjc@google.com>
KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed
Don't mark pages dirty if KVM bails from the page fault handler without
installing a stage-2 mapping, i.e. if the page is guaranteed to not be
written by the guest.
In addition to being a (very) minor fix, this paves the way for converting
RISC-V to use kvm_release_faultin_page().
Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-58-seanjc@google.com>
KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock
Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).
While scary sounding, practically speaking the worst case scenario is that
KVM would trigger this WARN in filemap_unaccount_folio():
/*
* At this point folio must be either written or cleaned by
* truncate. Dirty folio here signals a bug and loss of
* unwritten data - on ordinary filesystems.
*
* But it's harmless on in-memory filesystems like tmpfs; and can
* occur when a driver which did get_user_pages() sets page dirty
* before putting it, while the inode is being finally evicted.
*
* Below fixes dirty accounting after removing the folio entirely
* but leaves the dirty flag set: it has no effect for truncated
* folio and anyway will be cleared before returning folio to
* buddy allocator.
*/
if (WARN_ON_ONCE(folio_test_dirty(folio) &&
mapping_can_writeback(mapping)))
folio_account_cleaned(folio, inode_to_wb(mapping->host));
KVM won't actually write memory because the stage-2 mappings are protected
by the mmu_notifier, i.e. there is no risk of loss of data, even if the
VM were backed by memory that needs writeback.
See the link below for additional details.
This will also allow converting arm64 to kvm_release_faultin_page(), which
requires that mmu_lock be held (for the aforementioned reason).
KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults
Convert PPC e500 to use __kvm_faultin_pfn()+kvm_release_faultin_page(),
and continue the inexorable march towards the demise of
kvm_pfn_to_refcounted_page().
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-55-seanjc@google.com>
KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock
Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that shadow_map() can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held. Marking pages
accessed outside of mmu_lock is ok (not great, but safe), but marking
pages _dirty_ outside of mmu_lock can make filesystems unhappy.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-54-seanjc@google.com>
KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()
Mark the underlying page as dirty in kvmppc_e500_ref_setup()'s sole
caller, kvmppc_e500_shadow_map(), which will allow converting e500 to
__kvm_faultin_pfn() + kvm_release_faultin_page() without having to do
a weird dance between ref_setup() and shadow_map().
Opportunistically drop the redundant kvm_set_pfn_accessed(), as
shadow_map() puts the page via kvm_release_pfn_clean().
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-53-seanjc@google.com>
KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn
Use __kvm_faultin_page() get the APIC access page so that KVM can
precisely release the refcounted page, i.e. to remove yet another user
of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
page fault, the semantics are effectively the same; KVM just happens to
be mapping the pfn into a VMCS field instead of a secondary MMU.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-52-seanjc@google.com>
KVM: VMX: Hold mmu_lock until page is released when updating APIC access page
Hold mmu_lock across kvm_release_pfn_clean() when refreshing the APIC
access page address to ensure that KVM doesn't mark a page/folio as
accessed after it has been unmapped. Practically speaking marking a folio
accesses is benign in this scenario, as KVM does hold a reference (it's
really just marking folios dirty that is problematic), but there's no
reason not to be paranoid (moving the APIC access page isn't a hot path),
and no reason to be different from other mmu_notifier-protected flows in
KVM.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-51-seanjc@google.com>
KVM: Move x86's API to release a faultin page to common KVM
Move KVM x86's helper that "finishes" the faultin process to common KVM
so that the logic can be shared across all architectures. Note, not all
architectures implement a fast page fault path, but the gist of the
comment applies to all architectures.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-50-seanjc@google.com>
KVM: x86/mmu: Don't mark unused faultin pages as accessed
When finishing guest page faults, don't mark pages as accessed if KVM
is resuming the guest _without_ installing a mapping, i.e. if the page
isn't being used. While it's possible that marking the page accessed
could avoid minor thrashing due to reclaiming a page that the guest is
about to access, it's far more likely that the gfn=>pfn mapping was
was invalidated, e.g. due a memslot change, or because the corresponding
VMA is being modified.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-49-seanjc@google.com>
KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns
Now that all x86 page fault paths precisely track refcounted pages, use
Use kvm_page_fault.refcounted_page to put references to struct page memory
when finishing page faults. This is a baby step towards eliminating
kvm_pfn_to_refcounted_page().
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-48-seanjc@google.com>
KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()
Provide the "struct page" associated with a guest_memfd pfn as an output
from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
directly put the page instead of having to rely on
kvm_pfn_to_refcounted_page().
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-47-seanjc@google.com>
KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn()
Refactor guest_memfd usage of __kvm_gmem_get_pfn() to pass the index into
the guest_memfd file instead of the gfn, i.e. resolve the index based on
the slot+gfn in the caller instead of in __kvm_gmem_get_pfn(). This will
allow kvm_gmem_get_pfn() to retrieve and return the specific "struct page",
which requires the index into the folio, without a redoing the index
calculation multiple times (which isn't costly, just hard to follow).
Opportunistically add a kvm_gmem_get_index() helper to make the copy+pasted
code easier to understand.
Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-46-seanjc@google.com>
KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()
Convert KVM x86 to use the recently introduced __kvm_faultin_pfn().
Opportunstically capture the refcounted_page grabbed by KVM for use in
future changes.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-45-seanjc@google.com>
KVM: Add kvm_faultin_pfn() to specifically service guest page faults
Add a new dedicated API, kvm_faultin_pfn(), for servicing guest page
faults, i.e. for getting pages/pfns that will be mapped into the guest via
an mmu_notifier-protected KVM MMU. Keep struct kvm_follow_pfn buried in
internal code, as having __kvm_faultin_pfn() take "out" params is actually
cleaner for several architectures, e.g. it allows the caller to have its
own "page fault" structure without having to marshal data to/from
kvm_follow_pfn.
Long term, common KVM would ideally provide a kvm_page_fault structure, a
la x86's struct of the same name. But all architectures need to be
converted to a common API before that can happen.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-44-seanjc@google.com>
KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()
Move the marking of folios dirty from make_spte() out to its callers,
which have access to the _struct page_, not just the underlying pfn.
Once all architectures follow suit, this will allow removing KVM's ugly
hack where KVM elevates the refcount of VM_MIXEDMAP pfns that happen to
be struct page memory.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-42-seanjc@google.com>
KVM: x86/mmu: Add helper to "finish" handling a guest page fault
Add a helper to finish/complete the handling of a guest page, e.g. to
mark the pages accessed and put any held references. In the near
future, this will allow improving the logic without having to copy+paste
changes into all page fault paths. And in the less near future, will
allow sharing the "finish" API across all architectures.
No functional change intended.
Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-41-seanjc@google.com>