git.ipfire.org Git - thirdparty/linux.git/log

Merge tag 'kvm-x86-generic-6.13' of https://github.com/kvm-x86/linux into HEAD

KVM generic changes for 6.13

- Rework kvm_vcpu_on_spin() to use a single for-loop instead of making two
   partial poasses over "all" vCPUs.  Opportunistically expand the comment
   to better explain the motivation and logic.

- Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter
   long stalls due to having to wait for all CPUs become quiescent.

Merge tag 'kvm-s390-next-6.13-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

- second part of the ucontrol selftest
- cpumodel sanity check selftest
- gen17 cpumodel changes

KVM: s390: selftests: Add regression tests for PFCR subfunctions

Check if the PFCR query reported in userspace coincides with the
kernel reported function list. Right now we don't mask the functions
in the kernel so they have to be the same.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Hariharan Mari <hari55@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-5-brueckner@linux.ibm.com
[frankja@linux.ibm.com: Added commit description]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-5-brueckner@linux.ibm.com>

KVM: s390: add gen17 facilities to CPU model

Add gen17 facilities and let KVM_CAP_S390_VECTOR_REGISTERS handle
the enablement of the vector extension facilities.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-4-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-4-brueckner@linux.ibm.com>

KVM: s390: add msa11 to cpu model

Message-security-assist 11 introduces pckmo subfunctions to encrypt
hmac keys.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-3-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-3-brueckner@linux.ibm.com>

KVM: s390: add concurrent-function facility to cpu model

Adding support for concurrent-functions facility which provides
additional subfunctions.

Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107152319.77816-2-brueckner@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107152319.77816-2-brueckner@linux.ibm.com>

KVM: s390: selftests: correct IP.b length in uc_handle_sieic debug output

The length of the interrupt parameters (IP) are:
a: 2 bytes
b: 4 bytes

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-6-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-6-schlameuss@linux.ibm.com>

KVM: s390: selftests: Fix whitespace confusion in ucontrol test

Checkpatch thinks that we're doing a multiplication but we're obviously
not. Fix 4 instances where we adhered to wrong checkpatch advice.

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-5-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-5-schlameuss@linux.ibm.com>

KVM: s390: selftests: Verify reject memory region operations for ucontrol VMs

Add a test case verifying KVM_SET_USER_MEMORY_REGION and
KVM_SET_USER_MEMORY_REGION2 cannot be executed on ucontrol VMs.

Executing this test case on not patched kernels will cause a null
pointer dereference in the host kernel.
This is fixed with commit:
commit 7816e58967d0 ("kvm: s390: Reject memory region operations for ucontrol VMs")

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20241107141024.238916-4-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-4-schlameuss@linux.ibm.com>

KVM: s390: selftests: Add uc_skey VM test case

Add a test case manipulating s390 storage keys from within the ucontrol
VM.

Storage key instruction (ISKE, SSKE and RRBE) intercepts and
Keyless-subset facility are disabled on first use, where the skeys are
setup by KVM in non ucontrol VMs.

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Link: https://lore.kernel.org/r/20241108091620.289406-1-schlameuss@linux.ibm.com
Acked-by: Janosch Frank <frankja@linux.ibm.com>
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241108091620.289406-1-schlameuss@linux.ibm.com>

KVM: s390: selftests: Add uc_map_unmap VM test case

Add a test case verifying basic running and interaction of ucontrol VMs.
Fill the segment and page tables for allocated memory and map memory on
first access.

* uc_map_unmap
Store and load data to mapped and unmapped memory and use pic segment
translation handling to map memory on access.

Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link:
https://lore.kernel.org/r/20241107141024.238916-2-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Fixed patch prefix]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20241107141024.238916-2-schlameuss@linux.ibm.com>

Merge tag 'kvm-riscv-6.13-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.13

- Accelerate KVM RISC-V when running as a guest
- Perf support to collect KVM guest statistics from host side

riscv: kvm: Fix out-of-bounds array access

In kvm_riscv_vcpu_sbi_init() the entry->ext_idx can contain an
out-of-bound index. This is used as a special marker for the base
extensions, that cannot be disabled. However, when traversing the
extensions, that special marker is not checked prior indexing the
array.

Add an out-of-bounds check to the function.

Fixes: 56d8a385b605 ("RISC-V: KVM: Allow some SBI extensions to be disabled by default")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20241104191503.74725-1-bjorn@kernel.org
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Fix APLIC in_clrip and clripnum write emulation

In the section "4.7 Precise effects on interrupt-pending bits"
of the RISC-V AIA specification defines that:

"If the source mode is Level1 or Level0 and the interrupt domain
is configured in MSI delivery mode (domaincfg.DM = 1):
The pending bit is cleared whenever the rectified input value is
low, when the interrupt is forwarded by MSI, or by a relevant
write to an in_clrip register or to clripnum."

Update the aplic_write_pending() to match the spec.

Fixes: d8dd9f113e16 ("RISC-V: KVM: Fix APLIC setipnum_le/be write emulation")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20241029085542.30541-1-yongxuan.wang@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>

KVM: Protect vCPU's "last run PID" with rwlock, not RCU

To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead
of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to a
vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to
run SEV migration helper vCPUs during post-copy, the synchronize_rcu()
needed to change the PID associated with the vCPU can stall for hundreds
of milliseconds, which is problematic for latency sensitive post-copy
operations.

In the directed yield path, do not acquire the lock if it's contended,
i.e. if the associated PID is changing, as that means the vCPU's task is
already running.

Reported-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240802200136.329973-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: Return '0' directly when there's no task to yield to

Do "return 0" instead of initializing and returning a local variable in
kvm_vcpu_yield_to(), e.g. so that it's more obvious what the function
returns if there is no task.

No functional change intended.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240802200136.329973-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

KVM: Rework core loop of kvm_vcpu_on_spin() to use a single for-loop

Rework kvm_vcpu_on_spin() to use a single for-loop instead of making "two"
passes over all vCPUs. Given N=kvm->last_boosted_vcpu, the logic is to
iterate from vCPU[N+1]..vcpu[N-1], i.e. using two loops is just a kludgy
way of handling the wrap from the last vCPU to vCPU0 when a boostable vCPU
isn't found in vcpu[N+1]..vcpu[MAX].

Open code the xa_load() instead of using kvm_get_vcpu() to avoid reading
online_vcpus in every loop, as well as the accompanying smp_rmb(), i.e.
make it a custom kvm_for_each_vcpu(), for all intents and purposes.

Oppurtunistically clean up the comment explaining the logic.

Link: https://lore.kernel.org/r/20240802202121.341348-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

RISC-V: KVM: Use NACL HFENCEs for KVM request based HFENCEs

When running under some other hypervisor, use SBI NACL based HFENCEs
for TLB shoot-down via KVM requests. This makes HFENCEs faster whenever
SBI nested acceleration is available.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-14-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Save trap CSRs in kvm_riscv_vcpu_enter_exit()

Save trap CSRs in the kvm_riscv_vcpu_enter_exit() function instead of
the kvm_arch_vcpu_ioctl_run() function so that HTVAL and HTINST CSRs
are accessed in more optimized manner while running under some other
hypervisor.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-13-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Use SBI sync SRET call when available

Implement an optimized KVM world-switch using SBI sync SRET call
when SBI nested acceleration extension is available. This improves
KVM world-switch when KVM RISC-V is running as a Guest under some
other hypervisor.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-12-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Use nacl_csr_xyz() for accessing AIA CSRs

When running under some other hypervisor, prefer nacl_csr_xyz()
for accessing AIA CSRs in the run-loop. This makes CSR access
faster whenever SBI nested acceleration is available.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-11-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Use nacl_csr_xyz() for accessing H-extension CSRs

When running under some other hypervisor, prefer nacl_csr_xyz()
for accessing H-extension CSRs in the run-loop. This makes CSR
access faster whenever SBI nested acceleration is available.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-10-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Add common nested acceleration support

Add a common nested acceleration support which will be shared by
all parts of KVM RISC-V. This nested acceleration support detects
and enables SBI NACL extension usage based on static keys which
ensures minimum impact on the non-nested scenario.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-9-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: Add defines for the SBI nested acceleration extension

Add defines for the new SBI nested acceleration extension which was
ratified as part of the SBI v2.0 specification.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-8-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Don't setup SGEI for zero guest external interrupts

No need to setup SGEI local interrupt when there are zero guest
external interrupts (i.e. zero HW IMSIC guest files).

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-7-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Replace aia_set_hvictl() with aia_hvictl_value()

The aia_set_hvictl() internally writes the HVICTL CSR which makes
it difficult to optimize the CSR write using SBI NACL extension for
kvm_riscv_vcpu_aia_update_hvip() function so replace aia_set_hvictl()
with new aia_hvictl_value() which only computes the HVICTL value.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-6-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Break down the __kvm_riscv_switch_to() into macros

Break down the __kvm_riscv_switch_to() function into macros so that
these macros can be later re-used by SBI NACL extension based low-level
switch function.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-5-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Save/restore SCOUNTEREN in C source

The SCOUNTEREN CSR need not be saved/restored in the low-level
__kvm_riscv_switch_to() function hence move the SCOUNTEREN CSR
save/restore to the kvm_riscv_vcpu_swap_in_guest_state() and
kvm_riscv_vcpu_swap_in_host_state() functions in C sources.

Also, re-arrange the CSR save/restore and related GPR usage in
the low-level __kvm_riscv_switch_to() low-level function.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-4-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Save/restore HSTATUS in C source

We will be optimizing HSTATUS CSR access via shared memory setup
using the SBI nested acceleration extension. To facilitate this,
we first move HSTATUS save/restore in kvm_riscv_vcpu_enter_exit().

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-3-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

RISC-V: KVM: Order the object files alphabetically

Order the object files alphabetically in the Makefile so that
it is very predictable inserting new object files in the future.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20241020194734.58686-2-apatel@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>

riscv: KVM: add basic support for host vs guest profiling

For the information collected on the host side, we need to
identify which data originates from the guest and record
these events separately, this can be achieved by having
KVM register perf callbacks.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/00342d535311eb0629b9ba4f1e457a48e2abee33.1728957131.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>

riscv: perf: add guest vs host distinction

Introduce basic guest support in perf, enabling it to distinguish
between PMU interrupts in the host or guest, and collect
fundamental information.

Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/a67d527dc1b11493fe11f7f53584772fdd983744.1728957131.git.zhouquan@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>

Linux 6.12-rc5

Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- Prevent a certain range of pages which get marked as hypervisor-only,
   to get allocated to a CoCo (SNP) guest which cannot use them and thus
   fail booting

- Fix the microcode loader on AMD to pay attention to the stepping of a
   patch and to handle the case where a BIOS config option splits the
   machine into logical NUMA nodes per L3 cache slice

- Disable LAM from being built by default due to security concerns

* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Ensure that RMP table fixups are reserved
  x86/microcode/AMD: Split load_microcode_amd()
  x86/microcode/AMD: Pay attention to the stepping dynamically
  x86/lam: Disable ADDRESS_MASKING in most cases

Merge tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace fixes from Steven Rostedt:

- Fix missing mutex unlock in error path of register_ftrace_graph()

   A previous fix added a return on an error path and forgot to unlock
   the mutex. Instead of dealing with error paths, use guard(mutex) as
   the mutex is just released at the exit of the function anyway. Other
   functions in this file should be updated with this, but that's a
   cleanup and not a fix.

- Change cpuhp setup name to be consistent with other cpuhp states

   The same fix that the above patch fixes added a cpuhp_setup_state()
   call with the name of "fgraph_idle_init". I was informed that it
   should instead be something like: "fgraph:online". Update that too.

* tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fgraph: Change the name of cpuhp state to "fgraph:online"
  fgraph: Fix missing unlock in register_ftrace_graph()

Merge tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

- Asus thermal profile fix, fixing performance issues on Lunar Lake

- Intel PMC: one revert for a lockdep issue and one bugfix

- Dell WMI: Ignore some WMI events on suspend/resume to silence warnings

* tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Fix thermal profile initialization
  platform/x86: dell-wmi: Ignore suspend notifications
  platform/x86/intel/pmc: Fix pmc_core_iounmap to call iounmap for valid addresses
  platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"

Merge tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
"A single commit to resolve a regression existing in v6.11 or later.

  The change in 1394 OHCI driver in v6.11 kernel could cause general
  protection faults when rediscovering nodes in IEEE 1394 bus while
  holding a spin lock. Consequently, watchdog checks can report a hard
  lockup.

  Currently, this issue is observed primarily during the system resume
  phase when using an extra node with three ports or more is used.
  However, it could potentially occur in the other cases as well"

* tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: core: fix invalid port index for parent device

Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Pull request for MD via Song fixing a few issues

- Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on
   passthrough IO (Xinyu)

* tag 'block-6.12-20241026' of git://git.kernel.dk/linux:
  block: fix sanity checks in blk_rq_map_user_bvec
  md/raid10: fix null ptr dereference in raid10_size()
  md: ensure child flush IO does not affect origin bio->bi_status

Merge tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- Fix recovery of allocator ops after a growfs

- Do not fail repairs on metadata files with no attr fork

* tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: update the pag for the last AG at recovery time
  xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag
  xfs: error out when a superblock buffer update reduces the agcount
  xfs: update the file system geometry after recoverying superblock buffers
  xfs: merge the perag freeing helpers
  xfs: pass the exact range to initialize to xfs_initialize_perag
  xfs: don't fail repairs on metadata files with no attr fork

firewire: core: fix invalid port index for parent device

In a commit 24b7f8e5cd65 ("firewire: core: use helper functions for self
ID sequence"), the enumeration over self ID sequence was refactored with
some helper functions with KUnit tests. These helper functions are
guaranteed to work expectedly by the KUnit tests, however their application
includes a mistake to assign invalid value to the index of port connected
to parent device.

This bug affects the case that any extra node devices which has three or
more ports are connected to 1394 OHCI controller. In the case, the path
to update the tree cache could hits WARN_ON(), and gets general protection
fault due to the access to invalid address computed by the invalid value.

This commit fixes the bug to assign correct port index.

Cc: stable@vger.kernel.org
Reported-by: Edmund Raile <edmund.raile@proton.me>
Closes: https://lore.kernel.org/lkml/8a9902a4ece9329af1e1e42f5fea76861f0bf0e8.camel@proton.me/
Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence")
Link: https://lore.kernel.org/r/20241025034137.99317-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

platform/x86: asus-wmi: Fix thermal profile initialization

When support for vivobook fan profiles was added, the initial
call to throttle_thermal_policy_set_default() was removed, which
however is necessary for full initialization.

Fix this by calling throttle_thermal_policy_set_default() again
when setting up the platform profile.

Fixes: bcbfcebda2cb ("platform/x86: asus-wmi: add support for vivobook fan profiles")
Reported-by: Michael Larabel <Michael@phoronix.com>
Closes: https://www.phoronix.com/review/lunar-lake-xe2/5
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20241025191514.15032-2-W_Armin@gmx.de
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>

Merge tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux

Pull more 9p reverts from Dominique Martinet:
"Revert patches causing inode collision problems.

  The code simplification introduced significant regressions on servers
  that do not remap inode numbers when exporting multiple underlying
  filesystems with colliding inodes. See the top-most revert (commit
  be2ca3825372) for details.

  This problem had been ignored for too long and the reverts will also
  head to stable (6.9+).

  I'm confident this set of patches gets us back to previous behaviour
  (another related patch had already been reverted back in April and
  we're almost back to square 1, and the rest didn't touch inode
  lifecycle)"

* tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux:
  Revert "fs/9p: simplify iget to remove unnecessary paths"
  Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"
  Revert "fs/9p: remove redundant pointer v9ses"
  Revert " fs/9p: mitigate inode collisions"

Merge tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix init module error caseb

- Fix memory allocation error path (for passwords) in mount

* tag 'v6.12-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix warning when destroy 'cifs_io_request_pool'
smb: client: Handle kstrdup failures for passwords

Merge tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

- Fix cached size after passthrough writes

   This fix needed a trivial change in the backing-file API, which
   resulted in some non-fuse files being touched.

- Revert a commit meant as a cleanup but which triggered a WARNING

- Remove a stray debug line left-over

* tag 'fuse-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove stray debug line
  Revert "fuse: move initialization of fuse_file to fuse_writepages() instead of in callback"
  fuse: update inode size after extending passthrough write
  fs: pass offset and result to backing_file end_write() callback

Merge tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a couple of use-after-free bugs

* tag 'nfsd-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: cancel nfsd_shrinker_work using sync mode in nfs4_state_shutdown_net
nfsd: fix race between laundromat and free_stateid

Merge tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix an ACPI PRM (Platform Runtime Mechanism) issue and add two
  new DMI quirks, one for an ACPI IRQ override and one for lid switch
  detection:

   - Make acpi_parse_prmt() look for EFI_MEMORY_RUNTIME memory regions
     only to comply with the UEFI specification and make PRM use
     efi_guid_t instead of guid_t to avoid a compiler warning triggered
     by that change (Koba Ko, Dan Carpenter)

   - Add an ACPI IRQ override quirk for LG 16T90SP (Christian Heusel)

   - Add a lid switch detection quirk for Samsung Galaxy Book2 (Shubham
     Panwar)"

* tag 'acpi-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: PRM: Clean up guid type in struct prm_handler_info
  ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue
  ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[]
  ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context

Merge tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"Update cpufreq documentation to match the code after recent changes
  (Christian Loehle), fix a units conversion issue in the CPPC cpufreq
  driver (liwei), and fix an error check in the dtpm_devfreq power
  capping driver (Yuan Can)"

* tag 'pm-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: CPPC: fix perf_to_khz/khz_to_perf conversion exception
  powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request()
  cpufreq: docs: Reflect latency changes in docs

Merge tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Hold the rescan lock while adding devices to avoid race with
   concurrent pwrctl rescan that can lead to a crash (Bartosz
   Golaszewski)

- Avoid binding pwrctl driver to QCom WCN wifi if the DT lacks the
   necessary PMU regulator descriptions (Bartosz Golaszewski)

* tag 'pci-v6.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/pwrctl: Abandon QCom WCN probe on pre-pwrseq device-trees
  PCI: Hold rescan lock while adding devices during host probe

Merge tag 'fbdev-for-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

- Fix some build warnings and failures with CONFIG_FB_IOMEM_FOPS and
   CONFIG_FB_DEVICE

- Remove the da8xx fbdev driver

- Constify struct sbus_mmap_map and fix indentation warning

* tag 'fbdev-for-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: wm8505fb: select CONFIG_FB_IOMEM_FOPS
  fbdev: da8xx: remove the driver
  fbdev: Constify struct sbus_mmap_map
  fbdev: nvidiafb: fix inconsistent indentation warning
  fbdev: sstfb: Make CONFIG_FB_DEVICE optional

Merge tag 'gpio-fixes-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:
"Update MAINTAINERS with a keyword pattern for legacy GPIO API

  The goal is to alert us to anyone trying to use the deprecated, legacy
  API (this happens almost every release)"

* tag 'gpio-fixes-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: add a keyword entry for the GPIO subsystem

Merge tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Niklas Cassel:

- Fix the handling of ATA commands that timeout (command that did not
   receive a completion interrupt within the configured timeout time).

   Commands that timeout, while also having either the FAILFAST flag
   set, or the command being a passthrough command, should never be
   retried. Restore this behavior (as it was before v6.12-rc1).

* tag 'ata-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Set DID_TIME_OUT for commands that actually timed out

Merge branch 'kvm-no-struct-page' into HEAD

TL;DR: Eliminate KVM's long-standing (and heinous) behavior of essentially
guessing which pfns are refcounted pages (see kvm_pfn_to_refcounted_page()).

Getting there requires "fixing" arch code that isn't obviously broken.
Specifically, to get rid of kvm_pfn_to_refcounted_page(), KVM needs to
stop marking pages/folios dirty/accessed based solely on the pfn that's
stored in KVM's stage-2 page tables.

Instead of tracking which SPTEs correspond to refcounted pages, simply
remove all of the code that operates on "struct page" based ona the pfn
in stage-2 PTEs. This is the back ~40-50% of the series.

For x86 in particular, which sets accessed/dirty status when that info
would be "lost", e.g. when SPTEs are zapped or KVM clears the dirty flag
in a SPTE, foregoing the updates provides very measurable performance
improvements for related operations. E.g. when clearing dirty bits as
part of dirty logging, and zapping SPTEs to reconstitue huge pages when
disabling dirty logging.

The front ~40% of the series is cleanups and prep work, and most of it is
x86 focused (purely because x86 added the most special cases, *sigh*).
E.g. several of the inputs to hva_to_pfn() (and it's myriad wrappers),
can be removed by cleaning up and deduplicating x86 code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The majority of changes here are about ASoC.

  There are two core changes in ASoC (the bump of minimal topology ABI
  version and the fix for references of components in DAPM code), and
  others are mostly various device-specific fixes for SoundWire, AMD,
  Intel, SOF, Qualcomm and FSL, in addition to a few usual HD-audio
  quirks and fixes"

* tag 'sound-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (33 commits)
  ALSA: hda/realtek: Update default depop procedure
  ASoC: qcom: sc7280: Fix missing Soundwire runtime stream alloc
  ASoC: fsl_micfil: Add sample rate constraint
  ASoC: rt722-sdca: increase clk_stop_timeout to fix clock stop issue
  ALSA: hda/tas2781: select CRC32 instead of CRC32_SARWATE
  ALSA: hda/realtek: Add subwoofer quirk for Acer Predator G9-593
  ALSA: firewire-lib: Avoid division by zero in apply_constraint_to_size()
  ASoC: fsl_micfil: Add a flag to distinguish with different volume control types
  ASoC: codecs: lpass-rx-macro: fix RXn(rx,n) macro for DSM_CTL and SEC7 regs
  ASoC: Change my e-mail to gmail
  ASoC: Intel: soc-acpi: lnl: Add match entry for TM2 laptops
  ASoC: amd: yc: Fix non-functional mic on ASUS E1404FA
  ASoC: SOF: Intel: hda: Always clean up link DMA during stop
  soundwire: intel_ace2x: Send PDI stream number during prepare
  ASoC: SOF: Intel: hda: Handle prepare without close for non-HDA DAI's
  ASoC: SOF: ipc4-topology: Do not set ALH node_id for aggregated DAIs
  MAINTAINERS: Update maintainer list for MICROCHIP ASOC, SSC and MCP16502 drivers
  ASoC: qcom: Select missing common Soundwire module code on SDM845
  ASoC: fsl_esai: change dev_warn to dev_dbg in irq handler
  ASoC: rsnd: Fix probe failure on HiHope boards due to endpoint parsing
  ...

Merge tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu and xe, with minor bridge and an i915
  Kconfig fix. Nothing too scary and it seems to be pretty quiet.

  amdgpu:
   - ACPI method handling fixes
   - SMU 14.x fixes
   - Display idle optimization fix
   - DP link layer compliance fix
   - SDMA 7.x fix
   - PSR-SU fix
   - SWSMU fix

  i915:
   - Fix DRM_I915_GVT_KVMGT dependencies in Kconfig

  xe:
   - Increase invalidation timeout to avoid errors in some hosts
   - Flush worker on timeout
   - Better handling for force wake failure
   - Improve argument check on user fence creation
   - Don't restart parallel queues multiple times on GT reset

  bridge:
   - aux: Fix assignment of OF node
   - tc358767: Add missing of_node_put() in error path"

* tag 'drm-fixes-2024-10-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe: Don't restart parallel queues multiple times on GT reset
  drm/xe/ufence: Prefetch ufence addr to catch bogus address
  drm/xe: Handle unreliable MMIO reads during forcewake
  drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout
  drm/xe: Enlarge the invalidation timeout from 150 to 500
  drm/amdgpu: handle default profile on on devices without fullscreen 3D
  drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too
  drm/amdgpu: fix random data corruption for sdma 7
  drm/amd/display: temp w/a for DP Link Layer compliance
  drm/amd/display: temp w/a for dGPU to enter idle optimizations
  drm/amd/pm: update deep sleep status on smu v14.0.2/3
  drm/amd/pm: update overdrive function on smu v14.0.2/3
  drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3
  drm/amd: Guard against bad data for ATIF ACPI method
  drm/bridge: tc358767: fix missing of_node_put() in for_each_endpoint_of_node()
  drm/bridge: Fix assignment of the of_node of the parent to aux bridge
  i915: fix DRM_I915_GVT_KVMGT dependencies

KVM: Don't grab reference on VM_MIXEDMAP pfns that have a "struct page"

Now that KVM no longer relies on an ugly heuristic to find its struct page
references, i.e. now that KVM can't get false positives on VM_MIXEDMAP
pfns, remove KVM's hack to elevate the refcount for pfns that happen to
have a valid struct page. In addition to removing a long-standing wart
in KVM, this allows KVM to map non-refcounted struct page memory into the
guest, e.g. for exposing GPU TTM buffers to KVM guests.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-86-seanjc@google.com>

KVM: Drop APIs that manipulate "struct page" via pfns

Remove all kvm_{release,set}_pfn_*() APIs now that all users are gone.

No functional change intended.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-85-seanjc@google.com>

KVM: arm64: Don't mark "struct page" accessed when making SPTE young

Don't mark pages/folios as accessed in the primary MMU when making a SPTE
young in KVM's secondary MMU, as doing so relies on
kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and
wasteful. KVM participates in page aging via mmu_notifiers, so there's no
need to push "accessed" updates to the primary MMU.

Dropping use of kvm_set_pfn_accessed() also paves the way for removing
kvm_pfn_to_refcounted_page() and all its users.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-84-seanjc@google.com>

KVM: x86/mmu: Don't mark "struct page" accessed when zapping SPTEs

Don't mark pages/folios as accessed in the primary MMU when zapping SPTEs,
as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking
is unnecessary and wasteful. KVM participates in page aging via
mmu_notifiers, so there's no need to push "accessed" updates to the
primary MMU.

And if KVM zaps a SPTe in response to an mmu_notifier, marking it accessed
_after_ the primary MMU has decided to zap the page is likely to go
unnoticed, i.e. odds are good that, if the page is being zapped for
reclaim, the page will be swapped out regardless of whether or not KVM
marks the page accessed.

Dropping x86's use of kvm_set_pfn_accessed() also paves the way for
removing kvm_pfn_to_refcounted_page() and all its users.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-83-seanjc@google.com>

KVM: Make kvm_follow_pfn.refcounted_page a required field

Now that the legacy gfn_to_pfn() APIs are gone, and all callers of
hva_to_pfn() pass in a refcounted_page pointer, make it a required field
to ensure all future usage in KVM plays nice.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-82-seanjc@google.com>

KVM: s390: Use kvm_release_page_dirty() to unpin "struct page" memory

Use kvm_release_page_dirty() when unpinning guest pages, as the pfn was
retrieved via pin_guest_page(), i.e. is guaranteed to be backed by struct
page memory. This will allow dropping kvm_release_pfn_dirty() and
friends.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-81-seanjc@google.com>

KVM: Drop gfn_to_pfn() APIs now that all users are gone

Drop gfn_to_pfn() and all its variants now that all users are gone.

No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-80-seanjc@google.com>

KVM: PPC: Explicitly require struct page memory for Ultravisor sharing

Explicitly require "struct page" memory when sharing memory between
guest and host via an Ultravisor. Given the number of pfn_to_page()
calls in the code, it's safe to assume that KVM already requires that the
pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is
likely a bug fix, not a reduction in KVM capabilities.

Switching to gfn_to_page() will eventually allow removing gfn_to_pfn()
and kvm_pfn_to_refcounted_page().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-79-seanjc@google.com>

KVM: arm64: Use __gfn_to_page() when copying MTE tags to/from userspace

Use __gfn_to_page() instead when copying MTE tags between guest and
userspace. This will eventually allow removing gfn_to_pfn_prot(),
gfn_to_pfn(), kvm_pfn_to_refcounted_page(), and related APIs.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-78-seanjc@google.com>

KVM: Add support for read-only usage of gfn_to_page()

Rework gfn_to_page() to support read-only accesses so that it can be used
by arm64 to get MTE tags out of guest memory.

Opportunistically rewrite the comment to be even more stern about using
gfn_to_page(), as there are very few scenarios where requiring a struct
page is actually the right thing to do (though there are such scenarios).
Add a FIXME to call out that KVM probably should be pinning pages, not
just getting pages.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-77-seanjc@google.com>

KVM: Convert gfn_to_page() to use kvm_follow_pfn()

Convert gfn_to_page() to the new kvm_follow_pfn() internal API, which will
eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page().

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-76-seanjc@google.com>

KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions

Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP
isn't technically sufficient when writing to data in the target pages.
As per Documentation/core-api/pin_user_pages.rst:

      Correct (uses FOLL_PIN calls):
          pin_user_pages()
          write to the data within the pages
          unpin_user_pages()

      INCORRECT (uses FOLL_GET calls):
          get_user_pages()
          write to the data within the pages
          put_page()

As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a
mapping and marking the page dirty.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-75-seanjc@google.com>

KVM: PPC: Remove extra get_page() to fix page refcount leak

Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts
the caller a reference. I.e. doing get_page() will leak the page due to
not putting all references.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-74-seanjc@google.com>

KVM: MIPS: Use kvm_faultin_pfn() to map pfns into the guest

Convert MIPS to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-73-seanjc@google.com>

KVM: MIPS: Mark "struct page" pfns accessed prior to dropping mmu_lock

Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that MIPS can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-72-seanjc@google.com>

KVM: MIPS: Mark "struct page" pfns accessed only in "slow" page fault path

Mark pages accessed only in the slow page fault path in order to remove
an unnecessary user of kvm_pfn_to_refcounted_page(). Marking pages
accessed in the primary MMU during KVM page fault handling isn't harmful,
but it's largely pointless and likely a waste of a cycles since the
primary MMU will call into KVM via mmu_notifiers when aging pages. I.e.
KVM participates in a "pull" model, so there's no need to also "push"
updates.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-71-seanjc@google.com>

KVM: MIPS: Mark "struct page" pfns dirty only in "slow" page fault path

Mark pages/folios dirty only the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).

See the link below for details.

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-70-seanjc@google.com>

KVM: LoongArch: Use kvm_faultin_pfn() to map pfns into the guest

Convert LoongArch to kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-69-seanjc@google.com>

KVM: LoongArch: Mark "struct page" pfn accessed before dropping mmu_lock

Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that LoongArch can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-68-seanjc@google.com>

KVM: LoongArch: Mark "struct page" pfns accessed only in "slow" page fault path

Mark pages accessed only in the slow path, before dropping mmu_lock when
faulting in guest memory so that LoongArch can convert to
kvm_release_faultin_page() without tripping its lockdep assertion on
mmu_lock being held.

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-67-seanjc@google.com>

KVM: LoongArch: Mark "struct page" pfns dirty only in "slow" page fault path

Mark pages/folios dirty only the slow page fault path, i.e. only when
mmu_lock is held and the operation is mmu_notifier-protected, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).

See the link below for details.

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-66-seanjc@google.com>

KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR

Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-65-seanjc@google.com>

KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE

Mark pages/folios dirty/accessed after installing a PTE, and more
specifically after acquiring mmu_lock and checking for an mmu_notifier
invalidation. Marking a page/folio dirty after it has been written back
can make some filesystems unhappy (backing KVM guests will such filesystem
files is uncommon, and the race is minuscule, hence the lack of complaints).
See the link below for details.

This will also allow converting Book3S to kvm_release_faultin_page(),
which requires that mmu_lock be held (for the aforementioned reason).

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-64-seanjc@google.com>

KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()

Drop @kvm_ro from kvmppc_book3s_instantiate_page() as it is now only
written, and never read.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-63-seanjc@google.com>

KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix

Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with
__kvm_faultin_pfn(), which functionally does pretty much the exact same
thing.

Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-62-seanjc@google.com>

KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV

Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(),
which functionally does pretty much the exact same thing.

Note, when the code was written, KVM indeed didn't do fast GUP without
"!atomic && !async", but that has long since changed (KVM tries fast GUP
for all writable mappings).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-61-seanjc@google.com>

KVM: RISC-V: Use kvm_faultin_pfn() when mapping pfns into the guest

Convert RISC-V to __kvm_faultin_pfn()+kvm_release_faultin_page(), which
are new APIs to consolidate arch code and provide consistent behavior
across all KVM architectures.

Opportunisticaly fix a s/priort/prior typo in the related comment.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-60-seanjc@google.com>

KVM: RISC-V: Mark "struct page" pfns accessed before dropping mmu_lock

Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that RISC-V can convert to kvm_release_faultin_page() without tripping
its lockdep assertion on mmu_lock being held. Marking pages accessed
outside of mmu_lock is ok (not great, but safe), but marking pages _dirty_
outside of mmu_lock can make filesystems unhappy (see the link below).
Do both under mmu_lock to minimize the chances of doing the wrong thing in
the future.

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-59-seanjc@google.com>

KVM: RISC-V: Mark "struct page" pfns dirty iff a stage-2 PTE is installed

Don't mark pages dirty if KVM bails from the page fault handler without
installing a stage-2 mapping, i.e. if the page is guaranteed to not be
written by the guest.

In addition to being a (very) minor fix, this paves the way for converting
RISC-V to use kvm_release_faultin_page().

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-58-seanjc@google.com>

KVM: arm64: Use __kvm_faultin_pfn() to handle memory aborts

Convert arm64 to use __kvm_faultin_pfn()+kvm_release_faultin_page().
Three down, six to go.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-57-seanjc@google.com>

KVM: arm64: Mark "struct page" pfns accessed/dirty before dropping mmu_lock

Mark pages/folios accessed+dirty prior to dropping mmu_lock, as marking a
page/folio dirty after it has been written back can make some filesystems
unhappy (backing KVM guests will such filesystem files is uncommon, and
the race is minuscule, hence the lack of complaints).

While scary sounding, practically speaking the worst case scenario is that
KVM would trigger this WARN in filemap_unaccount_folio():

        /*
         * At this point folio must be either written or cleaned by
         * truncate.  Dirty folio here signals a bug and loss of
         * unwritten data - on ordinary filesystems.
         *
         * But it's harmless on in-memory filesystems like tmpfs; and can
         * occur when a driver which did get_user_pages() sets page dirty
         * before putting it, while the inode is being finally evicted.
         *
         * Below fixes dirty accounting after removing the folio entirely
         * but leaves the dirty flag set: it has no effect for truncated
         * folio and anyway will be cleared before returning folio to
         * buddy allocator.
         */
        if (WARN_ON_ONCE(folio_test_dirty(folio) &&
                         mapping_can_writeback(mapping)))
                folio_account_cleaned(folio, inode_to_wb(mapping->host));

KVM won't actually write memory because the stage-2 mappings are protected
by the mmu_notifier, i.e. there is no risk of loss of data, even if the
VM were backed by memory that needs writeback.

See the link below for additional details.

This will also allow converting arm64 to kvm_release_faultin_page(), which
requires that mmu_lock be held (for the aforementioned reason).

Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-56-seanjc@google.com>

KVM: PPC: e500: Use __kvm_faultin_pfn() to handle page faults

Convert PPC e500 to use __kvm_faultin_pfn()+kvm_release_faultin_page(),
and continue the inexorable march towards the demise of
kvm_pfn_to_refcounted_page().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-55-seanjc@google.com>

KVM: PPC: e500: Mark "struct page" pfn accessed before dropping mmu_lock

Mark pages accessed before dropping mmu_lock when faulting in guest memory
so that shadow_map() can convert to kvm_release_faultin_page() without
tripping its lockdep assertion on mmu_lock being held. Marking pages
accessed outside of mmu_lock is ok (not great, but safe), but marking
pages _dirty_ outside of mmu_lock can make filesystems unhappy.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-54-seanjc@google.com>

KVM: PPC: e500: Mark "struct page" dirty in kvmppc_e500_shadow_map()

Mark the underlying page as dirty in kvmppc_e500_ref_setup()'s sole
caller, kvmppc_e500_shadow_map(), which will allow converting e500 to
__kvm_faultin_pfn() + kvm_release_faultin_page() without having to do
a weird dance between ref_setup() and shadow_map().

Opportunistically drop the redundant kvm_set_pfn_accessed(), as
shadow_map() puts the page via kvm_release_pfn_clean().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-53-seanjc@google.com>

KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn

Use __kvm_faultin_page() get the APIC access page so that KVM can
precisely release the refcounted page, i.e. to remove yet another user
of kvm_pfn_to_refcounted_page(). While the path isn't handling a guest
page fault, the semantics are effectively the same; KVM just happens to
be mapping the pfn into a VMCS field instead of a secondary MMU.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-52-seanjc@google.com>

KVM: VMX: Hold mmu_lock until page is released when updating APIC access page

Hold mmu_lock across kvm_release_pfn_clean() when refreshing the APIC
access page address to ensure that KVM doesn't mark a page/folio as
accessed after it has been unmapped. Practically speaking marking a folio
accesses is benign in this scenario, as KVM does hold a reference (it's
really just marking folios dirty that is problematic), but there's no
reason not to be paranoid (moving the APIC access page isn't a hot path),
and no reason to be different from other mmu_notifier-protected flows in
KVM.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-51-seanjc@google.com>

KVM: Move x86's API to release a faultin page to common KVM

Move KVM x86's helper that "finishes" the faultin process to common KVM
so that the logic can be shared across all architectures. Note, not all
architectures implement a fast page fault path, but the gist of the
comment applies to all architectures.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-50-seanjc@google.com>

KVM: x86/mmu: Don't mark unused faultin pages as accessed

When finishing guest page faults, don't mark pages as accessed if KVM
is resuming the guest _without_ installing a mapping, i.e. if the page
isn't being used. While it's possible that marking the page accessed
could avoid minor thrashing due to reclaiming a page that the guest is
about to access, it's far more likely that the gfn=>pfn mapping was
was invalidated, e.g. due a memslot change, or because the corresponding
VMA is being modified.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-49-seanjc@google.com>

KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns

Now that all x86 page fault paths precisely track refcounted pages, use
Use kvm_page_fault.refcounted_page to put references to struct page memory
when finishing page faults. This is a baby step towards eliminating
kvm_pfn_to_refcounted_page().

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-48-seanjc@google.com>

KVM: guest_memfd: Provide "struct page" as output from kvm_gmem_get_pfn()

Provide the "struct page" associated with a guest_memfd pfn as an output
from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can
directly put the page instead of having to rely on
kvm_pfn_to_refcounted_page().

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-47-seanjc@google.com>

KVM: guest_memfd: Pass index, not gfn, to __kvm_gmem_get_pfn()

Refactor guest_memfd usage of __kvm_gmem_get_pfn() to pass the index into
the guest_memfd file instead of the gfn, i.e. resolve the index based on
the slot+gfn in the caller instead of in __kvm_gmem_get_pfn(). This will
allow kvm_gmem_get_pfn() to retrieve and return the specific "struct page",
which requires the index into the folio, without a redoing the index
calculation multiple times (which isn't costly, just hard to follow).

Opportunistically add a kvm_gmem_get_index() helper to make the copy+pasted
code easier to understand.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-46-seanjc@google.com>

KVM: x86/mmu: Convert page fault paths to kvm_faultin_pfn()

Convert KVM x86 to use the recently introduced __kvm_faultin_pfn().
Opportunstically capture the refcounted_page grabbed by KVM for use in
future changes.

No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-45-seanjc@google.com>

KVM: Add kvm_faultin_pfn() to specifically service guest page faults

Add a new dedicated API, kvm_faultin_pfn(), for servicing guest page
faults, i.e. for getting pages/pfns that will be mapped into the guest via
an mmu_notifier-protected KVM MMU. Keep struct kvm_follow_pfn buried in
internal code, as having __kvm_faultin_pfn() take "out" params is actually
cleaner for several architectures, e.g. it allows the caller to have its
own "page fault" structure without having to marshal data to/from
kvm_follow_pfn.

Long term, common KVM would ideally provide a kvm_page_fault structure, a
la x86's struct of the same name. But all architectures need to be
converted to a common API before that can happen.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-44-seanjc@google.com>

KVM: Move declarations of memslot accessors up in kvm_host.h

Move the memslot lookup helpers further up in kvm_host.h so that they can
be used by inlined "to pfn" wrappers.

No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-43-seanjc@google.com>

KVM: x86/mmu: Mark pages/folios dirty at the origin of make_spte()

Move the marking of folios dirty from make_spte() out to its callers,
which have access to the _struct page_, not just the underlying pfn.
Once all architectures follow suit, this will allow removing KVM's ugly
hack where KVM elevates the refcount of VM_MIXEDMAP pfns that happen to
be struct page memory.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-42-seanjc@google.com>

KVM: x86/mmu: Add helper to "finish" handling a guest page fault

Add a helper to finish/complete the handling of a guest page, e.g. to
mark the pages accessed and put any held references. In the near
future, this will allow improving the logic without having to copy+paste
changes into all page fault paths. And in the less near future, will
allow sharing the "finish" API across all architectures.

No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-41-seanjc@google.com>