From a373830f96db288a3eb43a8692b6bcd0bd88dfe1 Mon Sep 17 00:00:00 2001
From: Gautam Menghani <gautam@linux.ibm.com>
Date: Mon, 28 Oct 2024 14:34:09 +0530
Subject: KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts

From: Gautam Menghani <gautam@linux.ibm.com>

commit a373830f96db288a3eb43a8692b6bcd0bd88dfe1 upstream.

Running an L2 vCPU (see [1] for terminology) with the LPCR_MER bit set
and no pending interrupts results in that L2 vCPU getting an infinite
flood of spurious interrupts. The 'if' check in kvmhv_run_single_vcpu()
sets the LPCR_MER bit when there are pending interrupts, but nothing
clears a stale LPCR_MER bit when there are none.

The spurious flood problem can be observed in 2 cases:
1. Crashing the guest while an interrupt-heavy workload is running
   a. Start an L2 guest and run an interrupt-heavy workload (e.g.
      ipistorm).
   b. While the workload is running, crash the guest (make sure kdump
      is configured).
   c. Any one of the vCPUs of the guest will start getting an infinite
      flood of spurious interrupts.

2. Running LTP stress tests in multiple guests at the same time
   a. Start 4 L2 guests.
   b. Start running LTP stress tests on all 4 guests at the same time.
   c. After some time, one or more vCPUs of any of the guests will
      start getting an infinite flood of spurious interrupts.

The root cause of both issues above is the same:
1. An NMI is sent to a running vCPU that has the LPCR_MER bit set.
2. In the NMI path, all registers are refreshed, i.e., H_GUEST_GET_STATE
   is called for all the registers.
3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr
   of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
   new value is always used whenever that vCPU runs, regardless of whether
   there was a pending interrupt.
4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
   interrupt handler, and this cycle never ends.
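
The feedback loop in steps 1-4 can be modelled in a few lines of C. This is a simplified sketch, not kernel code: `struct model_vcore`, `model_nmi_refresh()`, `model_entry_lpcr_unfixed()` and `model_count_spurious()` are hypothetical names, the NMI-path refresh is reduced to copying L0's LPCR into L1's cached copy, and `MODEL_LPCR_MER` assumes the 0x800 mask used for `LPCR_MER` in arch/powerpc/include/asm/reg.h (verify against your tree).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, mirroring LPCR_MER in arch/powerpc/include/asm/reg.h. */
#define MODEL_LPCR_MER 0x0000000000000800ULL

/* Hypothetical stand-in for L1's cached vcpu->arch.vcore->lpcr. */
struct model_vcore {
	uint64_t lpcr;
};

/*
 * Step 3 (simplified): the NMI path refreshes every register from L0
 * via H_GUEST_GET_STATE, so L0's live LPCR, with MER=1, overwrites
 * L1's cached copy.
 */
static void model_nmi_refresh(struct model_vcore *vc, uint64_t l0_lpcr)
{
	vc->lpcr = l0_lpcr;
}

/* Pre-fix entry logic: MER is only ever OR-ed in, never cleared. */
static uint64_t model_entry_lpcr_unfixed(const struct model_vcore *vc,
					 bool pending)
{
	uint64_t lpcr = vc->lpcr;

	if (pending)
		lpcr |= MODEL_LPCR_MER;
	return lpcr;
}

/*
 * Step 4: count vCPU entries on which L2 would be sent to its external
 * interrupt handler although no external interrupt is pending.
 */
int model_count_spurious(int entries)
{
	struct model_vcore vc = { .lpcr = 0 };
	int spurious = 0;

	assert(entries >= 0);

	/* Steps 1-3: an NMI arrives while MER is set in L0's copy. */
	model_nmi_refresh(&vc, MODEL_LPCR_MER);

	for (int i = 0; i < entries; i++) {
		bool pending = false; /* nothing is ever pending */
		uint64_t lpcr = model_entry_lpcr_unfixed(&vc, pending);

		if ((lpcr & MODEL_LPCR_MER) && !pending)
			spurious++;
	}
	return spurious;
}
```

With no interrupt ever pending, `model_count_spurious(n)` returns `n`: after the refresh, every entry carries the stale MER bit, because the unfixed logic has no path that clears it.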

Fix the spurious flood by masking off the LPCR_MER bit before running an
L2 vCPU, so that it is not set when there are no pending interrupts.
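
A minimal sketch of the entry-time rule the fix establishes: MER is set on entry exactly when an external interrupt is pending, whatever L1's cached copy holds. The names `MODEL_LPCR_MER`, `model_entry_lpcr()` and `model_entry_lpcr_self_test()` are hypothetical, the 0x800 mask is an assumption taken from `LPCR_MER` in arch/powerpc/include/asm/reg.h, and the model covers only the branch that sets LPCR_MER, ignoring the interrupt-injection path visible in the hunk context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, mirroring LPCR_MER in arch/powerpc/include/asm/reg.h. */
#define MODEL_LPCR_MER 0x0000000000000800ULL

/*
 * Hypothetical helper: compute the LPCR that L1 hands to L0 for an L2
 * vCPU entry, starting from L1's cached copy (vcore->lpcr).
 */
static uint64_t model_entry_lpcr(uint64_t cached_lpcr, bool ext_irq_pending)
{
	uint64_t lpcr = cached_lpcr;

	if (ext_irq_pending)
		lpcr |= MODEL_LPCR_MER;  /* pre-existing 'if' branch */
	else
		lpcr &= ~MODEL_LPCR_MER; /* the fix: drop a stale MER bit */

	return lpcr;
}

/* Check all four stale/clean x pending/not-pending combinations. */
void model_entry_lpcr_self_test(void)
{
	const uint64_t other_bits = 0x1000ULL; /* arbitrary non-MER bits */
	const uint64_t stale = other_bits | MODEL_LPCR_MER;

	assert(model_entry_lpcr(stale, false) == other_bits);
	assert(model_entry_lpcr(other_bits, true) == stale);
	assert(model_entry_lpcr(stale, true) == stale);
	assert(model_entry_lpcr(other_bits, false) == other_bits);
}
```

Other LPCR bits pass through untouched; only MER is forced to match the pending state. Without the `else` branch, the first assertion in the self-test would fail, which is the stale-MER case described in the root cause.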

[1] Terminology:
1. L0: PAPR hypervisor running in HV mode
2. L1: Linux guest (logical partition) running on top of L0
3. L2: KVM guest running on top of L1

Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/powerpc/kvm/book3s_hv.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4892,6 +4892,18 @@ int kvmhv_run_single_vcpu(struct kvm_vcp
 							   BOOK3S_INTERRUPT_EXTERNAL, 0);
 			else
 				lpcr |= LPCR_MER;
+		} else {
+			/*
+			 * L1's copy of L2's LPCR (vcpu->arch.vcore->lpcr) can get its MER bit
+			 * unexpectedly set - for e.g. during NMI handling when all register
+			 * states are synchronized from L0 to L1. L1 needs to inform L0 about
+			 * MER=1 only when there are pending external interrupts.
+			 * In the above if check, MER bit is set if there are pending
+			 * external interrupts. Hence, explicity mask off MER bit
+			 * here as otherwise it may generate spurious interrupts in L2 KVM
+			 * causing an endless loop, which results in L2 guest getting hung.
+			 */
+			lpcr &= ~LPCR_MER;
 		}
 	} else if (vcpu->arch.pending_exceptions ||
 		   vcpu->arch.doorbell_request ||