From 9e985cbf2942a1bb8fcef9adc2a17d90fd7ca8ee Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 6 Mar 2024 16:58:33 -0800
Subject: KVM: x86/pmu: Disable support for adaptive PEBS

From: Sean Christopherson <seanjc@google.com>

commit 9e985cbf2942a1bb8fcef9adc2a17d90fd7ca8ee upstream.

Drop support for virtualizing adaptive PEBS, as KVM's implementation is
architecturally broken without an obvious/easy path forward, and because
exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak
host kernel addresses to the guest.

Bug #1 is that KVM doesn't account for the upper 32 bits of
IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g.
fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters()
stores local variables as u8s and truncates the upper bits too, etc.

Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value
for PEBS events, perf will _always_ generate an adaptive record, even if
the guest requested a basic record. Note, KVM will also enable adaptive
PEBS in individual *counters*, even if adaptive PEBS isn't exposed to the
guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero,
i.e. the guest will only ever see Basic records.

Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper
bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and
intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE
either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what
KVM requests.

Bug #4 is that adaptive PEBS *might* effectively bypass event filters set
by the host, as "Updated Memory Access Info Group" records information
that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.

Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least
zeros) when entering a vCPU with adaptive PEBS, which allows the guest
to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries"
records.

Disable adaptive PEBS support as an immediate fix due to the severity of
the LBR leak in particular, and because fixing all of the bugs will be
non-trivial, e.g. not suitable for backporting to stable kernels.

Note! This will break live migration, but trying to make KVM play nice
with live migration would be quite complicated, wouldn't be guaranteed to
work (i.e. KVM might still kill/confuse the guest), and it's not clear
that there are any publicly available VMMs that support adaptive PEBS,
let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't
support PEBS in any capacity.

Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com
Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: stable@vger.kernel.org
Cc: Like Xu <like.xu.linux@gmail.com>
Cc: Mingwei Zhang <mizhang@google.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhang Xiong <xiong.y.zhang@intel.com>
Cc: Lv Zhiyuan <zhiyuan.lv@intel.com>
Cc: Dapeng Mi <dapeng1.mi@intel.com>
Cc: Jim Mattson <jmattson@google.com>
Acked-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kvm/vmx/vmx.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7857,8 +7857,28 @@ static u64 vmx_get_perf_capabilities(voi
 
 	if (vmx_pebs_supported()) {
 		perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
-		if ((perf_cap & PERF_CAP_PEBS_FORMAT) < 4)
-			perf_cap &= ~PERF_CAP_PEBS_BASELINE;
+
+		/*
+		 * Disallow adaptive PEBS as it is functionally broken, can be
+		 * used by the guest to read *host* LBRs, and can be used to
+		 * bypass userspace event filters. To correctly and safely
+		 * support adaptive PEBS, KVM needs to:
+		 *
+		 * 1. Account for the ADAPTIVE flag when (re)programming fixed
+		 *    counters.
+		 *
+		 * 2. Gain support from perf (or take direct control of counter
+		 *    programming) to support events without adaptive PEBS
+		 *    enabled for the hardware counter.
+		 *
+		 * 3. Ensure LBR MSRs cannot hold host data on VM-Entry with
+		 *    adaptive PEBS enabled and MSR_PEBS_DATA_CFG.LBRS=1.
+		 *
+		 * 4. Document which PMU events are effectively exposed to the
+		 *    guest via adaptive PEBS, and make adaptive PEBS mutually
+		 *    exclusive with KVM_SET_PMU_EVENT_FILTER if necessary.
+		 */
+		perf_cap &= ~PERF_CAP_PEBS_BASELINE;
 	}
 
 	return perf_cap;