From: Sasha Levin Date: Mon, 6 Dec 2021 04:03:03 +0000 (-0500) Subject: Fixes for 5.10 X-Git-Tag: v4.4.294~18 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=3cab6015b30874bef2c2157b9807285f9a7f4429;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for 5.10 Signed-off-by: Sasha Levin --- diff --git a/queue-5.10/kvm-vmx-set-failure-code-in-prepare_vmcs02.patch b/queue-5.10/kvm-vmx-set-failure-code-in-prepare_vmcs02.patch new file mode 100644 index 00000000000..8ea8f00df86 --- /dev/null +++ b/queue-5.10/kvm-vmx-set-failure-code-in-prepare_vmcs02.patch @@ -0,0 +1,41 @@ +From 7a256e68fa685bbbc887f3c77d10bd0c2f49004d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 30 Nov 2021 15:53:37 +0300 +Subject: KVM: VMX: Set failure code in prepare_vmcs02() + +From: Dan Carpenter + +[ Upstream commit bfbb307c628676929c2d329da0daf9d22afa8ad2 ] + +The error paths in the prepare_vmcs02() function are supposed to set +*entry_failure_code but this path does not. It leads to using an +uninitialized variable in the caller. + +Fixes: 71f7347025bf ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry") +Signed-off-by: Dan Carpenter +Message-Id: <20211130125337.GB24578@kili> +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/vmx/nested.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c +index 257ec2cbf69a4..36661b15c3d04 100644 +--- a/arch/x86/kvm/vmx/nested.c ++++ b/arch/x86/kvm/vmx/nested.c +@@ -2619,8 +2619,10 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, + + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) && + WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, +- vmcs12->guest_ia32_perf_global_ctrl))) ++ vmcs12->guest_ia32_perf_global_ctrl))) { ++ *entry_failure_code = ENTRY_FAIL_DEFAULT; + return -EINVAL; ++ } + + kvm_rsp_write(vcpu, vmcs12->guest_rsp); + kvm_rip_write(vcpu, vmcs12->guest_rip); +-- +2.33.0 + diff --git a/queue-5.10/kvm-x86-pmu-fix-reserved-bits-for-amd-perfevtseln-re.patch b/queue-5.10/kvm-x86-pmu-fix-reserved-bits-for-amd-perfevtseln-re.patch new file mode 100644 index 00000000000..242f6203b8c --- /dev/null +++ b/queue-5.10/kvm-x86-pmu-fix-reserved-bits-for-amd-perfevtseln-re.patch @@ -0,0 +1,55 @@ +From 980b7c2fb5a6612ec1c2edea008f19b54b17e92f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 18 Nov 2021 21:03:20 +0800 +Subject: KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register + +From: Like Xu + +[ Upstream commit cb1d220da0faa5ca0deb93449aff953f0c2cce6d ] + +If we run the following perf command in an AMD Milan guest: + + perf stat \ + -e cpu/event=0x1d0/ \ + -e cpu/event=0x1c7/ \ + -e cpu/umask=0x1f,event=0x18e/ \ + -e cpu/umask=0x7,event=0x18e/ \ + -e cpu/umask=0x18,event=0x18e/ \ + ./workload + +dmesg will report a #GP warning from an unchecked MSR access +error on MSR_F15H_PERF_CTLx. + +This is because according to APM (Revision: 4.03) Figure 13-7, +the bits [35:32] of AMD PerfEvtSeln register is a part of the +event select encoding, which extends the EVENT_SELECT field +from 8 bits to 12 bits. + +Opportunistically update pmu->reserved_bits for reserved bit 19. 
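
As a quick sanity check (plain arithmetic on the description above, not taken
from the patch itself), the new mask can be rebuilt from three pieces: bits
63:36 stay reserved above the widened event select, bits 35:32 are freed up
for EventSelect[11:8], and bits 21 and 19 are the reserved bits left in the
low word:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* bits 63:36: reserved above the 12-bit event select */
            uint64_t high = ~((1ULL << 36) - 1);
            /* bit 21 (reserved as before) plus the newly reserved bit 19 */
            uint64_t low  = (1ULL << 21) | (1ULL << 19);

            printf("0x%016llx\n", (unsigned long long)(high | low));
            /* prints 0xfffffff000280000, the value used in the hunk below */
            return 0;
    }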
+ +Reported-by: Jim Mattson +Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") +Signed-off-by: Like Xu +Message-Id: <20211118130320.95997-1-likexu@tencent.com> +Signed-off-by: Paolo Bonzini +Signed-off-by: Sasha Levin +--- + arch/x86/kvm/svm/pmu.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c +index 035da07500e8b..5a5c165a30ed1 100644 +--- a/arch/x86/kvm/svm/pmu.c ++++ b/arch/x86/kvm/svm/pmu.c +@@ -274,7 +274,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu) + pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS; + + pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1; +- pmu->reserved_bits = 0xffffffff00200000ull; ++ pmu->reserved_bits = 0xfffffff000280000ull; + pmu->version = 1; + /* not applicable to AMD; but clean them to prevent any fall out */ + pmu->counter_bitmask[KVM_PMC_FIXED] = 0; +-- +2.33.0 + diff --git a/queue-5.10/sched-uclamp-fix-rq-uclamp_max-not-set-on-first-enqu.patch b/queue-5.10/sched-uclamp-fix-rq-uclamp_max-not-set-on-first-enqu.patch new file mode 100644 index 00000000000..cb1ceca4a87 --- /dev/null +++ b/queue-5.10/sched-uclamp-fix-rq-uclamp_max-not-set-on-first-enqu.patch @@ -0,0 +1,66 @@ +From 6ede1aa40599f3abdcb4af51a07e4dcf7d058ba9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 2 Dec 2021 11:20:33 +0000 +Subject: sched/uclamp: Fix rq->uclamp_max not set on first enqueue + +From: Qais Yousef + +[ Upstream commit 315c4f884800c45cb6bd8c90422fad554a8b9588 ] + +Commit d81ae8aac85c ("sched/uclamp: Fix initialization of struct +uclamp_rq") introduced a bug where uclamp_max of the rq is not reset to +match the woken up task's uclamp_max when the rq is idle. + +The code was relying on rq->uclamp_max initialized to zero, so on first +enqueue + + static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p, + enum uclamp_id clamp_id) + { + ... + + if (uc_se->value > READ_ONCE(uc_rq->value)) + WRITE_ONCE(uc_rq->value, uc_se->value); + } + +was actually resetting it. But since commit d81ae8aac85c changed the +default to 1024, this no longer works. And since rq->uclamp_flags is +also initialized to 0, neither above code path nor uclamp_idle_reset() +update the rq->uclamp_max on first wake up from idle. + +This is only visible from first wake up(s) until the first dequeue to +idle after enabling the static key. And it only matters if the +uclamp_max of this task is < 1024 since only then its uclamp_max will be +effectively ignored. + +Fix it by properly initializing rq->uclamp_flags = UCLAMP_FLAG_IDLE to +ensure uclamp_idle_reset() is called which then will update the rq +uclamp_max value as expected. 
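
The gate that makes this work sits in uclamp_idle_reset(); roughly (a
paraphrase of kernel/sched/core.c, exact details may vary by version):

    static inline void uclamp_idle_reset(struct rq *rq, enum uclamp_id clamp_id,
                                         unsigned int clamp_value)
    {
            /* Reset max-clamp retention only on idle exit */
            if (!(rq->uclamp_flags & UCLAMP_FLAG_IDLE))
                    return;

            WRITE_ONCE(rq->uclamp[clamp_id].value, clamp_value);
    }

With rq->uclamp_flags initialized to UCLAMP_FLAG_IDLE, the first enqueue on a
freshly initialized rq reaches the WRITE_ONCE() and rq->uclamp[clamp_id].value
picks up the woken task's clamp instead of staying at the 1024 default.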
+ +Fixes: d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq") +Signed-off-by: Qais Yousef +Signed-off-by: Peter Zijlstra (Intel) +Reviewed-by: Valentin Schneider +Tested-by: Dietmar Eggemann +Link: https://lkml.kernel.org/r/20211202112033.1705279-1-qais.yousef@arm.com +Signed-off-by: Sasha Levin +--- + kernel/sched/core.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/kernel/sched/core.c b/kernel/sched/core.c +index 304aad997da11..0a5f9fad45e4b 100644 +--- a/kernel/sched/core.c ++++ b/kernel/sched/core.c +@@ -1526,7 +1526,7 @@ static void __init init_uclamp_rq(struct rq *rq) + }; + } + +- rq->uclamp_flags = 0; ++ rq->uclamp_flags = UCLAMP_FLAG_IDLE; + } + + static void __init init_uclamp(void) +-- +2.33.0 + diff --git a/queue-5.10/series b/queue-5.10/series index 7aa8c8af0f3..95fbe65086f 100644 --- a/queue-5.10/series +++ b/queue-5.10/series @@ -100,3 +100,11 @@ atlantic-add-missing-dids-and-fix-115c.patch remove-half-duplex-mode-speed-capabilities.patch atlantic-fix-statistics-logic-for-production-hardware.patch atlantic-remove-warn-trace-message.patch +kvm-x86-pmu-fix-reserved-bits-for-amd-perfevtseln-re.patch +kvm-vmx-set-failure-code-in-prepare_vmcs02.patch +x86-sev-fix-sev-es-ins-outs-instructions-for-word-dw.patch +x86-entry-use-the-correct-fence-macro-after-swapgs-i.patch +x86-xen-add-xenpv_restore_regs_and_return_to_usermod.patch +sched-uclamp-fix-rq-uclamp_max-not-set-on-first-enqu.patch +x86-pv-switch-swapgs-to-alternative.patch +x86-entry-add-a-fence-for-kernel-entry-swapgs-in-par.patch diff --git a/queue-5.10/x86-entry-add-a-fence-for-kernel-entry-swapgs-in-par.patch b/queue-5.10/x86-entry-add-a-fence-for-kernel-entry-swapgs-in-par.patch new file mode 100644 index 00000000000..786c64d0061 --- /dev/null +++ b/queue-5.10/x86-entry-add-a-fence-for-kernel-entry-swapgs-in-par.patch @@ -0,0 +1,80 @@ +From d5a0fcd335821d7ec7e27acf6ae625a3ae41ba7b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 26 Nov 2021 18:11:21 +0800 +Subject: x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry() + +From: Lai Jiangshan + +[ Upstream commit c07e45553da1808aa802e9f0ffa8108cfeaf7a17 ] + +Commit + + 18ec54fdd6d18 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations") + +added FENCE_SWAPGS_{KERNEL|USER}_ENTRY for conditional SWAPGS. In +paranoid_entry(), it uses only FENCE_SWAPGS_KERNEL_ENTRY for both +branches. This is because the fence is required for both cases since the +CR3 write is conditional even when PTI is enabled. + +But + + 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") + +changed the order of SWAPGS and the CR3 write. And it missed the needed +FENCE_SWAPGS_KERNEL_ENTRY for the user gsbase case. + +Add it back by changing the branches so that FENCE_SWAPGS_KERNEL_ENTRY +can cover both branches. + + [ bp: Massage, fix typos, remove obsolete comment while at it. 
] + +Fixes: 96b2371413e8f ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry") +Signed-off-by: Lai Jiangshan +Signed-off-by: Borislav Petkov +Link: https://lkml.kernel.org/r/20211126101209.8613-2-jiangshanlai@gmail.com +Signed-off-by: Sasha Levin +--- + arch/x86/entry/entry_64.S | 16 +++++----------- + 1 file changed, 5 insertions(+), 11 deletions(-) + +diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S +index 166554a109aeb..a24ce5905ab82 100644 +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -936,6 +936,7 @@ SYM_CODE_START_LOCAL(paranoid_entry) + .Lparanoid_entry_checkgs: + /* EBX = 1 -> kernel GSBASE active, no restore required */ + movl $1, %ebx ++ + /* + * The kernel-enforced convention is a negative GSBASE indicates + * a kernel value. No SWAPGS needed on entry and exit. +@@ -943,21 +944,14 @@ SYM_CODE_START_LOCAL(paranoid_entry) + movl $MSR_GS_BASE, %ecx + rdmsr + testl %edx, %edx +- jns .Lparanoid_entry_swapgs +- ret ++ js .Lparanoid_kernel_gsbase + +-.Lparanoid_entry_swapgs: ++ /* EBX = 0 -> SWAPGS required on exit */ ++ xorl %ebx, %ebx + swapgs ++.Lparanoid_kernel_gsbase: + +- /* +- * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an +- * unconditional CR3 write, even in the PTI case. So do an lfence +- * to prevent GS speculation, regardless of whether PTI is enabled. +- */ + FENCE_SWAPGS_KERNEL_ENTRY +- +- /* EBX = 0 -> SWAPGS required on exit */ +- xorl %ebx, %ebx + ret + SYM_CODE_END(paranoid_entry) + +-- +2.33.0 + diff --git a/queue-5.10/x86-entry-use-the-correct-fence-macro-after-swapgs-i.patch b/queue-5.10/x86-entry-use-the-correct-fence-macro-after-swapgs-i.patch new file mode 100644 index 00000000000..9e2e2d17e64 --- /dev/null +++ b/queue-5.10/x86-entry-use-the-correct-fence-macro-after-swapgs-i.patch @@ -0,0 +1,70 @@ +From e3b6d4da55bc8a47d47384b83d435c32eb21e1fc Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 26 Nov 2021 18:11:22 +0800 +Subject: x86/entry: Use the correct fence macro after swapgs in kernel CR3 + +From: Lai Jiangshan + +[ Upstream commit 1367afaa2ee90d1c956dfc224e199fcb3ff3f8cc ] + +The commit + + c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching") + +removed a CR3 write in the faulting path of load_gs_index(). + +But the path's FENCE_SWAPGS_USER_ENTRY has no fence operation if PTI is +enabled, see spectre_v1_select_mitigation(). + +Rather, it depended on the serializing CR3 write of SWITCH_TO_KERNEL_CR3 +and since it got removed, add a FENCE_SWAPGS_KERNEL_ENTRY call to make +sure speculation is blocked. + + [ bp: Massage commit message and comment. ] + +Fixes: c75890700455 ("x86/entry/64: Remove unneeded kernel CR3 switching") +Signed-off-by: Lai Jiangshan +Signed-off-by: Borislav Petkov +Acked-by: Peter Zijlstra (Intel) +Link: https://lkml.kernel.org/r/20211126101209.8613-3-jiangshanlai@gmail.com +Signed-off-by: Sasha Levin +--- + arch/x86/entry/entry_64.S | 15 ++++++++------- + 1 file changed, 8 insertions(+), 7 deletions(-) + +diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S +index f18f3932e971a..a806d68b96990 100644 +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -1035,11 +1035,6 @@ SYM_CODE_START_LOCAL(error_entry) + pushq %r12 + ret + +-.Lerror_entry_done_lfence: +- FENCE_SWAPGS_KERNEL_ENTRY +-.Lerror_entry_done: +- ret +- + /* + * There are two places in the kernel that can potentially fault with + * usergs. Handle them here. 
B stepping K8s sometimes report a +@@ -1062,8 +1057,14 @@ SYM_CODE_START_LOCAL(error_entry) + * .Lgs_change's error handler with kernel gsbase. + */ + SWAPGS +- FENCE_SWAPGS_USER_ENTRY +- jmp .Lerror_entry_done ++ ++ /* ++ * Issue an LFENCE to prevent GS speculation, regardless of whether it is a ++ * kernel or user gsbase. ++ */ ++.Lerror_entry_done_lfence: ++ FENCE_SWAPGS_KERNEL_ENTRY ++ ret + + .Lbstep_iret: + /* Fix truncated RIP */ +-- +2.33.0 + diff --git a/queue-5.10/x86-pv-switch-swapgs-to-alternative.patch b/queue-5.10/x86-pv-switch-swapgs-to-alternative.patch new file mode 100644 index 00000000000..ada5aa0f9c8 --- /dev/null +++ b/queue-5.10/x86-pv-switch-swapgs-to-alternative.patch @@ -0,0 +1,237 @@ +From 2ef3d718e2c4ab4483e0b62e56fcc3bb83fa400f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 20 Jan 2021 14:55:44 +0100 +Subject: x86/pv: Switch SWAPGS to ALTERNATIVE + +From: Juergen Gross + +[ Upstream commit 53c9d9240944088274aadbbbafc6138ca462db4f ] + +SWAPGS is used only for interrupts coming from user mode or for +returning to user mode. So there is no reason to use the PARAVIRT +framework, as it can easily be replaced by an ALTERNATIVE depending +on X86_FEATURE_XENPV. + +There are several instances using the PV-aware SWAPGS macro in paths +which are never executed in a Xen PV guest. Replace those with the +plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies. + +Signed-off-by: Juergen Gross +Signed-off-by: Borislav Petkov +Reviewed-by: Borislav Petkov +Reviewed-by: Thomas Gleixner +Acked-by: Andy Lutomirski +Acked-by: Peter Zijlstra (Intel) +Link: https://lkml.kernel.org/r/20210120135555.32594-5-jgross@suse.com +Signed-off-by: Sasha Levin +--- + arch/x86/entry/entry_64.S | 10 +++++----- + arch/x86/include/asm/irqflags.h | 20 ++++++++------------ + arch/x86/include/asm/paravirt.h | 20 -------------------- + arch/x86/include/asm/paravirt_types.h | 2 -- + arch/x86/kernel/asm-offsets_64.c | 1 - + arch/x86/kernel/paravirt.c | 1 - + arch/x86/kernel/paravirt_patch.c | 3 --- + arch/x86/xen/enlighten_pv.c | 3 --- + 8 files changed, 13 insertions(+), 47 deletions(-) + +diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S +index de541ea2788eb..166554a109aeb 100644 +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -673,7 +673,7 @@ native_irq_return_ldt: + */ + + pushq %rdi /* Stash user RDI */ +- SWAPGS /* to kernel GS */ ++ swapgs /* to kernel GS */ + SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */ + + movq PER_CPU_VAR(espfix_waddr), %rdi +@@ -703,7 +703,7 @@ native_irq_return_ldt: + orq PER_CPU_VAR(espfix_stack), %rax + + SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi +- SWAPGS /* to user GS */ ++ swapgs /* to user GS */ + popq %rdi /* Restore user RDI */ + + movq %rax, %rsp +@@ -947,7 +947,7 @@ SYM_CODE_START_LOCAL(paranoid_entry) + ret + + .Lparanoid_entry_swapgs: +- SWAPGS ++ swapgs + + /* + * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an +@@ -1005,7 +1005,7 @@ SYM_CODE_START_LOCAL(paranoid_exit) + jnz restore_regs_and_return_to_kernel + + /* We are returning to a context with user GSBASE */ +- SWAPGS_UNSAFE_STACK ++ swapgs + jmp restore_regs_and_return_to_kernel + SYM_CODE_END(paranoid_exit) + +@@ -1431,7 +1431,7 @@ nmi_no_fsgsbase: + jnz nmi_restore + + nmi_swapgs: +- SWAPGS_UNSAFE_STACK ++ swapgs + + nmi_restore: + POP_REGS +diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h +index 2dfc8d380dab1..8c86edefa1150 100644 +--- a/arch/x86/include/asm/irqflags.h ++++ 
b/arch/x86/include/asm/irqflags.h +@@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void) + #define SAVE_FLAGS(x) pushfq; popq %rax + #endif + +-#define SWAPGS swapgs +-/* +- * Currently paravirt can't handle swapgs nicely when we +- * don't have a stack we can rely on (such as a user space +- * stack). So we either find a way around these or just fault +- * and emulate if a guest tries to call swapgs directly. +- * +- * Either way, this is a good way to document that we don't +- * have a reliable stack. x86_64 only. +- */ +-#define SWAPGS_UNSAFE_STACK swapgs +- + #define INTERRUPT_RETURN jmp native_iret + #define USERGS_SYSRET64 \ + swapgs; \ +@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void) + + return arch_irqs_disabled_flags(flags); + } ++#else ++#ifdef CONFIG_X86_64 ++#ifdef CONFIG_XEN_PV ++#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV ++#else ++#define SWAPGS swapgs ++#endif ++#endif + #endif /* !__ASSEMBLY__ */ + + #endif +diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h +index d25cc6830e895..5647bcdba776e 100644 +--- a/arch/x86/include/asm/paravirt.h ++++ b/arch/x86/include/asm/paravirt.h +@@ -776,26 +776,6 @@ extern void default_banner(void); + + #ifdef CONFIG_X86_64 + #ifdef CONFIG_PARAVIRT_XXL +-/* +- * If swapgs is used while the userspace stack is still current, +- * there's no way to call a pvop. The PV replacement *must* be +- * inlined, or the swapgs instruction must be trapped and emulated. +- */ +-#define SWAPGS_UNSAFE_STACK \ +- PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs) +- +-/* +- * Note: swapgs is very special, and in practise is either going to be +- * implemented with a single "swapgs" instruction or something very +- * special. Either way, we don't need to save any registers for +- * it. +- */ +-#define SWAPGS \ +- PARA_SITE(PARA_PATCH(PV_CPU_swapgs), \ +- ANNOTATE_RETPOLINE_SAFE; \ +- call PARA_INDIRECT(pv_ops+PV_CPU_swapgs); \ +- ) +- + #define USERGS_SYSRET64 \ + PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64), \ + ANNOTATE_RETPOLINE_SAFE; \ +diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h +index 0fad9f61c76ab..903d71884fa25 100644 +--- a/arch/x86/include/asm/paravirt_types.h ++++ b/arch/x86/include/asm/paravirt_types.h +@@ -169,8 +169,6 @@ struct pv_cpu_ops { + frame set up. 
*/ + void (*iret)(void); + +- void (*swapgs)(void); +- + void (*start_context_switch)(struct task_struct *prev); + void (*end_context_switch)(struct task_struct *next); + #endif +diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c +index 828be792231e9..1354bc30614d7 100644 +--- a/arch/x86/kernel/asm-offsets_64.c ++++ b/arch/x86/kernel/asm-offsets_64.c +@@ -15,7 +15,6 @@ int main(void) + #ifdef CONFIG_PARAVIRT_XXL + OFFSET(PV_CPU_usergs_sysret64, paravirt_patch_template, + cpu.usergs_sysret64); +- OFFSET(PV_CPU_swapgs, paravirt_patch_template, cpu.swapgs); + #ifdef CONFIG_DEBUG_ENTRY + OFFSET(PV_IRQ_save_fl, paravirt_patch_template, irq.save_fl); + #endif +diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c +index 6c3407ba6ee98..5e5fcf5c376de 100644 +--- a/arch/x86/kernel/paravirt.c ++++ b/arch/x86/kernel/paravirt.c +@@ -312,7 +312,6 @@ struct paravirt_patch_template pv_ops = { + + .cpu.usergs_sysret64 = native_usergs_sysret64, + .cpu.iret = native_iret, +- .cpu.swapgs = native_swapgs, + + #ifdef CONFIG_X86_IOPL_IOPERM + .cpu.invalidate_io_bitmap = native_tss_invalidate_io_bitmap, +diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c +index ace6e334cb393..7c518b08aa3c5 100644 +--- a/arch/x86/kernel/paravirt_patch.c ++++ b/arch/x86/kernel/paravirt_patch.c +@@ -28,7 +28,6 @@ struct patch_xxl { + const unsigned char irq_restore_fl[2]; + const unsigned char cpu_wbinvd[2]; + const unsigned char cpu_usergs_sysret64[6]; +- const unsigned char cpu_swapgs[3]; + const unsigned char mov64[3]; + }; + +@@ -43,7 +42,6 @@ static const struct patch_xxl patch_data_xxl = { + .cpu_wbinvd = { 0x0f, 0x09 }, // wbinvd + .cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8, + 0x48, 0x0f, 0x07 }, // swapgs; sysretq +- .cpu_swapgs = { 0x0f, 0x01, 0xf8 }, // swapgs + .mov64 = { 0x48, 0x89, 0xf8 }, // mov %rdi, %rax + }; + +@@ -86,7 +84,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr, + PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len); + + PATCH_CASE(cpu, usergs_sysret64, xxl, insn_buff, len); +- PATCH_CASE(cpu, swapgs, xxl, insn_buff, len); + PATCH_CASE(cpu, wbinvd, xxl, insn_buff, len); + #endif + +diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c +index 5af0421ef74ba..16ff25d6935e7 100644 +--- a/arch/x86/xen/enlighten_pv.c ++++ b/arch/x86/xen/enlighten_pv.c +@@ -1083,9 +1083,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { + #endif + .io_delay = xen_io_delay, + +- /* Xen takes care of %gs when switching to usermode for us */ +- .swapgs = paravirt_nop, +- + .start_context_switch = paravirt_start_context_switch, + .end_context_switch = xen_end_context_switch, + }; +-- +2.33.0 + diff --git a/queue-5.10/x86-sev-fix-sev-es-ins-outs-instructions-for-word-dw.patch b/queue-5.10/x86-sev-fix-sev-es-ins-outs-instructions-for-word-dw.patch new file mode 100644 index 00000000000..e0c02eeb61f --- /dev/null +++ b/queue-5.10/x86-sev-fix-sev-es-ins-outs-instructions-for-word-dw.patch @@ -0,0 +1,157 @@ +From f384c2b7d5113952e57e614a17b06cabcb21f8fd Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 19 Nov 2021 15:27:57 -0800 +Subject: x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword + +From: Michael Sterritt + +[ Upstream commit 1d5379d0475419085d3575bd9155f2e558e96390 ] + +Properly type the operands being passed to __put_user()/__get_user(). +Otherwise, these routines truncate data for dependent instructions +(e.g., INSW) and only read/write one byte. 
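
The truncation follows from how these helpers are typed: __put_user() and
__get_user() derive the access width from sizeof(*ptr), not from the value
being passed. A minimal before/after sketch, reusing the d2/dst names from
the hunks below:

    u16 d2 = 0xabcd;

    /* old: char-typed pointer, so only the low byte (0xcd) reaches *dst */
    char __user *target = (char __user *)dst;
    __put_user(d2, target);

    /* new: pointer typed to match the operand, full 16-bit store */
    __put_user(d2, (u16 __user *)dst);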
+ +This has been tested by sending a string with REP OUTSW to a port and +then reading it back in with REP INSW on the same port. + +Previous behavior was to only send and receive the first char of the +size. For example, word operations for "abcd" would only read/write +"ac". With change, the full string is now written and read back. + +Fixes: f980f9c31a923 (x86/sev-es: Compile early handler code into kernel image) +Signed-off-by: Michael Sterritt +Signed-off-by: Borislav Petkov +Reviewed-by: Paolo Bonzini +Reviewed-by: Marc Orr +Reviewed-by: Peter Gonda +Reviewed-by: Joerg Roedel +Link: https://lkml.kernel.org/r/20211119232757.176201-1-sterritt@google.com +Signed-off-by: Sasha Levin +--- + arch/x86/kernel/sev-es.c | 57 +++++++++++++++++++++++++++------------- + 1 file changed, 39 insertions(+), 18 deletions(-) + +diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c +index 865e234ea24bd..c222fab112cbd 100644 +--- a/arch/x86/kernel/sev-es.c ++++ b/arch/x86/kernel/sev-es.c +@@ -260,11 +260,6 @@ static enum es_result vc_write_mem(struct es_em_ctxt *ctxt, + char *dst, char *buf, size_t size) + { + unsigned long error_code = X86_PF_PROT | X86_PF_WRITE; +- char __user *target = (char __user *)dst; +- u64 d8; +- u32 d4; +- u16 d2; +- u8 d1; + + /* + * This function uses __put_user() independent of whether kernel or user +@@ -286,26 +281,42 @@ static enum es_result vc_write_mem(struct es_em_ctxt *ctxt, + * instructions here would cause infinite nesting. + */ + switch (size) { +- case 1: ++ case 1: { ++ u8 d1; ++ u8 __user *target = (u8 __user *)dst; ++ + memcpy(&d1, buf, 1); + if (__put_user(d1, target)) + goto fault; + break; +- case 2: ++ } ++ case 2: { ++ u16 d2; ++ u16 __user *target = (u16 __user *)dst; ++ + memcpy(&d2, buf, 2); + if (__put_user(d2, target)) + goto fault; + break; +- case 4: ++ } ++ case 4: { ++ u32 d4; ++ u32 __user *target = (u32 __user *)dst; ++ + memcpy(&d4, buf, 4); + if (__put_user(d4, target)) + goto fault; + break; +- case 8: ++ } ++ case 8: { ++ u64 d8; ++ u64 __user *target = (u64 __user *)dst; ++ + memcpy(&d8, buf, 8); + if (__put_user(d8, target)) + goto fault; + break; ++ } + default: + WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size); + return ES_UNSUPPORTED; +@@ -328,11 +339,6 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, + char *src, char *buf, size_t size) + { + unsigned long error_code = X86_PF_PROT; +- char __user *s = (char __user *)src; +- u64 d8; +- u32 d4; +- u16 d2; +- u8 d1; + + /* + * This function uses __get_user() independent of whether kernel or user +@@ -354,26 +360,41 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, + * instructions here would cause infinite nesting. 
+ */ + switch (size) { +- case 1: ++ case 1: { ++ u8 d1; ++ u8 __user *s = (u8 __user *)src; ++ + if (__get_user(d1, s)) + goto fault; + memcpy(buf, &d1, 1); + break; +- case 2: ++ } ++ case 2: { ++ u16 d2; ++ u16 __user *s = (u16 __user *)src; ++ + if (__get_user(d2, s)) + goto fault; + memcpy(buf, &d2, 2); + break; +- case 4: ++ } ++ case 4: { ++ u32 d4; ++ u32 __user *s = (u32 __user *)src; ++ + if (__get_user(d4, s)) + goto fault; + memcpy(buf, &d4, 4); + break; +- case 8: ++ } ++ case 8: { ++ u64 d8; ++ u64 __user *s = (u64 __user *)src; + if (__get_user(d8, s)) + goto fault; + memcpy(buf, &d8, 8); + break; ++ } + default: + WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size); + return ES_UNSUPPORTED; +-- +2.33.0 + diff --git a/queue-5.10/x86-xen-add-xenpv_restore_regs_and_return_to_usermod.patch b/queue-5.10/x86-xen-add-xenpv_restore_regs_and_return_to_usermod.patch new file mode 100644 index 00000000000..91d5dc381ad --- /dev/null +++ b/queue-5.10/x86-xen-add-xenpv_restore_regs_and_return_to_usermod.patch @@ -0,0 +1,95 @@ +From 93f5997a2d59d8e4f299ad3143cc57fa9ec7bafb Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 26 Nov 2021 18:11:23 +0800 +Subject: x86/xen: Add xenpv_restore_regs_and_return_to_usermode() + +From: Lai Jiangshan + +[ Upstream commit 5c8f6a2e316efebb3ba93d8c1af258155dcf5632 ] + +In the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the +trampoline stack. But XEN pv doesn't use trampoline stack, so +PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack. + +In that case, source and destination stacks are identical, which means +that reusing swapgs_restore_regs_and_return_to_usermode() in XEN pv +would cause %rsp to move up to the top of the kernel stack and leave the +IRET frame below %rsp. + +This is dangerous as it can be corrupted if #NMI / #MC hit as either of +these events occurring in the middle of the stack pushing would clobber +data on the (original) stack. + +And, with XEN pv, swapgs_restore_regs_and_return_to_usermode() pushing +the IRET frame on to the original address is useless and error-prone +when there is any future attempt to modify the code. + + [ bp: Massage commit message. ] + +Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries") +Signed-off-by: Lai Jiangshan +Signed-off-by: Borislav Petkov +Reviewed-by: Boris Ostrovsky +Link: https://lkml.kernel.org/r/20211126101209.8613-4-jiangshanlai@gmail.com +Signed-off-by: Sasha Levin +--- + arch/x86/entry/entry_64.S | 4 ++++ + arch/x86/xen/xen-asm.S | 20 ++++++++++++++++++++ + 2 files changed, 24 insertions(+) + +diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S +index a806d68b96990..de541ea2788eb 100644 +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -575,6 +575,10 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL) + ud2 + 1: + #endif ++#ifdef CONFIG_XEN_PV ++ ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV ++#endif ++ + POP_REGS pop_rdi=0 + + /* +diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S +index 53cf8aa35032d..011ec649f3886 100644 +--- a/arch/x86/xen/xen-asm.S ++++ b/arch/x86/xen/xen-asm.S +@@ -19,6 +19,7 @@ + + #include + #include ++#include <../entry/calling.h> + + /* + * Enable events. 
This clears the event mask and tests the pending +@@ -235,6 +236,25 @@ SYM_CODE_START(xen_sysret64) + jmp hypercall_iret + SYM_CODE_END(xen_sysret64) + ++/* ++ * XEN pv doesn't use trampoline stack, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is ++ * also the kernel stack. Reusing swapgs_restore_regs_and_return_to_usermode() ++ * in XEN pv would cause %rsp to move up to the top of the kernel stack and ++ * leave the IRET frame below %rsp, which is dangerous to be corrupted if #NMI ++ * interrupts. And swapgs_restore_regs_and_return_to_usermode() pushing the IRET ++ * frame at the same address is useless. ++ */ ++SYM_CODE_START(xenpv_restore_regs_and_return_to_usermode) ++ UNWIND_HINT_REGS ++ POP_REGS ++ ++ /* stackleak_erase() can work safely on the kernel stack. */ ++ STACKLEAK_ERASE_NOCLOBBER ++ ++ addq $8, %rsp /* skip regs->orig_ax */ ++ jmp xen_iret ++SYM_CODE_END(xenpv_restore_regs_and_return_to_usermode) ++ + /* + * Xen handles syscall callbacks much like ordinary exceptions, which + * means we have: +-- +2.33.0 +