From: Greg Kroah-Hartman Date: Mon, 5 Feb 2018 17:38:14 +0000 (-0800) Subject: 4.14-stable patches X-Git-Tag: v3.18.94~15 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=aba6eb3aac5cd83043108dd1929341ce7f1fe696;p=thirdparty%2Fkernel%2Fstable-queue.git 4.14-stable patches added patches: Documentation_Document_array_index_nospec.patch KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch KVM_VMX_introduce_alloc_loaded_vmcs.patch KVM_VMX_make_MSR_bitmaps_per-VCPU.patch KVM_nVMX_Eliminate_vmcs02_pool.patch KVMx86_Add_IBPB_support.patch KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch array_index_nospec_Sanitize_speculative_array_de-references.patch nl80211_Sanitize_array_index_in_parse_txq_params.patch objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch objtool_Improve_retpoline_alternative_handling.patch objtool_Warn_on_stripped_section_symbol.patch vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch x86_Implement_array_index_mask_nospec.patch x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch x86_Introduce_barrier_nospec.patch x86asm_Move_status_from_thread_struct_to_thread_info.patch x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch x86entry64_Push_extra_regs_right_away.patch x86entry64_Remove_the_SYSCALL64_fast_path.patch x86get_user_Use_pointer_masking_to_limit_speculation.patch x86kvm_Update_spectre-v1_mitigation.patch x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch x86pti_Mark_constant_arrays_as___initconst.patch x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch x86spectre_Report_get_user_mitigation_for_spectre_v1.patch x86spectre_Simplify_spectre_v2_command_line_parsing.patch x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch --- diff --git a/queue-4.14/Documentation_Document_array_index_nospec.patch b/queue-4.14/Documentation_Document_array_index_nospec.patch new file mode 100644 index 00000000000..d6cfa4bc94b --- /dev/null +++ b/queue-4.14/Documentation_Document_array_index_nospec.patch @@ -0,0 +1,123 @@ +Subject: Documentation: Document array_index_nospec +From: Mark Rutland mark.rutland@arm.com +Date: Mon Jan 29 17:02:16 2018 -0800 + +From: Mark Rutland mark.rutland@arm.com + +commit f84a56f73dddaeac1dba8045b007f742f61cd2da + +Document the rationale and usage of the new array_index_nospec() helper. 
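+
+A minimal usage sketch, following the same pattern as the worked example
+in the speculation.txt text added below (MAX_ARRAY_ELEMS is the caller's
+own bound):
+
+        int load_array(int *array, unsigned int index)
+        {
+                if (index >= MAX_ARRAY_ELEMS)
+                        return 0;
+
+                /* Clamp index to [0, MAX_ARRAY_ELEMS) even under speculation. */
+                index = array_index_nospec(index, MAX_ARRAY_ELEMS);
+                return array[index];
+        }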
+ +Signed-off-by: Mark Rutland +Signed-off-by: Will Deacon +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Reviewed-by: Kees Cook +Cc: linux-arch@vger.kernel.org +Cc: Jonathan Corbet +Cc: Peter Zijlstra +Cc: gregkh@linuxfoundation.org +Cc: kernel-hardening@lists.openwall.com +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727413645.33451.15878817161436755393.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + Documentation/speculation.txt | 90 ++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 90 insertions(+) + +--- /dev/null ++++ b/Documentation/speculation.txt +@@ -0,0 +1,90 @@ ++This document explains potential effects of speculation, and how undesirable ++effects can be mitigated portably using common APIs. ++ ++=========== ++Speculation ++=========== ++ ++To improve performance and minimize average latencies, many contemporary CPUs ++employ speculative execution techniques such as branch prediction, performing ++work which may be discarded at a later stage. ++ ++Typically speculative execution cannot be observed from architectural state, ++such as the contents of registers. However, in some cases it is possible to ++observe its impact on microarchitectural state, such as the presence or ++absence of data in caches. Such state may form side-channels which can be ++observed to extract secret information. ++ ++For example, in the presence of branch prediction, it is possible for bounds ++checks to be ignored by code which is speculatively executed. Consider the ++following code: ++ ++ int load_array(int *array, unsigned int index) ++ { ++ if (index >= MAX_ARRAY_ELEMS) ++ return 0; ++ else ++ return array[index]; ++ } ++ ++Which, on arm64, may be compiled to an assembly sequence such as: ++ ++ CMP , #MAX_ARRAY_ELEMS ++ B.LT less ++ MOV , #0 ++ RET ++ less: ++ LDR , [, ] ++ RET ++ ++It is possible that a CPU mis-predicts the conditional branch, and ++speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This ++value will subsequently be discarded, but the speculated load may affect ++microarchitectural state which can be subsequently measured. ++ ++More complex sequences involving multiple dependent memory accesses may ++result in sensitive information being leaked. Consider the following ++code, building on the prior example: ++ ++ int load_dependent_arrays(int *arr1, int *arr2, int index) ++ { ++ int val1, val2, ++ ++ val1 = load_array(arr1, index); ++ val2 = load_array(arr2, val1); ++ ++ return val2; ++ } ++ ++Under speculation, the first call to load_array() may return the value ++of an out-of-bounds address, while the second call will influence ++microarchitectural state dependent on this value. This may provide an ++arbitrary read primitive. ++ ++==================================== ++Mitigating speculation side-channels ++==================================== ++ ++The kernel provides a generic API to ensure that bounds checks are ++respected even under speculation. Architectures which are affected by ++speculation-based side-channels are expected to implement these ++primitives. ++ ++The array_index_nospec() helper in can be used to ++prevent information from being leaked via side-channels. ++ ++A call to array_index_nospec(index, size) returns a sanitized index ++value that is bounded to [0, size) even under cpu speculation ++conditions. 
++ ++This can be used to protect the earlier load_array() example: ++ ++ int load_array(int *array, unsigned int index) ++ { ++ if (index >= MAX_ARRAY_ELEMS) ++ return 0; ++ else { ++ index = array_index_nospec(index, MAX_ARRAY_ELEMS); ++ return array[index]; ++ } ++ } diff --git a/queue-4.14/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch b/queue-4.14/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch new file mode 100644 index 00000000000..3b996f999f7 --- /dev/null +++ b/queue-4.14/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch @@ -0,0 +1,189 @@ +Subject: KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL +From: KarimAllah Ahmed karahmed@amazon.de +Date: Sat Feb 3 15:56:23 2018 +0100 + +From: KarimAllah Ahmed karahmed@amazon.de + +commit b2ac58f90540e39324e7a29a7ad471407ae0bf48 + +[ Based on a patch from Paolo Bonzini ] + +... basically doing exactly what we do for VMX: + +- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) +- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest + actually used it. + +Signed-off-by: KarimAllah Ahmed +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Reviewed-by: Darren Kenny +Reviewed-by: Konrad Rzeszutek Wilk +Cc: Andrea Arcangeli +Cc: Andi Kleen +Cc: Jun Nakajima +Cc: kvm@vger.kernel.org +Cc: Dave Hansen +Cc: Tim Chen +Cc: Andy Lutomirski +Cc: Asit Mallick +Cc: Arjan Van De Ven +Cc: Greg KH +Cc: Paolo Bonzini +Cc: Dan Williams +Cc: Linus Torvalds +Cc: Ashok Raj +Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/svm.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 88 insertions(+) + +--- a/arch/x86/kvm/svm.c ++++ b/arch/x86/kvm/svm.c +@@ -184,6 +184,8 @@ struct vcpu_svm { + u64 gs_base; + } host; + ++ u64 spec_ctrl; ++ + u32 *msrpm; + + ulong nmi_iret_rip; +@@ -249,6 +251,7 @@ static const struct svm_direct_access_ms + { .index = MSR_CSTAR, .always = true }, + { .index = MSR_SYSCALL_MASK, .always = true }, + #endif ++ { .index = MSR_IA32_SPEC_CTRL, .always = false }, + { .index = MSR_IA32_PRED_CMD, .always = false }, + { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, + { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, +@@ -882,6 +885,25 @@ static bool valid_msr_intercept(u32 inde + return false; + } + ++static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr) ++{ ++ u8 bit_write; ++ unsigned long tmp; ++ u32 offset; ++ u32 *msrpm; ++ ++ msrpm = is_guest_mode(vcpu) ? 
to_svm(vcpu)->nested.msrpm: ++ to_svm(vcpu)->msrpm; ++ ++ offset = svm_msrpm_offset(msr); ++ bit_write = 2 * (msr & 0x0f) + 1; ++ tmp = msrpm[offset]; ++ ++ BUG_ON(offset == MSR_INVALID); ++ ++ return !!test_bit(bit_write, &tmp); ++} ++ + static void set_msr_interception(u32 *msrpm, unsigned msr, + int read, int write) + { +@@ -1587,6 +1609,8 @@ static void svm_vcpu_reset(struct kvm_vc + u32 dummy; + u32 eax = 1; + ++ svm->spec_ctrl = 0; ++ + if (!init_event) { + svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | + MSR_IA32_APICBASE_ENABLE; +@@ -3591,6 +3615,13 @@ static int svm_get_msr(struct kvm_vcpu * + case MSR_VM_CR: + msr_info->data = svm->nested.vm_cr_msr; + break; ++ case MSR_IA32_SPEC_CTRL: ++ if (!msr_info->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) ++ return 1; ++ ++ msr_info->data = svm->spec_ctrl; ++ break; + case MSR_IA32_UCODE_REV: + msr_info->data = 0x01000065; + break; +@@ -3682,6 +3713,33 @@ static int svm_set_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + kvm_write_tsc(vcpu, msr); + break; ++ case MSR_IA32_SPEC_CTRL: ++ if (!msr->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) ++ return 1; ++ ++ /* The STIBP bit doesn't fault even if it's not advertised */ ++ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) ++ return 1; ++ ++ svm->spec_ctrl = data; ++ ++ if (!data) ++ break; ++ ++ /* ++ * For non-nested: ++ * When it's written (to non-zero) for the first time, pass ++ * it through. ++ * ++ * For nested: ++ * The handling of the MSR bitmap for L2 guests is done in ++ * nested_svm_vmrun_msrpm. ++ * We update the L1 MSR bit as well since it will end up ++ * touching the MSR anyway now. ++ */ ++ set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); ++ break; + case MSR_IA32_PRED_CMD: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) +@@ -4950,6 +5008,15 @@ static void svm_vcpu_run(struct kvm_vcpu + + local_irq_enable(); + ++ /* ++ * If this vCPU has touched SPEC_CTRL, restore the guest's value if ++ * it's non-zero. Since vmentry is serialising on affected CPUs, there ++ * is no need to worry about the conditional branch over the wrmsr ++ * being speculatively taken. ++ */ ++ if (svm->spec_ctrl) ++ wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); ++ + asm volatile ( + "push %%" _ASM_BP "; \n\t" + "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" +@@ -5042,6 +5109,27 @@ static void svm_vcpu_run(struct kvm_vcpu + #endif + ); + ++ /* ++ * We do not use IBRS in the kernel. If this vCPU has used the ++ * SPEC_CTRL MSR it may have left it on; save the value and ++ * turn it off. This is much more efficient than blindly adding ++ * it to the atomic save/restore list. Especially as the former ++ * (Saving guest MSRs on vmexit) doesn't even exist in KVM. ++ * ++ * For non-nested case: ++ * If the L01 MSR bitmap does not intercept the MSR, then we need to ++ * save it. ++ * ++ * For nested case: ++ * If the L02 MSR bitmap does not intercept the MSR, then we need to ++ * save it. 
++ */ ++ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) ++ rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); ++ ++ if (svm->spec_ctrl) ++ wrmsrl(MSR_IA32_SPEC_CTRL, 0); ++ + /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + diff --git a/queue-4.14/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch b/queue-4.14/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch new file mode 100644 index 00000000000..93511142ab8 --- /dev/null +++ b/queue-4.14/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch @@ -0,0 +1,278 @@ +Subject: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL +From: KarimAllah Ahmed karahmed@amazon.de +Date: Thu Feb 1 22:59:45 2018 +0100 + +From: KarimAllah Ahmed karahmed@amazon.de + +commit d28b387fb74da95d69d2615732f50cceb38e9a4d + +[ Based on a patch from Ashok Raj ] + +Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for +guests that will only mitigate Spectre V2 through IBRS+IBPB and will not +be using a retpoline+IBPB based approach. + +To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for +guests that do not actually use the MSR, only start saving and restoring +when a non-zero is written to it. + +No attempt is made to handle STIBP here, intentionally. Filtering STIBP +may be added in a future patch, which may require trapping all writes +if we don't want to pass it through directly to the guest. + +[dwmw2: Clean up CPUID bits, save/restore manually, handle reset] + +Signed-off-by: KarimAllah Ahmed +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Reviewed-by: Darren Kenny +Reviewed-by: Konrad Rzeszutek Wilk +Reviewed-by: Jim Mattson +Cc: Andrea Arcangeli +Cc: Andi Kleen +Cc: Jun Nakajima +Cc: kvm@vger.kernel.org +Cc: Dave Hansen +Cc: Tim Chen +Cc: Andy Lutomirski +Cc: Asit Mallick +Cc: Arjan Van De Ven +Cc: Greg KH +Cc: Paolo Bonzini +Cc: Dan Williams +Cc: Linus Torvalds +Cc: Ashok Raj +Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/cpuid.c | 9 ++-- + arch/x86/kvm/vmx.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++- + arch/x86/kvm/x86.c | 2 + 3 files changed, 110 insertions(+), 6 deletions(-) + +--- a/arch/x86/kvm/cpuid.c ++++ b/arch/x86/kvm/cpuid.c +@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct + + /* cpuid 0x80000008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = +- F(IBPB); ++ F(IBPB) | F(IBRS); + + /* cpuid 0xC0000001.edx */ + const u32 kvm_cpuid_C000_0001_edx_x86_features = +@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct + + /* cpuid 7.0.edx*/ + const u32 kvm_cpuid_7_0_edx_x86_features = +- F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); ++ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | ++ F(ARCH_CAPABILITIES); + + /* all calls to cpuid_count() should be made on the same cpu */ + get_cpu(); +@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct + g_phys_as = phys_as; + entry->eax = g_phys_as | (virt_as << 8); + entry->edx = 0; +- /* IBPB isn't necessarily present in hardware cpuid */ ++ /* IBRS and IBPB aren't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); ++ if (boot_cpu_has(X86_FEATURE_IBRS)) ++ entry->ebx |= F(IBRS); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); + break; +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -584,6 +584,7 @@ struct vcpu_vmx { + #endif + + u64 
arch_capabilities; ++ u64 spec_ctrl; + + u32 vm_entry_controls_shadow; + u32 vm_exit_controls_shadow; +@@ -1906,6 +1907,29 @@ static void update_exception_bitmap(stru + } + + /* ++ * Check if MSR is intercepted for currently loaded MSR bitmap. ++ */ ++static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr) ++{ ++ unsigned long *msr_bitmap; ++ int f = sizeof(unsigned long); ++ ++ if (!cpu_has_vmx_msr_bitmap()) ++ return true; ++ ++ msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap; ++ ++ if (msr <= 0x1fff) { ++ return !!test_bit(msr, msr_bitmap + 0x800 / f); ++ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { ++ msr &= 0x1fff; ++ return !!test_bit(msr, msr_bitmap + 0xc00 / f); ++ } ++ ++ return true; ++} ++ ++/* + * Check if MSR is intercepted for L01 MSR bitmap. + */ + static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) +@@ -3259,6 +3283,14 @@ static int vmx_get_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + msr_info->data = guest_read_tsc(vcpu); + break; ++ case MSR_IA32_SPEC_CTRL: ++ if (!msr_info->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) && ++ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) ++ return 1; ++ ++ msr_info->data = to_vmx(vcpu)->spec_ctrl; ++ break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) +@@ -3372,6 +3404,37 @@ static int vmx_set_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + kvm_write_tsc(vcpu, msr_info); + break; ++ case MSR_IA32_SPEC_CTRL: ++ if (!msr_info->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) && ++ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) ++ return 1; ++ ++ /* The STIBP bit doesn't fault even if it's not advertised */ ++ if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) ++ return 1; ++ ++ vmx->spec_ctrl = data; ++ ++ if (!data) ++ break; ++ ++ /* ++ * For non-nested: ++ * When it's written (to non-zero) for the first time, pass ++ * it through. ++ * ++ * For nested: ++ * The handling of the MSR bitmap for L2 guests is done in ++ * nested_vmx_merge_msr_bitmap. We should not touch the ++ * vmcs02.msr_bitmap here since it gets completely overwritten ++ * in the merging. We update the vmcs01 here for L1 as well ++ * since it will end up touching the MSR anyway now. ++ */ ++ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, ++ MSR_IA32_SPEC_CTRL, ++ MSR_TYPE_RW); ++ break; + case MSR_IA32_PRED_CMD: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && +@@ -5697,6 +5760,7 @@ static void vmx_vcpu_reset(struct kvm_vc + u64 cr0; + + vmx->rmode.vm86_active = 0; ++ vmx->spec_ctrl = 0; + + vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); + kvm_set_cr8(vcpu, 0); +@@ -9360,6 +9424,15 @@ static void __noclone vmx_vcpu_run(struc + + vmx_arm_hv_timer(vcpu); + ++ /* ++ * If this vCPU has touched SPEC_CTRL, restore the guest's value if ++ * it's non-zero. Since vmentry is serialising on affected CPUs, there ++ * is no need to worry about the conditional branch over the wrmsr ++ * being speculatively taken. ++ */ ++ if (vmx->spec_ctrl) ++ wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); ++ + vmx->__launched = vmx->loaded_vmcs->launched; + asm( + /* Store host registers */ +@@ -9478,6 +9551,27 @@ static void __noclone vmx_vcpu_run(struc + #endif + ); + ++ /* ++ * We do not use IBRS in the kernel. If this vCPU has used the ++ * SPEC_CTRL MSR it may have left it on; save the value and ++ * turn it off. This is much more efficient than blindly adding ++ * it to the atomic save/restore list. 
Especially as the former ++ * (Saving guest MSRs on vmexit) doesn't even exist in KVM. ++ * ++ * For non-nested case: ++ * If the L01 MSR bitmap does not intercept the MSR, then we need to ++ * save it. ++ * ++ * For nested case: ++ * If the L02 MSR bitmap does not intercept the MSR, then we need to ++ * save it. ++ */ ++ if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) ++ rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); ++ ++ if (vmx->spec_ctrl) ++ wrmsrl(MSR_IA32_SPEC_CTRL, 0); ++ + /* Eliminate branch target predictions from guest mode */ + vmexit_fill_RSB(); + +@@ -10109,7 +10203,7 @@ static inline bool nested_vmx_merge_msr_ + unsigned long *msr_bitmap_l1; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; + /* +- * pred_cmd is trying to verify two things: ++ * pred_cmd & spec_ctrl are trying to verify two things: + * + * 1. L0 gave a permission to L1 to actually passthrough the MSR. This + * ensures that we do not accidentally generate an L02 MSR bitmap +@@ -10122,9 +10216,10 @@ static inline bool nested_vmx_merge_msr_ + * the MSR. + */ + bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); ++ bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL); + + if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && +- !pred_cmd) ++ !pred_cmd && !spec_ctrl) + return false; + + page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap); +@@ -10158,6 +10253,12 @@ static inline bool nested_vmx_merge_msr_ + } + } + ++ if (spec_ctrl) ++ nested_vmx_disable_intercept_for_msr( ++ msr_bitmap_l1, msr_bitmap_l0, ++ MSR_IA32_SPEC_CTRL, ++ MSR_TYPE_R | MSR_TYPE_W); ++ + if (pred_cmd) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, +--- a/arch/x86/kvm/x86.c ++++ b/arch/x86/kvm/x86.c +@@ -1006,7 +1006,7 @@ static u32 msrs_to_save[] = { + #endif + MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, + MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, +- MSR_IA32_ARCH_CAPABILITIES ++ MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES + }; + + static unsigned num_msrs_to_save; diff --git a/queue-4.14/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch b/queue-4.14/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch new file mode 100644 index 00000000000..33de208b6cd --- /dev/null +++ b/queue-4.14/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch @@ -0,0 +1,111 @@ +Subject: KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES +From: KarimAllah Ahmed karahmed@amazon.de +Date: Thu Feb 1 22:59:44 2018 +0100 + +From: KarimAllah Ahmed karahmed@amazon.de + +commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd + +Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO +(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the +contents will come directly from the hardware, but user-space can still +override it. 
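+
+For reference, the two bits named above can be decoded as in the sketch
+below (illustrative only, not part of the patch; the bit positions are
+the ones given in the paragraph above):
+
+        /* Bit 0: RDCL_NO - CPU is not affected by Rogue Data Cache Load. */
+        static bool cpu_reports_rdcl_no(u64 arch_capabilities)
+        {
+                return arch_capabilities & (1ULL << 0);
+        }
+
+        /* Bit 1: IBRS_ALL - enhanced IBRS is supported. */
+        static bool cpu_reports_ibrs_all(u64 arch_capabilities)
+        {
+                return arch_capabilities & (1ULL << 1);
+        }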
+ +[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] + +Signed-off-by: KarimAllah Ahmed +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Reviewed-by: Paolo Bonzini +Reviewed-by: Darren Kenny +Reviewed-by: Jim Mattson +Reviewed-by: Konrad Rzeszutek Wilk +Cc: Andrea Arcangeli +Cc: Andi Kleen +Cc: Jun Nakajima +Cc: kvm@vger.kernel.org +Cc: Dave Hansen +Cc: Linus Torvalds +Cc: Andy Lutomirski +Cc: Asit Mallick +Cc: Arjan Van De Ven +Cc: Greg KH +Cc: Dan Williams +Cc: Tim Chen +Cc: Ashok Raj +Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/cpuid.c | 2 +- + arch/x86/kvm/vmx.c | 15 +++++++++++++++ + arch/x86/kvm/x86.c | 1 + + 3 files changed, 17 insertions(+), 1 deletion(-) + +--- a/arch/x86/kvm/cpuid.c ++++ b/arch/x86/kvm/cpuid.c +@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct + + /* cpuid 7.0.edx*/ + const u32 kvm_cpuid_7_0_edx_x86_features = +- F(AVX512_4VNNIW) | F(AVX512_4FMAPS); ++ F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + + /* all calls to cpuid_count() should be made on the same cpu */ + get_cpu(); +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -583,6 +583,8 @@ struct vcpu_vmx { + u64 msr_guest_kernel_gs_base; + #endif + ++ u64 arch_capabilities; ++ + u32 vm_entry_controls_shadow; + u32 vm_exit_controls_shadow; + u32 secondary_exec_control; +@@ -3257,6 +3259,12 @@ static int vmx_get_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + msr_info->data = guest_read_tsc(vcpu); + break; ++ case MSR_IA32_ARCH_CAPABILITIES: ++ if (!msr_info->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) ++ return 1; ++ msr_info->data = to_vmx(vcpu)->arch_capabilities; ++ break; + case MSR_IA32_SYSENTER_CS: + msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); + break; +@@ -3392,6 +3400,11 @@ static int vmx_set_msr(struct kvm_vcpu * + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, + MSR_TYPE_W); + break; ++ case MSR_IA32_ARCH_CAPABILITIES: ++ if (!msr_info->host_initiated) ++ return 1; ++ vmx->arch_capabilities = data; ++ break; + case MSR_IA32_CR_PAT: + if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { + if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) +@@ -5652,6 +5665,8 @@ static int vmx_vcpu_setup(struct vcpu_vm + ++vmx->nmsrs; + } + ++ if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) ++ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); + + vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); + +--- a/arch/x86/kvm/x86.c ++++ b/arch/x86/kvm/x86.c +@@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = { + #endif + MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, + MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, ++ MSR_IA32_ARCH_CAPABILITIES + }; + + static unsigned num_msrs_to_save; diff --git a/queue-4.14/KVM_VMX_introduce_alloc_loaded_vmcs.patch b/queue-4.14/KVM_VMX_introduce_alloc_loaded_vmcs.patch new file mode 100644 index 00000000000..48dd2474018 --- /dev/null +++ b/queue-4.14/KVM_VMX_introduce_alloc_loaded_vmcs.patch @@ -0,0 +1,89 @@ +Subject: KVM: VMX: introduce alloc_loaded_vmcs +From: Paolo Bonzini pbonzini@redhat.com +Date: Thu Jan 11 12:16:15 2018 +0100 + +From: Paolo Bonzini pbonzini@redhat.com + +commit f21f165ef922c2146cc5bdc620f542953c41714b + +Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also +allocate an MSR bitmap there. 
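+
+The resulting call-site shape, sketched from the enter_vmx_operation()
+hunk below:
+
+        /* Before: open-coded allocation and initialization at each call site. */
+        vmx->nested.vmcs02.vmcs = alloc_vmcs();
+        vmx->nested.vmcs02.shadow_vmcs = NULL;
+        if (!vmx->nested.vmcs02.vmcs)
+                goto out_vmcs02;
+        loaded_vmcs_init(&vmx->nested.vmcs02);
+
+        /* After: one helper that allocates and initializes, with a single error check. */
+        r = alloc_loaded_vmcs(&vmx->nested.vmcs02);
+        if (r < 0)
+                goto out_vmcs02;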
+ +Cc: stable@vger.kernel.org # prereq for Spectre mitigation +Signed-off-by: Paolo Bonzini +Signed-off-by: Greg Kroah-Hartman + +--- + arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++-------------- + 1 file changed, 22 insertions(+), 14 deletions(-) + +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -3814,11 +3814,6 @@ static struct vmcs *alloc_vmcs_cpu(int c + return vmcs; + } + +-static struct vmcs *alloc_vmcs(void) +-{ +- return alloc_vmcs_cpu(raw_smp_processor_id()); +-} +- + static void free_vmcs(struct vmcs *vmcs) + { + free_pages((unsigned long)vmcs, vmcs_config.order); +@@ -3837,6 +3832,22 @@ static void free_loaded_vmcs(struct load + WARN_ON(loaded_vmcs->shadow_vmcs != NULL); + } + ++static struct vmcs *alloc_vmcs(void) ++{ ++ return alloc_vmcs_cpu(raw_smp_processor_id()); ++} ++ ++static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) ++{ ++ loaded_vmcs->vmcs = alloc_vmcs(); ++ if (!loaded_vmcs->vmcs) ++ return -ENOMEM; ++ ++ loaded_vmcs->shadow_vmcs = NULL; ++ loaded_vmcs_init(loaded_vmcs); ++ return 0; ++} ++ + static void free_kvm_area(void) + { + int cpu; +@@ -7135,12 +7146,11 @@ static int enter_vmx_operation(struct kv + { + struct vcpu_vmx *vmx = to_vmx(vcpu); + struct vmcs *shadow_vmcs; ++ int r; + +- vmx->nested.vmcs02.vmcs = alloc_vmcs(); +- vmx->nested.vmcs02.shadow_vmcs = NULL; +- if (!vmx->nested.vmcs02.vmcs) ++ r = alloc_loaded_vmcs(&vmx->nested.vmcs02); ++ if (r < 0) + goto out_vmcs02; +- loaded_vmcs_init(&vmx->nested.vmcs02); + + if (cpu_has_vmx_msr_bitmap()) { + vmx->nested.msr_bitmap = +@@ -9535,13 +9545,11 @@ static struct kvm_vcpu *vmx_create_vcpu( + if (!vmx->guest_msrs) + goto free_pml; + +- vmx->loaded_vmcs = &vmx->vmcs01; +- vmx->loaded_vmcs->vmcs = alloc_vmcs(); +- vmx->loaded_vmcs->shadow_vmcs = NULL; +- if (!vmx->loaded_vmcs->vmcs) ++ err = alloc_loaded_vmcs(&vmx->vmcs01); ++ if (err < 0) + goto free_msrs; +- loaded_vmcs_init(vmx->loaded_vmcs); + ++ vmx->loaded_vmcs = &vmx->vmcs01; + cpu = get_cpu(); + vmx_vcpu_load(&vmx->vcpu, cpu); + vmx->vcpu.cpu = cpu; diff --git a/queue-4.14/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch b/queue-4.14/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch new file mode 100644 index 00000000000..7ae3a985fa9 --- /dev/null +++ b/queue-4.14/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch @@ -0,0 +1,504 @@ +Subject: KVM: VMX: make MSR bitmaps per-VCPU +From: Paolo Bonzini pbonzini@redhat.com +Date: Tue Jan 16 16:51:18 2018 +0100 + +From: Paolo Bonzini pbonzini@redhat.com + +commit 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6 + +Place the MSR bitmap in struct loaded_vmcs, and update it in place +every time the x2apic or APICv state can change. This is rare and +the loop can handle 64 MSRs per iteration, in a similar fashion as +nested_vmx_prepare_msr_bitmap. + +This prepares for choosing, on a per-VM basis, whether to intercept +the SPEC_CTRL and PRED_CMD MSRs. 
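+
+As background for the hunks below: the VMX MSR permission bitmap is one 4K
+page split into four 2K quadrants (read-low at 0x000, read-high at 0x400,
+write-low at 0x800, write-high at 0xc00), covering MSRs 0x00000000-0x00001fff
+and 0xc0000000-0xc0001fff. A minimal sketch of the write-intercept lookup,
+mirroring the msr_write_intercepted() helpers added elsewhere in this series
+(the function name here is made up for illustration):
+
+        static bool write_is_intercepted(unsigned long *msr_bitmap, u32 msr)
+        {
+                int f = sizeof(unsigned long);
+
+                if (msr <= 0x1fff)                          /* write-low quadrant */
+                        return !!test_bit(msr, msr_bitmap + 0x800 / f);
+                if (msr >= 0xc0000000 && msr <= 0xc0001fff) /* write-high quadrant */
+                        return !!test_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
+
+                return true;    /* all other MSRs always cause a vmexit */
+        }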
+ +Cc: stable@vger.kernel.org # prereq for Spectre mitigation +Suggested-by: Jim Mattson +Signed-off-by: Paolo Bonzini +Signed-off-by: Greg Kroah-Hartman + +--- + arch/x86/kvm/vmx.c | 276 ++++++++++++++++++++++++++++------------------------- + 1 file changed, 150 insertions(+), 126 deletions(-) + +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -108,6 +108,14 @@ static u64 __read_mostly host_xss; + static bool __read_mostly enable_pml = 1; + module_param_named(pml, enable_pml, bool, S_IRUGO); + ++#define MSR_TYPE_R 1 ++#define MSR_TYPE_W 2 ++#define MSR_TYPE_RW 3 ++ ++#define MSR_BITMAP_MODE_X2APIC 1 ++#define MSR_BITMAP_MODE_X2APIC_APICV 2 ++#define MSR_BITMAP_MODE_LM 4 ++ + #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL + + /* Guest_tsc -> host_tsc conversion requires 64-bit division. */ +@@ -206,6 +214,7 @@ struct loaded_vmcs { + int soft_vnmi_blocked; + ktime_t entry_time; + s64 vnmi_blocked_time; ++ unsigned long *msr_bitmap; + struct list_head loaded_vmcss_on_cpu_link; + }; + +@@ -446,8 +455,6 @@ struct nested_vmx { + bool pi_pending; + u16 posted_intr_nv; + +- unsigned long *msr_bitmap; +- + struct hrtimer preemption_timer; + bool preemption_timer_expired; + +@@ -562,6 +569,7 @@ struct vcpu_vmx { + struct kvm_vcpu vcpu; + unsigned long host_rsp; + u8 fail; ++ u8 msr_bitmap_mode; + u32 exit_intr_info; + u32 idt_vectoring_info; + ulong rflags; +@@ -919,6 +927,7 @@ static bool vmx_get_nmi_mask(struct kvm_ + static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); + static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, + u16 error_code); ++static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); + + static DEFINE_PER_CPU(struct vmcs *, vmxarea); + static DEFINE_PER_CPU(struct vmcs *, current_vmcs); +@@ -938,12 +947,6 @@ static DEFINE_PER_CPU(spinlock_t, blocke + enum { + VMX_IO_BITMAP_A, + VMX_IO_BITMAP_B, +- VMX_MSR_BITMAP_LEGACY, +- VMX_MSR_BITMAP_LONGMODE, +- VMX_MSR_BITMAP_LEGACY_X2APIC_APICV, +- VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV, +- VMX_MSR_BITMAP_LEGACY_X2APIC, +- VMX_MSR_BITMAP_LONGMODE_X2APIC, + VMX_VMREAD_BITMAP, + VMX_VMWRITE_BITMAP, + VMX_BITMAP_NR +@@ -953,12 +956,6 @@ static unsigned long *vmx_bitmap[VMX_BIT + + #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A]) + #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B]) +-#define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY]) +-#define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE]) +-#define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV]) +-#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV]) +-#define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC]) +-#define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC]) + #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP]) + #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP]) + +@@ -2559,36 +2556,6 @@ static void move_msr_up(struct vcpu_vmx + vmx->guest_msrs[from] = tmp; + } + +-static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) +-{ +- unsigned long *msr_bitmap; +- +- if (is_guest_mode(vcpu)) +- msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; +- else if (cpu_has_secondary_exec_ctrls() && +- (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & +- SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { +- if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { +- if (is_long_mode(vcpu)) +- msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv; +- else +- msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv; +- } else 
{ +- if (is_long_mode(vcpu)) +- msr_bitmap = vmx_msr_bitmap_longmode_x2apic; +- else +- msr_bitmap = vmx_msr_bitmap_legacy_x2apic; +- } +- } else { +- if (is_long_mode(vcpu)) +- msr_bitmap = vmx_msr_bitmap_longmode; +- else +- msr_bitmap = vmx_msr_bitmap_legacy; +- } +- +- vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); +-} +- + /* + * Set up the vmcs to automatically save and restore system + * msrs. Don't touch the 64-bit msrs if the guest is in legacy +@@ -2629,7 +2596,7 @@ static void setup_msrs(struct vcpu_vmx * + vmx->save_nmsrs = save_nmsrs; + + if (cpu_has_vmx_msr_bitmap()) +- vmx_set_msr_bitmap(&vmx->vcpu); ++ vmx_update_msr_bitmap(&vmx->vcpu); + } + + /* +@@ -3829,6 +3796,8 @@ static void free_loaded_vmcs(struct load + loaded_vmcs_clear(loaded_vmcs); + free_vmcs(loaded_vmcs->vmcs); + loaded_vmcs->vmcs = NULL; ++ if (loaded_vmcs->msr_bitmap) ++ free_page((unsigned long)loaded_vmcs->msr_bitmap); + WARN_ON(loaded_vmcs->shadow_vmcs != NULL); + } + +@@ -3845,7 +3814,18 @@ static int alloc_loaded_vmcs(struct load + + loaded_vmcs->shadow_vmcs = NULL; + loaded_vmcs_init(loaded_vmcs); ++ ++ if (cpu_has_vmx_msr_bitmap()) { ++ loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); ++ if (!loaded_vmcs->msr_bitmap) ++ goto out_vmcs; ++ memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); ++ } + return 0; ++ ++out_vmcs: ++ free_loaded_vmcs(loaded_vmcs); ++ return -ENOMEM; + } + + static void free_kvm_area(void) +@@ -4920,10 +4900,8 @@ static void free_vpid(int vpid) + spin_unlock(&vmx_vpid_lock); + } + +-#define MSR_TYPE_R 1 +-#define MSR_TYPE_W 2 +-static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, +- u32 msr, int type) ++static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, ++ u32 msr, int type) + { + int f = sizeof(unsigned long); + +@@ -4957,6 +4935,50 @@ static void __vmx_disable_intercept_for_ + } + } + ++static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, ++ u32 msr, int type) ++{ ++ int f = sizeof(unsigned long); ++ ++ if (!cpu_has_vmx_msr_bitmap()) ++ return; ++ ++ /* ++ * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals ++ * have the write-low and read-high bitmap offsets the wrong way round. ++ * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. ++ */ ++ if (msr <= 0x1fff) { ++ if (type & MSR_TYPE_R) ++ /* read-low */ ++ __set_bit(msr, msr_bitmap + 0x000 / f); ++ ++ if (type & MSR_TYPE_W) ++ /* write-low */ ++ __set_bit(msr, msr_bitmap + 0x800 / f); ++ ++ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { ++ msr &= 0x1fff; ++ if (type & MSR_TYPE_R) ++ /* read-high */ ++ __set_bit(msr, msr_bitmap + 0x400 / f); ++ ++ if (type & MSR_TYPE_W) ++ /* write-high */ ++ __set_bit(msr, msr_bitmap + 0xc00 / f); ++ ++ } ++} ++ ++static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, ++ u32 msr, int type, bool value) ++{ ++ if (value) ++ vmx_enable_intercept_for_msr(msr_bitmap, msr, type); ++ else ++ vmx_disable_intercept_for_msr(msr_bitmap, msr, type); ++} ++ + /* + * If a msr is allowed by L0, we should check whether it is allowed by L1. + * The corresponding bit will be cleared unless both of L0 and L1 allow it. 
+@@ -5003,28 +5025,68 @@ static void nested_vmx_disable_intercept + } + } + +-static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) ++static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) + { +- if (!longmode_only) +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, +- msr, MSR_TYPE_R | MSR_TYPE_W); +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, +- msr, MSR_TYPE_R | MSR_TYPE_W); +-} +- +-static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active) +-{ +- if (apicv_active) { +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv, +- msr, type); +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv, +- msr, type); +- } else { +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, +- msr, type); +- __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, +- msr, type); ++ u8 mode = 0; ++ ++ if (cpu_has_secondary_exec_ctrls() && ++ (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & ++ SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { ++ mode |= MSR_BITMAP_MODE_X2APIC; ++ if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) ++ mode |= MSR_BITMAP_MODE_X2APIC_APICV; ++ } ++ ++ if (is_long_mode(vcpu)) ++ mode |= MSR_BITMAP_MODE_LM; ++ ++ return mode; ++} ++ ++#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) ++ ++static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, ++ u8 mode) ++{ ++ int msr; ++ ++ for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { ++ unsigned word = msr / BITS_PER_LONG; ++ msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; ++ msr_bitmap[word + (0x800 / sizeof(long))] = ~0; + } ++ ++ if (mode & MSR_BITMAP_MODE_X2APIC) { ++ /* ++ * TPR reads and writes can be virtualized even if virtual interrupt ++ * delivery is not in use. 
++ */ ++ vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); ++ if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { ++ vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); ++ vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); ++ vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); ++ } ++ } ++} ++ ++static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) ++{ ++ struct vcpu_vmx *vmx = to_vmx(vcpu); ++ unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; ++ u8 mode = vmx_msr_bitmap_mode(vcpu); ++ u8 changed = mode ^ vmx->msr_bitmap_mode; ++ ++ if (!changed) ++ return; ++ ++ vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, ++ !(mode & MSR_BITMAP_MODE_LM)); ++ ++ if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) ++ vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); ++ ++ vmx->msr_bitmap_mode = mode; + } + + static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu) +@@ -5272,7 +5334,7 @@ static void vmx_refresh_apicv_exec_ctrl( + } + + if (cpu_has_vmx_msr_bitmap()) +- vmx_set_msr_bitmap(vcpu); ++ vmx_update_msr_bitmap(vcpu); + } + + static u32 vmx_exec_control(struct vcpu_vmx *vmx) +@@ -5459,7 +5521,7 @@ static int vmx_vcpu_setup(struct vcpu_vm + vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); + } + if (cpu_has_vmx_msr_bitmap()) +- vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); ++ vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); + + vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ + +@@ -6742,7 +6804,7 @@ void vmx_enable_tdp(void) + + static __init int hardware_setup(void) + { +- int r = -ENOMEM, i, msr; ++ int r = -ENOMEM, i; + + rdmsrl_safe(MSR_EFER, &host_efer); + +@@ -6763,9 +6825,6 @@ static __init int hardware_setup(void) + + memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); + +- memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); +- memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); +- + if (setup_vmcs_config(&vmcs_config) < 0) { + r = -EIO; + goto out; +@@ -6828,42 +6887,8 @@ static __init int hardware_setup(void) + kvm_tsc_scaling_ratio_frac_bits = 48; + } + +- vmx_disable_intercept_for_msr(MSR_FS_BASE, false); +- vmx_disable_intercept_for_msr(MSR_GS_BASE, false); +- vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); +- vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); +- vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); +- vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); +- +- memcpy(vmx_msr_bitmap_legacy_x2apic_apicv, +- vmx_msr_bitmap_legacy, PAGE_SIZE); +- memcpy(vmx_msr_bitmap_longmode_x2apic_apicv, +- vmx_msr_bitmap_longmode, PAGE_SIZE); +- memcpy(vmx_msr_bitmap_legacy_x2apic, +- vmx_msr_bitmap_legacy, PAGE_SIZE); +- memcpy(vmx_msr_bitmap_longmode_x2apic, +- vmx_msr_bitmap_longmode, PAGE_SIZE); +- + set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ + +- for (msr = 0x800; msr <= 0x8ff; msr++) { +- if (msr == 0x839 /* TMCCT */) +- continue; +- vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true); +- } +- +- /* +- * TPR reads and writes can be virtualized even if virtual interrupt +- * delivery is not in use. 
+- */ +- vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true); +- vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false); +- +- /* EOI */ +- vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true); +- /* SELF-IPI */ +- vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true); +- + if (enable_ept) + vmx_enable_tdp(); + else +@@ -7152,13 +7177,6 @@ static int enter_vmx_operation(struct kv + if (r < 0) + goto out_vmcs02; + +- if (cpu_has_vmx_msr_bitmap()) { +- vmx->nested.msr_bitmap = +- (unsigned long *)__get_free_page(GFP_KERNEL); +- if (!vmx->nested.msr_bitmap) +- goto out_msr_bitmap; +- } +- + vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL); + if (!vmx->nested.cached_vmcs12) + goto out_cached_vmcs12; +@@ -7185,9 +7203,6 @@ out_shadow_vmcs: + kfree(vmx->nested.cached_vmcs12); + + out_cached_vmcs12: +- free_page((unsigned long)vmx->nested.msr_bitmap); +- +-out_msr_bitmap: + free_loaded_vmcs(&vmx->nested.vmcs02); + + out_vmcs02: +@@ -7332,10 +7347,6 @@ static void free_nested(struct vcpu_vmx + free_vpid(vmx->nested.vpid02); + vmx->nested.posted_intr_nv = -1; + vmx->nested.current_vmptr = -1ull; +- if (vmx->nested.msr_bitmap) { +- free_page((unsigned long)vmx->nested.msr_bitmap); +- vmx->nested.msr_bitmap = NULL; +- } + if (enable_shadow_vmcs) { + vmx_disable_shadow_vmcs(vmx); + vmcs_clear(vmx->vmcs01.shadow_vmcs); +@@ -8851,7 +8862,7 @@ static void vmx_set_virtual_x2apic_mode( + } + vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); + +- vmx_set_msr_bitmap(vcpu); ++ vmx_update_msr_bitmap(vcpu); + } + + static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) +@@ -9513,6 +9524,7 @@ static struct kvm_vcpu *vmx_create_vcpu( + { + int err; + struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); ++ unsigned long *msr_bitmap; + int cpu; + + if (!vmx) +@@ -9549,6 +9561,15 @@ static struct kvm_vcpu *vmx_create_vcpu( + if (err < 0) + goto free_msrs; + ++ msr_bitmap = vmx->vmcs01.msr_bitmap; ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); ++ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); ++ vmx->msr_bitmap_mode = 0; ++ + vmx->loaded_vmcs = &vmx->vmcs01; + cpu = get_cpu(); + vmx_vcpu_load(&vmx->vcpu, cpu); +@@ -10018,7 +10039,7 @@ static inline bool nested_vmx_merge_msr_ + int msr; + struct page *page; + unsigned long *msr_bitmap_l1; +- unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; ++ unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; + + /* This shortcut is ok because we support only x2APIC MSRs so far. 
*/ + if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) +@@ -10595,6 +10616,9 @@ static int prepare_vmcs02(struct kvm_vcp + if (kvm_has_tsc_control) + decache_tsc_multiplier(vmx); + ++ if (cpu_has_vmx_msr_bitmap()) ++ vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap)); ++ + if (enable_vpid) { + /* + * There is no direct mapping between vpid02 and vpid12, the +@@ -11388,7 +11412,7 @@ static void load_vmcs12_host_state(struc + vmcs_write64(GUEST_IA32_DEBUGCTL, 0); + + if (cpu_has_vmx_msr_bitmap()) +- vmx_set_msr_bitmap(vcpu); ++ vmx_update_msr_bitmap(vcpu); + + if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, + vmcs12->vm_exit_msr_load_count)) diff --git a/queue-4.14/KVM_nVMX_Eliminate_vmcs02_pool.patch b/queue-4.14/KVM_nVMX_Eliminate_vmcs02_pool.patch new file mode 100644 index 00000000000..8882522bf94 --- /dev/null +++ b/queue-4.14/KVM_nVMX_Eliminate_vmcs02_pool.patch @@ -0,0 +1,283 @@ +Subject: KVM: nVMX: Eliminate vmcs02 pool +From: Jim Mattson jmattson@google.com +Date: Mon Nov 27 17:22:25 2017 -0600 + +From: Jim Mattson jmattson@google.com + +commit de3a0021a60635de96aa92713c1a31a96747d72c + +The potential performance advantages of a vmcs02 pool have never been +realized. To simplify the code, eliminate the pool. Instead, a single +vmcs02 is allocated per VCPU when the VCPU enters VMX operation. + +Cc: stable@vger.kernel.org # prereq for Spectre mitigation +Signed-off-by: Jim Mattson +Signed-off-by: Mark Kanda +Reviewed-by: Ameya More +Reviewed-by: David Hildenbrand +Reviewed-by: Paolo Bonzini +Signed-off-by: Radim Krčmář +Signed-off-by: Greg Kroah-Hartman + +--- + arch/x86/kvm/vmx.c | 146 ++++++++--------------------------------------------- + 1 file changed, 23 insertions(+), 123 deletions(-) + +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -182,7 +182,6 @@ module_param(ple_window_max, int, S_IRUG + extern const ulong vmx_return; + + #define NR_AUTOLOAD_MSRS 8 +-#define VMCS02_POOL_SIZE 1 + + struct vmcs { + u32 revision_id; +@@ -223,7 +222,7 @@ struct shared_msr_entry { + * stored in guest memory specified by VMPTRLD, but is opaque to the guest, + * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. + * More than one of these structures may exist, if L1 runs multiple L2 guests. +- * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the ++ * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the + * underlying hardware which will be used to run L2. + * This structure is packed to ensure that its layout is identical across + * machines (necessary for live migration). +@@ -406,13 +405,6 @@ struct __packed vmcs12 { + */ + #define VMCS12_SIZE 0x1000 + +-/* Used to remember the last vmcs02 used for some recently used vmcs12s */ +-struct vmcs02_list { +- struct list_head list; +- gpa_t vmptr; +- struct loaded_vmcs vmcs02; +-}; +- + /* + * The nested_vmx structure is part of vcpu_vmx, and holds information we need + * for correct emulation of VMX (i.e., nested VMX) on this vcpu. +@@ -437,15 +429,15 @@ struct nested_vmx { + */ + bool sync_shadow_vmcs; + +- /* vmcs02_list cache of VMCSs recently used to run L2 guests */ +- struct list_head vmcs02_pool; +- int vmcs02_num; + bool change_vmcs01_virtual_x2apic_mode; + /* L2 must run next, and mustn't decide to exit to L1. */ + bool nested_run_pending; ++ ++ struct loaded_vmcs vmcs02; ++ + /* +- * Guest pages referred to in vmcs02 with host-physical pointers, so +- * we must keep them pinned while L2 runs. 
++ * Guest pages referred to in the vmcs02 with host-physical ++ * pointers, so we must keep them pinned while L2 runs. + */ + struct page *apic_access_page; + struct page *virtual_apic_page; +@@ -6964,94 +6956,6 @@ static int handle_monitor(struct kvm_vcp + } + + /* +- * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. +- * We could reuse a single VMCS for all the L2 guests, but we also want the +- * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this +- * allows keeping them loaded on the processor, and in the future will allow +- * optimizations where prepare_vmcs02 doesn't need to set all the fields on +- * every entry if they never change. +- * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE +- * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first. +- * +- * The following functions allocate and free a vmcs02 in this pool. +- */ +- +-/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */ +-static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx) +-{ +- struct vmcs02_list *item; +- list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) +- if (item->vmptr == vmx->nested.current_vmptr) { +- list_move(&item->list, &vmx->nested.vmcs02_pool); +- return &item->vmcs02; +- } +- +- if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { +- /* Recycle the least recently used VMCS. */ +- item = list_last_entry(&vmx->nested.vmcs02_pool, +- struct vmcs02_list, list); +- item->vmptr = vmx->nested.current_vmptr; +- list_move(&item->list, &vmx->nested.vmcs02_pool); +- return &item->vmcs02; +- } +- +- /* Create a new VMCS */ +- item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL); +- if (!item) +- return NULL; +- item->vmcs02.vmcs = alloc_vmcs(); +- item->vmcs02.shadow_vmcs = NULL; +- if (!item->vmcs02.vmcs) { +- kfree(item); +- return NULL; +- } +- loaded_vmcs_init(&item->vmcs02); +- item->vmptr = vmx->nested.current_vmptr; +- list_add(&(item->list), &(vmx->nested.vmcs02_pool)); +- vmx->nested.vmcs02_num++; +- return &item->vmcs02; +-} +- +-/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */ +-static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr) +-{ +- struct vmcs02_list *item; +- list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) +- if (item->vmptr == vmptr) { +- free_loaded_vmcs(&item->vmcs02); +- list_del(&item->list); +- kfree(item); +- vmx->nested.vmcs02_num--; +- return; +- } +-} +- +-/* +- * Free all VMCSs saved for this vcpu, except the one pointed by +- * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs +- * must be &vmx->vmcs01. +- */ +-static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx) +-{ +- struct vmcs02_list *item, *n; +- +- WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01); +- list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) { +- /* +- * Something will leak if the above WARN triggers. Better than +- * a use-after-free. +- */ +- if (vmx->loaded_vmcs == &item->vmcs02) +- continue; +- +- free_loaded_vmcs(&item->vmcs02); +- list_del(&item->list); +- kfree(item); +- vmx->nested.vmcs02_num--; +- } +-} +- +-/* + * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(), + * set the success or error code of an emulated VMX instruction, as specified + * by Vol 2B, VMX Instruction Reference, "Conventions". 
+@@ -7232,6 +7136,12 @@ static int enter_vmx_operation(struct kv + struct vcpu_vmx *vmx = to_vmx(vcpu); + struct vmcs *shadow_vmcs; + ++ vmx->nested.vmcs02.vmcs = alloc_vmcs(); ++ vmx->nested.vmcs02.shadow_vmcs = NULL; ++ if (!vmx->nested.vmcs02.vmcs) ++ goto out_vmcs02; ++ loaded_vmcs_init(&vmx->nested.vmcs02); ++ + if (cpu_has_vmx_msr_bitmap()) { + vmx->nested.msr_bitmap = + (unsigned long *)__get_free_page(GFP_KERNEL); +@@ -7254,9 +7164,6 @@ static int enter_vmx_operation(struct kv + vmx->vmcs01.shadow_vmcs = shadow_vmcs; + } + +- INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); +- vmx->nested.vmcs02_num = 0; +- + hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, + HRTIMER_MODE_REL_PINNED); + vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; +@@ -7271,6 +7178,9 @@ out_cached_vmcs12: + free_page((unsigned long)vmx->nested.msr_bitmap); + + out_msr_bitmap: ++ free_loaded_vmcs(&vmx->nested.vmcs02); ++ ++out_vmcs02: + return -ENOMEM; + } + +@@ -7423,7 +7333,7 @@ static void free_nested(struct vcpu_vmx + vmx->vmcs01.shadow_vmcs = NULL; + } + kfree(vmx->nested.cached_vmcs12); +- /* Unpin physical memory we referred to in current vmcs02 */ ++ /* Unpin physical memory we referred to in the vmcs02 */ + if (vmx->nested.apic_access_page) { + kvm_release_page_dirty(vmx->nested.apic_access_page); + vmx->nested.apic_access_page = NULL; +@@ -7439,7 +7349,7 @@ static void free_nested(struct vcpu_vmx + vmx->nested.pi_desc = NULL; + } + +- nested_free_all_saved_vmcss(vmx); ++ free_loaded_vmcs(&vmx->nested.vmcs02); + } + + /* Emulate the VMXOFF instruction */ +@@ -7482,8 +7392,6 @@ static int handle_vmclear(struct kvm_vcp + vmptr + offsetof(struct vmcs12, launch_state), + &zero, sizeof(zero)); + +- nested_free_vmcs02(vmx, vmptr); +- + nested_vmx_succeed(vcpu); + return kvm_skip_emulated_instruction(vcpu); + } +@@ -8395,10 +8303,11 @@ static bool nested_vmx_exit_reflected(st + + /* + * The host physical addresses of some pages of guest memory +- * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU +- * may write to these pages via their host physical address while +- * L2 is running, bypassing any address-translation-based dirty +- * tracking (e.g. EPT write protection). ++ * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC ++ * Page). The CPU may write to these pages via their host ++ * physical address while L2 is running, bypassing any ++ * address-translation-based dirty tracking (e.g. EPT write ++ * protection). + * + * Mark them dirty on every exit from L2 to prevent them from + * getting out of sync with dirty tracking. 
+@@ -10894,20 +10803,15 @@ static int enter_vmx_non_root_mode(struc + { + struct vcpu_vmx *vmx = to_vmx(vcpu); + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); +- struct loaded_vmcs *vmcs02; + u32 msr_entry_idx; + u32 exit_qual; + +- vmcs02 = nested_get_current_vmcs02(vmx); +- if (!vmcs02) +- return -ENOMEM; +- + enter_guest_mode(vcpu); + + if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) + vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); + +- vmx_switch_vmcs(vcpu, vmcs02); ++ vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); + vmx_segment_cache_clear(vmx); + + if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) { +@@ -11522,10 +11426,6 @@ static void nested_vmx_vmexit(struct kvm + vm_exit_controls_reset_shadow(vmx); + vmx_segment_cache_clear(vmx); + +- /* if no vmcs02 cache requested, remove the one we used */ +- if (VMCS02_POOL_SIZE == 0) +- nested_free_vmcs02(vmx, vmx->nested.current_vmptr); +- + /* Update any VMCS fields that might have changed while L2 ran */ + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); diff --git a/queue-4.14/KVMx86_Add_IBPB_support.patch b/queue-4.14/KVMx86_Add_IBPB_support.patch new file mode 100644 index 00000000000..23531f75da2 --- /dev/null +++ b/queue-4.14/KVMx86_Add_IBPB_support.patch @@ -0,0 +1,322 @@ +Subject: KVM/x86: Add IBPB support +From: Ashok Raj ashok.raj@intel.com +Date: Thu Feb 1 22:59:43 2018 +0100 + +From: Ashok Raj ashok.raj@intel.com + +commit 15d45071523d89b3fb7372e2135fbd72f6af9506 + +The Indirect Branch Predictor Barrier (IBPB) is an indirect branch +control mechanism. It keeps earlier branches from influencing +later ones. + +Unlike IBRS and STIBP, IBPB does not define a new mode of operation. +It's a command that ensures predicted branch targets aren't used after +the barrier. Although IBRS and IBPB are enumerated by the same CPUID +enumeration, IBPB is very different. + +IBPB helps mitigate against three potential attacks: + +* Mitigate guests from being attacked by other guests. + - This is addressed by issing IBPB when we do a guest switch. + +* Mitigate attacks from guest/ring3->host/ring3. + These would require a IBPB during context switch in host, or after + VMEXIT. The host process has two ways to mitigate + - Either it can be compiled with retpoline + - If its going through context switch, and has set !dumpable then + there is a IBPB in that path. + (Tim's patch: https://patchwork.kernel.org/patch/10192871) + - The case where after a VMEXIT you return back to Qemu might make + Qemu attackable from guest when Qemu isn't compiled with retpoline. + There are issues reported when doing IBPB on every VMEXIT that resulted + in some tsc calibration woes in guest. + +* Mitigate guest/ring0->host/ring0 attacks. + When host kernel is using retpoline it is safe against these attacks. + If host kernel isn't using retpoline we might need to do a IBPB flush on + every VMEXIT. + +Even when using retpoline for indirect calls, in certain conditions 'ret' +can use the BTB on Skylake-era CPUs. There are other mitigations +available like RSB stuffing/clearing. + +* IBPB is issued only for SVM during svm_free_vcpu(). + VMX has a vmclear and SVM doesn't. Follow discussion here: + https://lkml.org/lkml/2018/1/15/146 + +Please refer to the following spec for more details on the enumeration +and control. + +Refer here to get documentation about mitigations. 
+ +https://software.intel.com/en-us/side-channel-security-support + +[peterz: rebase and changelog rewrite] +[karahmed: - rebase + - vmx: expose PRED_CMD if guest has it in CPUID + - svm: only pass through IBPB if guest has it in CPUID + - vmx: support !cpu_has_vmx_msr_bitmap()] + - vmx: support nested] +[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) + PRED_CMD is a write-only MSR] + +Signed-off-by: Ashok Raj +Signed-off-by: Peter Zijlstra (Intel) +Signed-off-by: David Woodhouse +Signed-off-by: KarimAllah Ahmed +Signed-off-by: Thomas Gleixner +Reviewed-by: Konrad Rzeszutek Wilk +Cc: Andrea Arcangeli +Cc: Andi Kleen +Cc: kvm@vger.kernel.org +Cc: Asit Mallick +Cc: Linus Torvalds +Cc: Andy Lutomirski +Cc: Dave Hansen +Cc: Arjan Van De Ven +Cc: Greg KH +Cc: Jun Nakajima +Cc: Paolo Bonzini +Cc: Dan Williams +Cc: Tim Chen +Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com +Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/cpuid.c | 11 ++++++- + arch/x86/kvm/svm.c | 28 +++++++++++++++++ + arch/x86/kvm/vmx.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-- + 3 files changed, 116 insertions(+), 3 deletions(-) + +--- a/arch/x86/kvm/cpuid.c ++++ b/arch/x86/kvm/cpuid.c +@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct + F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | + 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + ++ /* cpuid 0x80000008.ebx */ ++ const u32 kvm_cpuid_8000_0008_ebx_x86_features = ++ F(IBPB); ++ + /* cpuid 0xC0000001.edx */ + const u32 kvm_cpuid_C000_0001_edx_x86_features = + F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | +@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct + if (!g_phys_as) + g_phys_as = phys_as; + entry->eax = g_phys_as | (virt_as << 8); +- entry->ebx = entry->edx = 0; ++ entry->edx = 0; ++ /* IBPB isn't necessarily present in hardware cpuid */ ++ if (boot_cpu_has(X86_FEATURE_IBPB)) ++ entry->ebx |= F(IBPB); ++ entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; ++ cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); + break; + } + case 0x80000019: +--- a/arch/x86/kvm/svm.c ++++ b/arch/x86/kvm/svm.c +@@ -249,6 +249,7 @@ static const struct svm_direct_access_ms + { .index = MSR_CSTAR, .always = true }, + { .index = MSR_SYSCALL_MASK, .always = true }, + #endif ++ { .index = MSR_IA32_PRED_CMD, .always = false }, + { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, + { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, + { .index = MSR_IA32_LASTINTFROMIP, .always = false }, +@@ -529,6 +530,7 @@ struct svm_cpu_data { + struct kvm_ldttss_desc *tss_desc; + + struct page *save_area; ++ struct vmcb *current_vmcb; + }; + + static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); +@@ -1706,11 +1708,17 @@ static void svm_free_vcpu(struct kvm_vcp + __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); + kvm_vcpu_uninit(vcpu); + kmem_cache_free(kvm_vcpu_cache, svm); ++ /* ++ * The vmcb page can be recycled, causing a false negative in ++ * svm_vcpu_load(). So do a full IBPB now. 
++ */ ++ indirect_branch_prediction_barrier(); + } + + static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) + { + struct vcpu_svm *svm = to_svm(vcpu); ++ struct svm_cpu_data *sd = per_cpu(svm_data, cpu); + int i; + + if (unlikely(cpu != vcpu->cpu)) { +@@ -1739,6 +1747,10 @@ static void svm_vcpu_load(struct kvm_vcp + if (static_cpu_has(X86_FEATURE_RDTSCP)) + wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + ++ if (sd->current_vmcb != svm->vmcb) { ++ sd->current_vmcb = svm->vmcb; ++ indirect_branch_prediction_barrier(); ++ } + avic_vcpu_load(vcpu, cpu); + } + +@@ -3670,6 +3682,22 @@ static int svm_set_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + kvm_write_tsc(vcpu, msr); + break; ++ case MSR_IA32_PRED_CMD: ++ if (!msr->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) ++ return 1; ++ ++ if (data & ~PRED_CMD_IBPB) ++ return 1; ++ ++ if (!data) ++ break; ++ ++ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); ++ if (is_guest_mode(vcpu)) ++ break; ++ set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1); ++ break; + case MSR_STAR: + svm->vmcb->save.star = data; + break; +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -582,6 +582,7 @@ struct vcpu_vmx { + u64 msr_host_kernel_gs_base; + u64 msr_guest_kernel_gs_base; + #endif ++ + u32 vm_entry_controls_shadow; + u32 vm_exit_controls_shadow; + u32 secondary_exec_control; +@@ -926,6 +927,8 @@ static void vmx_set_nmi_mask(struct kvm_ + static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, + u16 error_code); + static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); ++static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, ++ u32 msr, int type); + + static DEFINE_PER_CPU(struct vmcs *, vmxarea); + static DEFINE_PER_CPU(struct vmcs *, current_vmcs); +@@ -1900,6 +1903,29 @@ static void update_exception_bitmap(stru + vmcs_write32(EXCEPTION_BITMAP, eb); + } + ++/* ++ * Check if MSR is intercepted for L01 MSR bitmap. ++ */ ++static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) ++{ ++ unsigned long *msr_bitmap; ++ int f = sizeof(unsigned long); ++ ++ if (!cpu_has_vmx_msr_bitmap()) ++ return true; ++ ++ msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap; ++ ++ if (msr <= 0x1fff) { ++ return !!test_bit(msr, msr_bitmap + 0x800 / f); ++ } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { ++ msr &= 0x1fff; ++ return !!test_bit(msr, msr_bitmap + 0xc00 / f); ++ } ++ ++ return true; ++} ++ + static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx, + unsigned long entry, unsigned long exit) + { +@@ -2278,6 +2304,7 @@ static void vmx_vcpu_load(struct kvm_vcp + if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { + per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; + vmcs_load(vmx->loaded_vmcs->vmcs); ++ indirect_branch_prediction_barrier(); + } + + if (!already_loaded) { +@@ -3337,6 +3364,34 @@ static int vmx_set_msr(struct kvm_vcpu * + case MSR_IA32_TSC: + kvm_write_tsc(vcpu, msr_info); + break; ++ case MSR_IA32_PRED_CMD: ++ if (!msr_info->host_initiated && ++ !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && ++ !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) ++ return 1; ++ ++ if (data & ~PRED_CMD_IBPB) ++ return 1; ++ ++ if (!data) ++ break; ++ ++ wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); ++ ++ /* ++ * For non-nested: ++ * When it's written (to non-zero) for the first time, pass ++ * it through. ++ * ++ * For nested: ++ * The handling of the MSR bitmap for L2 guests is done in ++ * nested_vmx_merge_msr_bitmap. 
We should not touch the ++ * vmcs02.msr_bitmap here since it gets completely overwritten ++ * in the merging. ++ */ ++ vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, ++ MSR_TYPE_W); ++ break; + case MSR_IA32_CR_PAT: + if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { + if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) +@@ -10038,9 +10093,23 @@ static inline bool nested_vmx_merge_msr_ + struct page *page; + unsigned long *msr_bitmap_l1; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; ++ /* ++ * pred_cmd is trying to verify two things: ++ * ++ * 1. L0 gave a permission to L1 to actually passthrough the MSR. This ++ * ensures that we do not accidentally generate an L02 MSR bitmap ++ * from the L12 MSR bitmap that is too permissive. ++ * 2. That L1 or L2s have actually used the MSR. This avoids ++ * unnecessarily merging of the bitmap if the MSR is unused. This ++ * works properly because we only update the L01 MSR bitmap lazily. ++ * So even if L0 should pass L1 these MSRs, the L01 bitmap is only ++ * updated to reflect this when L1 (or its L2s) actually write to ++ * the MSR. ++ */ ++ bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); + +- /* This shortcut is ok because we support only x2APIC MSRs so far. */ +- if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) ++ if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && ++ !pred_cmd) + return false; + + page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap); +@@ -10073,6 +10142,13 @@ static inline bool nested_vmx_merge_msr_ + MSR_TYPE_W); + } + } ++ ++ if (pred_cmd) ++ nested_vmx_disable_intercept_for_msr( ++ msr_bitmap_l1, msr_bitmap_l0, ++ MSR_IA32_PRED_CMD, ++ MSR_TYPE_W); ++ + kunmap(page); + kvm_release_page_clean(page); + diff --git a/queue-4.14/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch b/queue-4.14/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch new file mode 100644 index 00000000000..80fe6e32539 --- /dev/null +++ b/queue-4.14/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch @@ -0,0 +1,67 @@ +Subject: KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX +From: KarimAllah Ahmed karahmed@amazon.de +Date: Thu Feb 1 22:59:42 2018 +0100 + +From: KarimAllah Ahmed karahmed@amazon.de + +commit b7b27aa011a1df42728d1768fc181d9ce69e6911 + +[dwmw2: Stop using KF() for bits in it, too] +Signed-off-by: KarimAllah Ahmed +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Reviewed-by: Paolo Bonzini +Reviewed-by: Konrad Rzeszutek Wilk +Reviewed-by: Jim Mattson +Cc: kvm@vger.kernel.org +Cc: Radim Krčmář +Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/cpuid.c | 8 +++----- + arch/x86/kvm/cpuid.h | 1 + + 2 files changed, 4 insertions(+), 5 deletions(-) + +--- a/arch/x86/kvm/cpuid.c ++++ b/arch/x86/kvm/cpuid.c +@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) + + #define F(x) bit(X86_FEATURE_##x) + +-/* These are scattered features in cpufeatures.h. 
*/ +-#define KVM_CPUID_BIT_AVX512_4VNNIW 2 +-#define KVM_CPUID_BIT_AVX512_4FMAPS 3 ++/* For scattered features from cpufeatures.h; we currently expose none */ + #define KF(x) bit(KVM_CPUID_BIT_##x) + + int kvm_update_cpuid(struct kvm_vcpu *vcpu) +@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct + + /* cpuid 7.0.edx*/ + const u32 kvm_cpuid_7_0_edx_x86_features = +- KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); ++ F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + + /* all calls to cpuid_count() should be made on the same cpu */ + get_cpu(); +@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct + if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) + entry->ecx &= ~F(PKU); + entry->edx &= kvm_cpuid_7_0_edx_x86_features; +- entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); ++ cpuid_mask(&entry->edx, CPUID_7_EDX); + } else { + entry->ebx = 0; + entry->ecx = 0; +--- a/arch/x86/kvm/cpuid.h ++++ b/arch/x86/kvm/cpuid.h +@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cp + [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX}, + [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, + [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX}, ++ [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, + }; + + static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) diff --git a/queue-4.14/array_index_nospec_Sanitize_speculative_array_de-references.patch b/queue-4.14/array_index_nospec_Sanitize_speculative_array_de-references.patch new file mode 100644 index 00000000000..ecddc6d9db2 --- /dev/null +++ b/queue-4.14/array_index_nospec_Sanitize_speculative_array_de-references.patch @@ -0,0 +1,115 @@ +Subject: array_index_nospec: Sanitize speculative array de-references +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:22 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit f3804203306e098dae9ca51540fcd5eb700d7f40 + +array_index_nospec() is proposed as a generic mechanism to mitigate +against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary +checks via speculative execution. The array_index_nospec() +implementation is expected to be safe for current generation CPUs across +multiple architectures (ARM, x86). + +Based on an original implementation by Linus Torvalds, tweaked to remove +speculative flows by Alexei Starovoitov, and tweaked again by Linus to +introduce an x86 assembly implementation for the mask generation. + +Co-developed-by: Linus Torvalds +Co-developed-by: Alexei Starovoitov +Suggested-by: Cyril Novikov +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: Peter Zijlstra +Cc: Catalin Marinas +Cc: Will Deacon +Cc: Russell King +Cc: gregkh@linuxfoundation.org +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + include/linux/nospec.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 72 insertions(+) + +--- /dev/null ++++ b/include/linux/nospec.h +@@ -0,0 +1,72 @@ ++// SPDX-License-Identifier: GPL-2.0 ++// Copyright(c) 2018 Linus Torvalds. All rights reserved. ++// Copyright(c) 2018 Alexei Starovoitov. All rights reserved. ++// Copyright(c) 2018 Intel Corporation. All rights reserved. 
++ ++#ifndef _LINUX_NOSPEC_H ++#define _LINUX_NOSPEC_H ++ ++/** ++ * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise ++ * @index: array element index ++ * @size: number of elements in array ++ * ++ * When @index is out of bounds (@index >= @size), the sign bit will be ++ * set. Extend the sign bit to all bits and invert, giving a result of ++ * zero for an out of bounds index, or ~0 if within bounds [0, @size). ++ */ ++#ifndef array_index_mask_nospec ++static inline unsigned long array_index_mask_nospec(unsigned long index, ++ unsigned long size) ++{ ++ /* ++ * Warn developers about inappropriate array_index_nospec() usage. ++ * ++ * Even if the CPU speculates past the WARN_ONCE branch, the ++ * sign bit of @index is taken into account when generating the ++ * mask. ++ * ++ * This warning is compiled out when the compiler can infer that ++ * @index and @size are less than LONG_MAX. ++ */ ++ if (WARN_ONCE(index > LONG_MAX || size > LONG_MAX, ++ "array_index_nospec() limited to range of [0, LONG_MAX]\n")) ++ return 0; ++ ++ /* ++ * Always calculate and emit the mask even if the compiler ++ * thinks the mask is not needed. The compiler does not take ++ * into account the value of @index under speculation. ++ */ ++ OPTIMIZER_HIDE_VAR(index); ++ return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1); ++} ++#endif ++ ++/* ++ * array_index_nospec - sanitize an array index after a bounds check ++ * ++ * For a code sequence like: ++ * ++ * if (index < size) { ++ * index = array_index_nospec(index, size); ++ * val = array[index]; ++ * } ++ * ++ * ...if the CPU speculates past the bounds check then ++ * array_index_nospec() will clamp the index within the range of [0, ++ * size). ++ */ ++#define array_index_nospec(index, size) \ ++({ \ ++ typeof(index) _i = (index); \ ++ typeof(size) _s = (size); \ ++ unsigned long _mask = array_index_mask_nospec(_i, _s); \ ++ \ ++ BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \ ++ BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \ ++ \ ++ _i &= _mask; \ ++ _i; \ ++}) ++#endif /* _LINUX_NOSPEC_H */ diff --git a/queue-4.14/nl80211_Sanitize_array_index_in_parse_txq_params.patch b/queue-4.14/nl80211_Sanitize_array_index_in_parse_txq_params.patch new file mode 100644 index 00000000000..9bf5b8ace49 --- /dev/null +++ b/queue-4.14/nl80211_Sanitize_array_index_in_parse_txq_params.patch @@ -0,0 +1,71 @@ +Subject: nl80211: Sanitize array index in parse_txq_params +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:03:15 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit 259d8c1e984318497c84eef547bbb6b1d9f4eb05 + +Wireless drivers rely on parse_txq_params to validate that txq_params->ac +is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx() +handler is called. Use a new helper, array_index_nospec(), to sanitize +txq_params->ac with respect to speculation. I.e. ensure that any +speculation into ->conf_tx() handlers is done with a value of +txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS). + +Reported-by: Christian Lamparter +Reported-by: Elena Reshetova +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Acked-by: Johannes Berg +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: linux-wireless@vger.kernel.org +Cc: torvalds@linux-foundation.org +Cc: "David S. 
Miller" +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + net/wireless/nl80211.c | 9 ++++++--- + 1 file changed, 6 insertions(+), 3 deletions(-) + +--- a/net/wireless/nl80211.c ++++ b/net/wireless/nl80211.c +@@ -16,6 +16,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -2056,20 +2057,22 @@ static const struct nla_policy txq_param + static int parse_txq_params(struct nlattr *tb[], + struct ieee80211_txq_params *txq_params) + { ++ u8 ac; ++ + if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] || + !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] || + !tb[NL80211_TXQ_ATTR_AIFS]) + return -EINVAL; + +- txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]); ++ ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]); + txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]); + txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]); + txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]); + txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]); + +- if (txq_params->ac >= NL80211_NUM_ACS) ++ if (ac >= NL80211_NUM_ACS) + return -EINVAL; +- ++ txq_params->ac = array_index_nospec(ac, NL80211_NUM_ACS); + return 0; + } + diff --git a/queue-4.14/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch b/queue-4.14/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch new file mode 100644 index 00000000000..4c284abeb2f --- /dev/null +++ b/queue-4.14/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch @@ -0,0 +1,126 @@ +Subject: objtool: Add support for alternatives at the end of a section +From: Josh Poimboeuf jpoimboe@redhat.com +Date: Mon Jan 29 22:00:40 2018 -0600 + +From: Josh Poimboeuf jpoimboe@redhat.com + +commit 17bc33914bcc98ba3c6b426fd1c49587a25c0597 + +Now that the previous patch gave objtool the ability to read retpoline +alternatives, it shows a new warning: + + arch/x86/entry/entry_64.o: warning: objtool: .entry_trampoline: don't know how to handle alternatives at end of section + +This is due to the JMP_NOSPEC in entry_SYSCALL_64_trampoline(). + +Previously, objtool ignored this situation because it wasn't needed, and +it would have required a bit of extra code. Now that this case exists, +add proper support for it. + +Signed-off-by: Josh Poimboeuf +Cc: Andy Lutomirski +Cc: Borislav Petkov +Cc: Dave Hansen +Cc: David Woodhouse +Cc: Greg Kroah-Hartman +Cc: Guenter Roeck +Cc: H. 
Peter Anvin +Cc: Juergen Gross +Cc: Linus Torvalds +Cc: Peter Zijlstra +Cc: Thomas Gleixner +Link: http://lkml.kernel.org/r/2a30a3c2158af47d891a76e69bb1ef347e0443fd.1517284349.git.jpoimboe@redhat.com +Signed-off-by: Ingo Molnar +Signed-off-by: Greg Kroah-Hartman + +--- + tools/objtool/check.c | 53 +++++++++++++++++++++++++++++--------------------- + 1 file changed, 31 insertions(+), 22 deletions(-) + +--- a/tools/objtool/check.c ++++ b/tools/objtool/check.c +@@ -594,7 +594,7 @@ static int handle_group_alt(struct objto + struct instruction *orig_insn, + struct instruction **new_insn) + { +- struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump; ++ struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL; + unsigned long dest_off; + + last_orig_insn = NULL; +@@ -610,28 +610,30 @@ static int handle_group_alt(struct objto + last_orig_insn = insn; + } + +- if (!next_insn_same_sec(file, last_orig_insn)) { +- WARN("%s: don't know how to handle alternatives at end of section", +- special_alt->orig_sec->name); +- return -1; +- } +- +- fake_jump = malloc(sizeof(*fake_jump)); +- if (!fake_jump) { +- WARN("malloc failed"); +- return -1; ++ if (next_insn_same_sec(file, last_orig_insn)) { ++ fake_jump = malloc(sizeof(*fake_jump)); ++ if (!fake_jump) { ++ WARN("malloc failed"); ++ return -1; ++ } ++ memset(fake_jump, 0, sizeof(*fake_jump)); ++ INIT_LIST_HEAD(&fake_jump->alts); ++ clear_insn_state(&fake_jump->state); ++ ++ fake_jump->sec = special_alt->new_sec; ++ fake_jump->offset = -1; ++ fake_jump->type = INSN_JUMP_UNCONDITIONAL; ++ fake_jump->jump_dest = list_next_entry(last_orig_insn, list); ++ fake_jump->ignore = true; + } +- memset(fake_jump, 0, sizeof(*fake_jump)); +- INIT_LIST_HEAD(&fake_jump->alts); +- clear_insn_state(&fake_jump->state); +- +- fake_jump->sec = special_alt->new_sec; +- fake_jump->offset = -1; +- fake_jump->type = INSN_JUMP_UNCONDITIONAL; +- fake_jump->jump_dest = list_next_entry(last_orig_insn, list); +- fake_jump->ignore = true; + + if (!special_alt->new_len) { ++ if (!fake_jump) { ++ WARN("%s: empty alternative at end of section", ++ special_alt->orig_sec->name); ++ return -1; ++ } ++ + *new_insn = fake_jump; + return 0; + } +@@ -654,8 +656,14 @@ static int handle_group_alt(struct objto + continue; + + dest_off = insn->offset + insn->len + insn->immediate; +- if (dest_off == special_alt->new_off + special_alt->new_len) ++ if (dest_off == special_alt->new_off + special_alt->new_len) { ++ if (!fake_jump) { ++ WARN("%s: alternative jump to end of section", ++ special_alt->orig_sec->name); ++ return -1; ++ } + insn->jump_dest = fake_jump; ++ } + + if (!insn->jump_dest) { + WARN_FUNC("can't find alternative jump destination", +@@ -670,7 +678,8 @@ static int handle_group_alt(struct objto + return -1; + } + +- list_add(&fake_jump->list, &last_new_insn->list); ++ if (fake_jump) ++ list_add(&fake_jump->list, &last_new_insn->list); + + return 0; + } diff --git a/queue-4.14/objtool_Improve_retpoline_alternative_handling.patch b/queue-4.14/objtool_Improve_retpoline_alternative_handling.patch new file mode 100644 index 00000000000..b83517820db --- /dev/null +++ b/queue-4.14/objtool_Improve_retpoline_alternative_handling.patch @@ -0,0 +1,123 @@ +Subject: objtool: Improve retpoline alternative handling +From: Josh Poimboeuf jpoimboe@redhat.com +Date: Mon Jan 29 22:00:39 2018 -0600 + +From: Josh Poimboeuf jpoimboe@redhat.com + +commit a845c7cf4b4cb5e9e3b2823867892b27646f3a98 + +Currently objtool requires all retpolines to be: + + a) patched in with 
alternatives; and + + b) annotated with ANNOTATE_NOSPEC_ALTERNATIVE. + +If you forget to do both of the above, objtool segfaults trying to +dereference a NULL 'insn->call_dest' pointer. + +Avoid that situation and print a more helpful error message: + + quirks.o: warning: objtool: efi_delete_dummy_variable()+0x99: unsupported intra-function call + quirks.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE. + +Future improvements can be made to make objtool smarter with respect to +retpolines, but this is a good incremental improvement for now. + +Reported-and-tested-by: Guenter Roeck +Signed-off-by: Josh Poimboeuf +Cc: Andy Lutomirski +Cc: Borislav Petkov +Cc: Dave Hansen +Cc: David Woodhouse +Cc: Greg Kroah-Hartman +Cc: H. Peter Anvin +Cc: Juergen Gross +Cc: Linus Torvalds +Cc: Peter Zijlstra +Cc: Thomas Gleixner +Link: http://lkml.kernel.org/r/819e50b6d9c2e1a22e34c1a636c0b2057cc8c6e5.1517284349.git.jpoimboe@redhat.com +Signed-off-by: Ingo Molnar +Signed-off-by: Greg Kroah-Hartman + +--- + tools/objtool/check.c | 36 ++++++++++++++++-------------------- + 1 file changed, 16 insertions(+), 20 deletions(-) + +--- a/tools/objtool/check.c ++++ b/tools/objtool/check.c +@@ -543,18 +543,14 @@ static int add_call_destinations(struct + dest_off = insn->offset + insn->len + insn->immediate; + insn->call_dest = find_symbol_by_offset(insn->sec, + dest_off); +- /* +- * FIXME: Thanks to retpolines, it's now considered +- * normal for a function to call within itself. So +- * disable this warning for now. +- */ +-#if 0 +- if (!insn->call_dest) { +- WARN_FUNC("can't find call dest symbol at offset 0x%lx", +- insn->sec, insn->offset, dest_off); ++ ++ if (!insn->call_dest && !insn->ignore) { ++ WARN_FUNC("unsupported intra-function call", ++ insn->sec, insn->offset); ++ WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE."); + return -1; + } +-#endif ++ + } else if (rela->sym->type == STT_SECTION) { + insn->call_dest = find_symbol_by_offset(rela->sym->sec, + rela->addend+4); +@@ -648,6 +644,8 @@ static int handle_group_alt(struct objto + + last_new_insn = insn; + ++ insn->ignore = orig_insn->ignore_alts; ++ + if (insn->type != INSN_JUMP_CONDITIONAL && + insn->type != INSN_JUMP_UNCONDITIONAL) + continue; +@@ -729,10 +727,6 @@ static int add_special_section_alts(stru + goto out; + } + +- /* Ignore retpoline alternatives. 
*/ +- if (orig_insn->ignore_alts) +- continue; +- + new_insn = NULL; + if (!special_alt->group || special_alt->new_len) { + new_insn = find_insn(file, special_alt->new_sec, +@@ -1089,11 +1083,11 @@ static int decode_sections(struct objtoo + if (ret) + return ret; + +- ret = add_call_destinations(file); ++ ret = add_special_section_alts(file); + if (ret) + return ret; + +- ret = add_special_section_alts(file); ++ ret = add_call_destinations(file); + if (ret) + return ret; + +@@ -1720,10 +1714,12 @@ static int validate_branch(struct objtoo + + insn->visited = true; + +- list_for_each_entry(alt, &insn->alts, list) { +- ret = validate_branch(file, alt->insn, state); +- if (ret) +- return 1; ++ if (!insn->ignore_alts) { ++ list_for_each_entry(alt, &insn->alts, list) { ++ ret = validate_branch(file, alt->insn, state); ++ if (ret) ++ return 1; ++ } + } + + switch (insn->type) { diff --git a/queue-4.14/objtool_Warn_on_stripped_section_symbol.patch b/queue-4.14/objtool_Warn_on_stripped_section_symbol.patch new file mode 100644 index 00000000000..39f4cbf0662 --- /dev/null +++ b/queue-4.14/objtool_Warn_on_stripped_section_symbol.patch @@ -0,0 +1,50 @@ +Subject: objtool: Warn on stripped section symbol +From: Josh Poimboeuf jpoimboe@redhat.com +Date: Mon Jan 29 22:00:41 2018 -0600 + +From: Josh Poimboeuf jpoimboe@redhat.com + +commit 830c1e3d16b2c1733cd1ec9c8f4d47a398ae31bc + +With the following fix: + + 2a0098d70640 ("objtool: Fix seg fault with gold linker") + +... a seg fault was avoided, but the original seg fault condition in +objtool wasn't fixed. Replace the seg fault with an error message. + +Suggested-by: Ingo Molnar +Signed-off-by: Josh Poimboeuf +Cc: Andy Lutomirski +Cc: Borislav Petkov +Cc: Dave Hansen +Cc: David Woodhouse +Cc: Greg Kroah-Hartman +Cc: Guenter Roeck +Cc: H. 
Peter Anvin +Cc: Juergen Gross +Cc: Linus Torvalds +Cc: Peter Zijlstra +Cc: Thomas Gleixner +Link: http://lkml.kernel.org/r/dc4585a70d6b975c99fc51d1957ccdde7bd52f3a.1517284349.git.jpoimboe@redhat.com +Signed-off-by: Ingo Molnar +Signed-off-by: Greg Kroah-Hartman + +--- + tools/objtool/orc_gen.c | 5 +++++ + 1 file changed, 5 insertions(+) + +--- a/tools/objtool/orc_gen.c ++++ b/tools/objtool/orc_gen.c +@@ -98,6 +98,11 @@ static int create_orc_entry(struct secti + struct orc_entry *orc; + struct rela *rela; + ++ if (!insn_sec->sym) { ++ WARN("missing symbol for section %s", insn_sec->name); ++ return -1; ++ } ++ + /* populate ORC data */ + orc = (struct orc_entry *)u_sec->data->d_buf + idx; + memcpy(orc, o, sizeof(*orc)); diff --git a/queue-4.14/series b/queue-4.14/series index 0de485d65d4..e6e6c4be78d 100644 --- a/queue-4.14/series +++ b/queue-4.14/series @@ -23,3 +23,40 @@ auxdisplay-img-ascii-lcd-add-missing-module_description-author-license.patch iio-adc-accel-fix-up-module-licenses.patch pinctrl-pxa-pxa2xx-add-missing-module_description-author-license.patch asoc-pcm512x-add-missing-module_description-author-license.patch +KVM_nVMX_Eliminate_vmcs02_pool.patch +KVM_VMX_introduce_alloc_loaded_vmcs.patch +objtool_Improve_retpoline_alternative_handling.patch +objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch +objtool_Warn_on_stripped_section_symbol.patch +x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch +x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch +x86entry64_Remove_the_SYSCALL64_fast_path.patch +x86entry64_Push_extra_regs_right_away.patch +x86asm_Move_status_from_thread_struct_to_thread_info.patch +Documentation_Document_array_index_nospec.patch +array_index_nospec_Sanitize_speculative_array_de-references.patch +x86_Implement_array_index_mask_nospec.patch +x86_Introduce_barrier_nospec.patch +x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch +x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch +x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch +x86get_user_Use_pointer_masking_to_limit_speculation.patch +x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch +vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch +nl80211_Sanitize_array_index_in_parse_txq_params.patch +x86spectre_Report_get_user_mitigation_for_spectre_v1.patch +x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch +x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch +x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch +x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch +KVM_VMX_make_MSR_bitmaps_per-VCPU.patch +x86kvm_Update_spectre-v1_mitigation.patch +x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch +x86spectre_Simplify_spectre_v2_command_line_parsing.patch +x86pti_Mark_constant_arrays_as___initconst.patch +x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch +KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch +KVMx86_Add_IBPB_support.patch +KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch +KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch +KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch diff --git a/queue-4.14/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch b/queue-4.14/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch new file mode 100644 index 00000000000..c47774b2369 --- /dev/null +++ 
b/queue-4.14/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch @@ -0,0 +1,53 @@ +Subject: vfs, fdtable: Prevent bounds-check bypass via speculative execution +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:03:05 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit 56c30ba7b348b90484969054d561f711ba196507 + +'fd' is a user controlled value that is used as a data dependency to +read from the 'fdt->fd' array. In order to avoid potential leaks of +kernel memory values, block speculative execution of the instruction +stream that could issue reads based on an invalid 'file *' returned from +__fcheck_files. + +Co-developed-by: Elena Reshetova +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + include/linux/fdtable.h | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +--- a/include/linux/fdtable.h ++++ b/include/linux/fdtable.h +@@ -10,6 +10,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -82,8 +83,10 @@ static inline struct file *__fcheck_file + { + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + +- if (fd < fdt->max_fds) ++ if (fd < fdt->max_fds) { ++ fd = array_index_nospec(fd, fdt->max_fds); + return rcu_dereference_raw(fdt->fd[fd]); ++ } + return NULL; + } + diff --git a/queue-4.14/x86_Implement_array_index_mask_nospec.patch b/queue-4.14/x86_Implement_array_index_mask_nospec.patch new file mode 100644 index 00000000000..f12258f4edc --- /dev/null +++ b/queue-4.14/x86_Implement_array_index_mask_nospec.patch @@ -0,0 +1,65 @@ +Subject: x86: Implement array_index_mask_nospec +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:28 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit babdde2698d482b6c0de1eab4f697cf5856c5859 + +array_index_nospec() uses a mask to sanitize user controllable array +indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask +otherwise. While the default array_index_mask_nospec() handles the +carry-bit from the (index - size) result in software. + +The x86 array_index_mask_nospec() does the same, but the carry-bit is +handled in the processor CF flag without conditional instructions in the +control flow. 
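As a rough worked example (not part of the patch itself): the x86 version computes index - size once, and the resulting carry flag is then broadcast into a full-width mask by sbb:

    index = 1, size = 4:  1 - 4 borrows      -> CF = 1 -> mask = ~0UL  (in bounds)
    index = 4, size = 4:  4 - 4 = 0, no borrow -> CF = 0 -> mask = 0
    index = 9, size = 4:  no borrow           -> CF = 0 -> mask = 0

array_index_nospec() then applies 'index &= mask', so a mis-speculated
out-of-bounds index collapses to 0 instead of selecting an arbitrary
array slot.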
+ +Suggested-by: Linus Torvalds +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/barrier.h | 24 ++++++++++++++++++++++++ + 1 file changed, 24 insertions(+) + +--- a/arch/x86/include/asm/barrier.h ++++ b/arch/x86/include/asm/barrier.h +@@ -24,6 +24,30 @@ + #define wmb() asm volatile("sfence" ::: "memory") + #endif + ++/** ++ * array_index_mask_nospec() - generate a mask that is ~0UL when the ++ * bounds check succeeds and 0 otherwise ++ * @index: array element index ++ * @size: number of elements in array ++ * ++ * Returns: ++ * 0 - (index < size) ++ */ ++static inline unsigned long array_index_mask_nospec(unsigned long index, ++ unsigned long size) ++{ ++ unsigned long mask; ++ ++ asm ("cmp %1,%2; sbb %0,%0;" ++ :"=r" (mask) ++ :"r"(size),"r" (index) ++ :"cc"); ++ return mask; ++} ++ ++/* Override the default implementation from linux/nospec.h. */ ++#define array_index_mask_nospec array_index_mask_nospec ++ + #ifdef CONFIG_X86_PPRO_FENCE + #define dma_rmb() rmb() + #else diff --git a/queue-4.14/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch b/queue-4.14/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch new file mode 100644 index 00000000000..73d3c643bbe --- /dev/null +++ b/queue-4.14/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch @@ -0,0 +1,79 @@ +Subject: x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:39 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit b3bbfb3fb5d25776b8e3f361d2eedaabb0b496cd + +For __get_user() paths, do not allow the kernel to speculate on the value +of a user controlled pointer. In addition to the 'stac' instruction for +Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the +access_ok() result to resolve in the pipeline before the CPU might take any +speculative action on the pointer value. Given the cost of 'stac' the +speculation barrier is placed after 'stac' to hopefully overlap the cost of +disabling SMAP with the cost of flushing the instruction pipeline. + +Since __get_user is a major kernel interface that deals with user +controlled pointers, the __uaccess_begin_nospec() mechanism will prevent +speculative execution past an access_ok() permission check. While +speculative execution past access_ok() is not enough to lead to a kernel +memory leak, it is a necessary precondition. + +To be clear, __uaccess_begin_nospec() is addressing a class of potential +problems near __get_user() usages. + +Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used +to protect __get_user(), pointer masking similar to array_index_nospec() +will be used for get_user() since it incorporates a bounds check near the +usage. + +uaccess_try_nospec provides the same mechanism for get_user_try. + +No functional changes. 
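As a minimal sketch of where the new helper is meant to sit in a
__get_user()-style path (simplified and illustrative only -- the uptr/val
names are made up and the real implementation uses asm with exception
handling):

    if (!access_ok(VERIFY_READ, uptr, sizeof(*uptr)))
            return -EFAULT;

    __uaccess_begin_nospec();   /* stac(), then barrier_nospec() */
    val = *uptr;                /* load cannot be speculated past access_ok() */
    __uaccess_end();            /* clac() */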
+ +Suggested-by: Linus Torvalds +Suggested-by: Andi Kleen +Suggested-by: Ingo Molnar +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: Tom Lendacky +Cc: Kees Cook +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/uaccess.h | 9 +++++++++ + 1 file changed, 9 insertions(+) + +--- a/arch/x86/include/asm/uaccess.h ++++ b/arch/x86/include/asm/uaccess.h +@@ -124,6 +124,11 @@ extern int __get_user_bad(void); + + #define __uaccess_begin() stac() + #define __uaccess_end() clac() ++#define __uaccess_begin_nospec() \ ++({ \ ++ stac(); \ ++ barrier_nospec(); \ ++}) + + /* + * This is a type: either unsigned long, if the argument fits into +@@ -487,6 +492,10 @@ struct __large_struct { unsigned long bu + __uaccess_begin(); \ + barrier(); + ++#define uaccess_try_nospec do { \ ++ current->thread.uaccess_err = 0; \ ++ __uaccess_begin_nospec(); \ ++ + #define uaccess_catch(err) \ + __uaccess_end(); \ + (err) |= (current->thread.uaccess_err ? -EFAULT : 0); \ diff --git a/queue-4.14/x86_Introduce_barrier_nospec.patch b/queue-4.14/x86_Introduce_barrier_nospec.patch new file mode 100644 index 00000000000..ecae4045a9c --- /dev/null +++ b/queue-4.14/x86_Introduce_barrier_nospec.patch @@ -0,0 +1,65 @@ +Subject: x86: Introduce barrier_nospec +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:33 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit b3d7ad85b80bbc404635dca80f5b129f6242bc7a + +Rename the open coded form of this instruction sequence from +rdtsc_ordered() into a generic barrier primitive, barrier_nospec(). + +One of the mitigations for Spectre variant1 vulnerabilities is to fence +speculative execution after successfully validating a bounds check. I.e. +force the result of a bounds check to resolve in the instruction pipeline +to ensure speculative execution honors that result before potentially +operating on out-of-bounds data. + +No functional changes. + +Suggested-by: Linus Torvalds +Suggested-by: Andi Kleen +Suggested-by: Ingo Molnar +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: Tom Lendacky +Cc: Kees Cook +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/barrier.h | 4 ++++ + arch/x86/include/asm/msr.h | 3 +-- + 2 files changed, 5 insertions(+), 2 deletions(-) + +--- a/arch/x86/include/asm/barrier.h ++++ b/arch/x86/include/asm/barrier.h +@@ -48,6 +48,10 @@ static inline unsigned long array_index_ + /* Override the default implementation from linux/nospec.h. */ + #define array_index_mask_nospec array_index_mask_nospec + ++/* Prevent speculative execution past this barrier. */ ++#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \ ++ "lfence", X86_FEATURE_LFENCE_RDTSC) ++ + #ifdef CONFIG_X86_PPRO_FENCE + #define dma_rmb() rmb() + #else +--- a/arch/x86/include/asm/msr.h ++++ b/arch/x86/include/asm/msr.h +@@ -214,8 +214,7 @@ static __always_inline unsigned long lon + * that some other imaginary CPU is updating continuously with a + * time stamp. 
+ */ +- alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, +- "lfence", X86_FEATURE_LFENCE_RDTSC); ++ barrier_nospec(); + return rdtsc(); + } + diff --git a/queue-4.14/x86asm_Move_status_from_thread_struct_to_thread_info.patch b/queue-4.14/x86asm_Move_status_from_thread_struct_to_thread_info.patch new file mode 100644 index 00000000000..a700ab5c1e2 --- /dev/null +++ b/queue-4.14/x86asm_Move_status_from_thread_struct_to_thread_info.patch @@ -0,0 +1,171 @@ +Subject: x86/asm: Move 'status' from thread_struct to thread_info +From: Andy Lutomirski luto@kernel.org +Date: Sun Jan 28 10:38:50 2018 -0800 + +From: Andy Lutomirski luto@kernel.org + +commit 37a8f7c38339b22b69876d6f5a0ab851565284e3 + +The TS_COMPAT bit is very hot and is accessed from code paths that mostly +also touch thread_info::flags. Move it into struct thread_info to improve +cache locality. + +The only reason it was in thread_struct is that there was a brief period +during which arch-specific fields were not allowed in struct thread_info. + +Linus suggested further changing: + + ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); + +to: + + if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED))) + ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); + +on the theory that frequently dirtying the cacheline even in pure 64-bit +code that never needs to modify status hurts performance. That could be a +reasonable followup patch, but I suspect it matters less on top of this +patch. + +Suggested-by: Linus Torvalds +Signed-off-by: Andy Lutomirski +Signed-off-by: Thomas Gleixner +Reviewed-by: Ingo Molnar +Acked-by: Linus Torvalds +Cc: Borislav Petkov +Cc: Kernel Hardening +Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/entry/common.c | 4 ++-- + arch/x86/include/asm/processor.h | 2 -- + arch/x86/include/asm/syscall.h | 6 +++--- + arch/x86/include/asm/thread_info.h | 3 ++- + arch/x86/kernel/process_64.c | 4 ++-- + arch/x86/kernel/ptrace.c | 2 +- + arch/x86/kernel/signal.c | 2 +- + 7 files changed, 11 insertions(+), 12 deletions(-) + +--- a/arch/x86/entry/common.c ++++ b/arch/x86/entry/common.c +@@ -208,7 +208,7 @@ __visible inline void prepare_exit_to_us + * special case only applies after poking regs and before the + * very next return to user mode. + */ +- current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED); ++ ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); + #endif + + user_enter_irqoff(); +@@ -306,7 +306,7 @@ static __always_inline void do_syscall_3 + unsigned int nr = (unsigned int)regs->orig_ax; + + #ifdef CONFIG_IA32_EMULATION +- current->thread.status |= TS_COMPAT; ++ ti->status |= TS_COMPAT; + #endif + + if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) { +--- a/arch/x86/include/asm/processor.h ++++ b/arch/x86/include/asm/processor.h +@@ -459,8 +459,6 @@ struct thread_struct { + unsigned short gsindex; + #endif + +- u32 status; /* thread synchronous flags */ +- + #ifdef CONFIG_X86_64 + unsigned long fsbase; + unsigned long gsbase; +--- a/arch/x86/include/asm/syscall.h ++++ b/arch/x86/include/asm/syscall.h +@@ -60,7 +60,7 @@ static inline long syscall_get_error(str + * TS_COMPAT is set for 32-bit syscall entries and then + * remains set until we return to user mode. + */ +- if (task->thread.status & (TS_COMPAT|TS_I386_REGS_POKED)) ++ if (task->thread_info.status & (TS_COMPAT|TS_I386_REGS_POKED)) + /* + * Sign-extend the value so (int)-EFOO becomes (long)-EFOO + * and will match correctly in comparisons. 
+@@ -116,7 +116,7 @@ static inline void syscall_get_arguments + unsigned long *args) + { + # ifdef CONFIG_IA32_EMULATION +- if (task->thread.status & TS_COMPAT) ++ if (task->thread_info.status & TS_COMPAT) + switch (i) { + case 0: + if (!n--) break; +@@ -177,7 +177,7 @@ static inline void syscall_set_arguments + const unsigned long *args) + { + # ifdef CONFIG_IA32_EMULATION +- if (task->thread.status & TS_COMPAT) ++ if (task->thread_info.status & TS_COMPAT) + switch (i) { + case 0: + if (!n--) break; +--- a/arch/x86/include/asm/thread_info.h ++++ b/arch/x86/include/asm/thread_info.h +@@ -55,6 +55,7 @@ struct task_struct; + + struct thread_info { + unsigned long flags; /* low level flags */ ++ u32 status; /* thread synchronous flags */ + }; + + #define INIT_THREAD_INFO(tsk) \ +@@ -221,7 +222,7 @@ static inline int arch_within_stack_fram + #define in_ia32_syscall() true + #else + #define in_ia32_syscall() (IS_ENABLED(CONFIG_IA32_EMULATION) && \ +- current->thread.status & TS_COMPAT) ++ current_thread_info()->status & TS_COMPAT) + #endif + + /* +--- a/arch/x86/kernel/process_64.c ++++ b/arch/x86/kernel/process_64.c +@@ -557,7 +557,7 @@ static void __set_personality_x32(void) + * Pretend to come from a x32 execve. + */ + task_pt_regs(current)->orig_ax = __NR_x32_execve | __X32_SYSCALL_BIT; +- current->thread.status &= ~TS_COMPAT; ++ current_thread_info()->status &= ~TS_COMPAT; + #endif + } + +@@ -571,7 +571,7 @@ static void __set_personality_ia32(void) + current->personality |= force_personality32; + /* Prepare the first "return" to user space */ + task_pt_regs(current)->orig_ax = __NR_ia32_execve; +- current->thread.status |= TS_COMPAT; ++ current_thread_info()->status |= TS_COMPAT; + #endif + } + +--- a/arch/x86/kernel/ptrace.c ++++ b/arch/x86/kernel/ptrace.c +@@ -935,7 +935,7 @@ static int putreg32(struct task_struct * + */ + regs->orig_ax = value; + if (syscall_get_nr(child, regs) >= 0) +- child->thread.status |= TS_I386_REGS_POKED; ++ child->thread_info.status |= TS_I386_REGS_POKED; + break; + + case offsetof(struct user32, regs.eflags): +--- a/arch/x86/kernel/signal.c ++++ b/arch/x86/kernel/signal.c +@@ -787,7 +787,7 @@ static inline unsigned long get_nr_resta + * than the tracee. + */ + #ifdef CONFIG_IA32_EMULATION +- if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED)) ++ if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED)) + return __NR_ia32_restart_syscall; + #endif + #ifdef CONFIG_X86_X32_ABI diff --git a/queue-4.14/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch b/queue-4.14/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch new file mode 100644 index 00000000000..df29d55a338 --- /dev/null +++ b/queue-4.14/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch @@ -0,0 +1,121 @@ +Subject: x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel +From: David Woodhouse dwmw@amazon.co.uk +Date: Tue Jan 30 14:30:23 2018 +0000 + +From: David Woodhouse dwmw@amazon.co.uk + +commit 7fcae1118f5fd44a862aa5c3525248e35ee67c3b + +Despite the fact that all the other code there seems to be doing it, just +using set_cpu_cap() in early_intel_init() doesn't actually work. + +For CPUs with PKU support, setup_pku() calls get_cpu_cap() after +c->c_init() has set those feature bits. That resets those bits back to what +was queried from the hardware. + +Turning the bits off for bad microcode is easy to fix. That can just use +setup_clear_cpu_cap() to force them off for all CPUs. 
+ +I was less keen on forcing the feature bits *on* that way, just in case +of inconsistencies. I appreciate that the kernel is going to get this +utterly wrong if CPU features are not consistent, because it has already +applied alternatives by the time secondary CPUs are brought up. + +But at least if setup_force_cpu_cap() isn't being used, we might have a +chance of *detecting* the lack of the corresponding bit and either +panicking or refusing to bring the offending CPU online. + +So ensure that the appropriate feature bits are set within get_cpu_cap() +regardless of how many extra times it's called. + +Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags") +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Cc: karahmed@amazon.de +Cc: peterz@infradead.org +Cc: bp@alien8.de +Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kernel/cpu/common.c | 21 +++++++++++++++++++++ + arch/x86/kernel/cpu/intel.c | 27 ++++++++------------------- + 2 files changed, 29 insertions(+), 19 deletions(-) + +--- a/arch/x86/kernel/cpu/common.c ++++ b/arch/x86/kernel/cpu/common.c +@@ -726,6 +726,26 @@ static void apply_forced_caps(struct cpu + } + } + ++static void init_speculation_control(struct cpuinfo_x86 *c) ++{ ++ /* ++ * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support, ++ * and they also have a different bit for STIBP support. Also, ++ * a hypervisor might have set the individual AMD bits even on ++ * Intel CPUs, for finer-grained selection of what's available. ++ * ++ * We use the AMD bits in 0x8000_0008 EBX as the generic hardware ++ * features, which are visible in /proc/cpuinfo and used by the ++ * kernel. So set those accordingly from the Intel bits. ++ */ ++ if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) { ++ set_cpu_cap(c, X86_FEATURE_IBRS); ++ set_cpu_cap(c, X86_FEATURE_IBPB); ++ } ++ if (cpu_has(c, X86_FEATURE_INTEL_STIBP)) ++ set_cpu_cap(c, X86_FEATURE_STIBP); ++} ++ + void get_cpu_cap(struct cpuinfo_x86 *c) + { + u32 eax, ebx, ecx, edx; +@@ -820,6 +840,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c) + c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a); + + init_scattered_cpuid_features(c); ++ init_speculation_control(c); + + /* + * Clear/Set all flags overridden by options, after probe. +--- a/arch/x86/kernel/cpu/intel.c ++++ b/arch/x86/kernel/cpu/intel.c +@@ -175,28 +175,17 @@ static void early_init_intel(struct cpui + if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64)) + c->microcode = intel_get_microcode_revision(); + +- /* +- * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support, +- * and they also have a different bit for STIBP support. Also, +- * a hypervisor might have set the individual AMD bits even on +- * Intel CPUs, for finer-grained selection of what's available. 
+- */ +- if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) { +- set_cpu_cap(c, X86_FEATURE_IBRS); +- set_cpu_cap(c, X86_FEATURE_IBPB); +- } +- if (cpu_has(c, X86_FEATURE_INTEL_STIBP)) +- set_cpu_cap(c, X86_FEATURE_STIBP); +- + /* Now if any of them are set, check the blacklist and clear the lot */ +- if ((cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) || ++ if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) || ++ cpu_has(c, X86_FEATURE_INTEL_STIBP) || ++ cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) || + cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) { + pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n"); +- clear_cpu_cap(c, X86_FEATURE_IBRS); +- clear_cpu_cap(c, X86_FEATURE_IBPB); +- clear_cpu_cap(c, X86_FEATURE_STIBP); +- clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL); +- clear_cpu_cap(c, X86_FEATURE_INTEL_STIBP); ++ setup_clear_cpu_cap(X86_FEATURE_IBRS); ++ setup_clear_cpu_cap(X86_FEATURE_IBPB); ++ setup_clear_cpu_cap(X86_FEATURE_STIBP); ++ setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL); ++ setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP); + } + + /* diff --git a/queue-4.14/x86entry64_Push_extra_regs_right_away.patch b/queue-4.14/x86entry64_Push_extra_regs_right_away.patch new file mode 100644 index 00000000000..e79133098f2 --- /dev/null +++ b/queue-4.14/x86entry64_Push_extra_regs_right_away.patch @@ -0,0 +1,50 @@ +Subject: x86/entry/64: Push extra regs right away +From: Andy Lutomirski luto@kernel.org +Date: Sun Jan 28 10:38:49 2018 -0800 + +From: Andy Lutomirski luto@kernel.org + +commit d1f7732009e0549eedf8ea1db948dc37be77fd46 + +With the fast path removed there is no point in splitting the push of the +normal and the extra register set. Just push the extra regs right away. + +[ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ] + +Signed-off-by: Andy Lutomirski +Signed-off-by: Thomas Gleixner +Acked-by: Ingo Molnar +Cc: Borislav Petkov +Cc: Linus Torvalds +Cc: Kernel Hardening +Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/entry/entry_64.S | 10 +++++++--- + 1 file changed, 7 insertions(+), 3 deletions(-) + +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -232,13 +232,17 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) + pushq %r9 /* pt_regs->r9 */ + pushq %r10 /* pt_regs->r10 */ + pushq %r11 /* pt_regs->r11 */ +- sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */ +- UNWIND_HINT_REGS extra=0 ++ pushq %rbx /* pt_regs->rbx */ ++ pushq %rbp /* pt_regs->rbp */ ++ pushq %r12 /* pt_regs->r12 */ ++ pushq %r13 /* pt_regs->r13 */ ++ pushq %r14 /* pt_regs->r14 */ ++ pushq %r15 /* pt_regs->r15 */ ++ UNWIND_HINT_REGS + + TRACE_IRQS_OFF + + /* IRQs are off. */ +- SAVE_EXTRA_REGS + movq %rsp, %rdi + call do_syscall_64 /* returns with IRQs disabled */ + diff --git a/queue-4.14/x86entry64_Remove_the_SYSCALL64_fast_path.patch b/queue-4.14/x86entry64_Remove_the_SYSCALL64_fast_path.patch new file mode 100644 index 00000000000..59132a38404 --- /dev/null +++ b/queue-4.14/x86entry64_Remove_the_SYSCALL64_fast_path.patch @@ -0,0 +1,196 @@ +Subject: x86/entry/64: Remove the SYSCALL64 fast path +From: Andy Lutomirski luto@kernel.org +Date: Sun Jan 28 10:38:49 2018 -0800 + +From: Andy Lutomirski luto@kernel.org + +commit 21d375b6b34ff511a507de27bf316b3dde6938d9 + +The SYCALLL64 fast path was a nice, if small, optimization back in the good +old days when syscalls were actually reasonably fast. 
Now there is PTI to +slow everything down, and indirect branches are verboten, making everything +messier. The retpoline code in the fast path is particularly nasty. + +Just get rid of the fast path. The slow path is barely slower. + +[ tglx: Split out the 'push all extra regs' part ] + +Signed-off-by: Andy Lutomirski +Signed-off-by: Thomas Gleixner +Acked-by: Ingo Molnar +Cc: Borislav Petkov +Cc: Linus Torvalds +Cc: Kernel Hardening +Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/entry/entry_64.S | 117 -------------------------------------------- + arch/x86/entry/syscall_64.c | 7 -- + 2 files changed, 2 insertions(+), 122 deletions(-) + +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -237,86 +237,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) + + TRACE_IRQS_OFF + +- /* +- * If we need to do entry work or if we guess we'll need to do +- * exit work, go straight to the slow path. +- */ +- movq PER_CPU_VAR(current_task), %r11 +- testl $_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11) +- jnz entry_SYSCALL64_slow_path +- +-entry_SYSCALL_64_fastpath: +- /* +- * Easy case: enable interrupts and issue the syscall. If the syscall +- * needs pt_regs, we'll call a stub that disables interrupts again +- * and jumps to the slow path. +- */ +- TRACE_IRQS_ON +- ENABLE_INTERRUPTS(CLBR_NONE) +-#if __SYSCALL_MASK == ~0 +- cmpq $__NR_syscall_max, %rax +-#else +- andl $__SYSCALL_MASK, %eax +- cmpl $__NR_syscall_max, %eax +-#endif +- ja 1f /* return -ENOSYS (already in pt_regs->ax) */ +- movq %r10, %rcx +- +- /* +- * This call instruction is handled specially in stub_ptregs_64. +- * It might end up jumping to the slow path. If it jumps, RAX +- * and all argument registers are clobbered. +- */ +-#ifdef CONFIG_RETPOLINE +- movq sys_call_table(, %rax, 8), %rax +- call __x86_indirect_thunk_rax +-#else +- call *sys_call_table(, %rax, 8) +-#endif +-.Lentry_SYSCALL_64_after_fastpath_call: +- +- movq %rax, RAX(%rsp) +-1: +- +- /* +- * If we get here, then we know that pt_regs is clean for SYSRET64. +- * If we see that no exit work is required (which we are required +- * to check with IRQs off), then we can go straight to SYSRET64. +- */ +- DISABLE_INTERRUPTS(CLBR_ANY) +- TRACE_IRQS_OFF +- movq PER_CPU_VAR(current_task), %r11 +- testl $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11) +- jnz 1f +- +- LOCKDEP_SYS_EXIT +- TRACE_IRQS_ON /* user mode is traced as IRQs on */ +- movq RIP(%rsp), %rcx +- movq EFLAGS(%rsp), %r11 +- addq $6*8, %rsp /* skip extra regs -- they were preserved */ +- UNWIND_HINT_EMPTY +- jmp .Lpop_c_regs_except_rcx_r11_and_sysret +- +-1: +- /* +- * The fast path looked good when we started, but something changed +- * along the way and we need to switch to the slow path. Calling +- * raise(3) will trigger this, for example. IRQs are off. +- */ +- TRACE_IRQS_ON +- ENABLE_INTERRUPTS(CLBR_ANY) +- SAVE_EXTRA_REGS +- movq %rsp, %rdi +- call syscall_return_slowpath /* returns with IRQs disabled */ +- jmp return_from_SYSCALL_64 +- +-entry_SYSCALL64_slow_path: + /* IRQs are off. 
*/ + SAVE_EXTRA_REGS + movq %rsp, %rdi + call do_syscall_64 /* returns with IRQs disabled */ + +-return_from_SYSCALL_64: + TRACE_IRQS_IRETQ /* we're about to change IF */ + + /* +@@ -389,7 +314,6 @@ syscall_return_via_sysret: + /* rcx and r11 are already restored (see code above) */ + UNWIND_HINT_EMPTY + POP_EXTRA_REGS +-.Lpop_c_regs_except_rcx_r11_and_sysret: + popq %rsi /* skip r11 */ + popq %r10 + popq %r9 +@@ -420,47 +344,6 @@ syscall_return_via_sysret: + USERGS_SYSRET64 + END(entry_SYSCALL_64) + +-ENTRY(stub_ptregs_64) +- /* +- * Syscalls marked as needing ptregs land here. +- * If we are on the fast path, we need to save the extra regs, +- * which we achieve by trying again on the slow path. If we are on +- * the slow path, the extra regs are already saved. +- * +- * RAX stores a pointer to the C function implementing the syscall. +- * IRQs are on. +- */ +- cmpq $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp) +- jne 1f +- +- /* +- * Called from fast path -- disable IRQs again, pop return address +- * and jump to slow path +- */ +- DISABLE_INTERRUPTS(CLBR_ANY) +- TRACE_IRQS_OFF +- popq %rax +- UNWIND_HINT_REGS extra=0 +- jmp entry_SYSCALL64_slow_path +- +-1: +- JMP_NOSPEC %rax /* Called from C */ +-END(stub_ptregs_64) +- +-.macro ptregs_stub func +-ENTRY(ptregs_\func) +- UNWIND_HINT_FUNC +- leaq \func(%rip), %rax +- jmp stub_ptregs_64 +-END(ptregs_\func) +-.endm +- +-/* Instantiate ptregs_stub for each ptregs-using syscall */ +-#define __SYSCALL_64_QUAL_(sym) +-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym +-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym) +-#include +- + /* + * %rdi: prev task + * %rsi: next task +--- a/arch/x86/entry/syscall_64.c ++++ b/arch/x86/entry/syscall_64.c +@@ -7,14 +7,11 @@ + #include + #include + +-#define __SYSCALL_64_QUAL_(sym) sym +-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym +- +-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); ++#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); + #include + #undef __SYSCALL_64 + +-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym), ++#define __SYSCALL_64(nr, sym, qual) [nr] = sym, + + extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); + diff --git a/queue-4.14/x86get_user_Use_pointer_masking_to_limit_speculation.patch b/queue-4.14/x86get_user_Use_pointer_masking_to_limit_speculation.patch new file mode 100644 index 00000000000..8c70533768e --- /dev/null +++ b/queue-4.14/x86get_user_Use_pointer_masking_to_limit_speculation.patch @@ -0,0 +1,97 @@ +Subject: x86/get_user: Use pointer masking to limit speculation +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:54 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit c7f631cb07e7da06ac1d231ca178452339e32a94 + +Quoting Linus: + + I do think that it would be a good idea to very expressly document + the fact that it's not that the user access itself is unsafe. I do + agree that things like "get_user()" want to be protected, but not + because of any direct bugs or problems with get_user() and friends, + but simply because get_user() is an excellent source of a pointer + that is obviously controlled from a potentially attacking user + space. 
So it's a prime candidate for then finding _subsequent_ + accesses that can then be used to perturb the cache. + +Unlike the __get_user() case get_user() includes the address limit check +near the pointer de-reference. With that locality the speculation can be +mitigated with pointer narrowing rather than a barrier, i.e. +array_index_nospec(). Where the narrowing is performed by: + + cmp %limit, %ptr + sbb %mask, %mask + and %mask, %ptr + +With respect to speculation the value of %ptr is either less than %limit +or NULL. + +Co-developed-by: Linus Torvalds +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: Kees Cook +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: Andy Lutomirski +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/lib/getuser.S | 10 ++++++++++ + 1 file changed, 10 insertions(+) + +--- a/arch/x86/lib/getuser.S ++++ b/arch/x86/lib/getuser.S +@@ -40,6 +40,8 @@ ENTRY(__get_user_1) + mov PER_CPU_VAR(current_task), %_ASM_DX + cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX + jae bad_get_user ++ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */ ++ and %_ASM_DX, %_ASM_AX + ASM_STAC + 1: movzbl (%_ASM_AX),%edx + xor %eax,%eax +@@ -54,6 +56,8 @@ ENTRY(__get_user_2) + mov PER_CPU_VAR(current_task), %_ASM_DX + cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX + jae bad_get_user ++ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */ ++ and %_ASM_DX, %_ASM_AX + ASM_STAC + 2: movzwl -1(%_ASM_AX),%edx + xor %eax,%eax +@@ -68,6 +72,8 @@ ENTRY(__get_user_4) + mov PER_CPU_VAR(current_task), %_ASM_DX + cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX + jae bad_get_user ++ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */ ++ and %_ASM_DX, %_ASM_AX + ASM_STAC + 3: movl -3(%_ASM_AX),%edx + xor %eax,%eax +@@ -83,6 +89,8 @@ ENTRY(__get_user_8) + mov PER_CPU_VAR(current_task), %_ASM_DX + cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX + jae bad_get_user ++ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */ ++ and %_ASM_DX, %_ASM_AX + ASM_STAC + 4: movq -7(%_ASM_AX),%rdx + xor %eax,%eax +@@ -94,6 +102,8 @@ ENTRY(__get_user_8) + mov PER_CPU_VAR(current_task), %_ASM_DX + cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX + jae bad_get_user_8 ++ sbb %_ASM_DX, %_ASM_DX /* array_index_mask_nospec() */ ++ and %_ASM_DX, %_ASM_AX + ASM_STAC + 4: movl -7(%_ASM_AX),%edx + 5: movl -3(%_ASM_AX),%ecx diff --git a/queue-4.14/x86kvm_Update_spectre-v1_mitigation.patch b/queue-4.14/x86kvm_Update_spectre-v1_mitigation.patch new file mode 100644 index 00000000000..4e1b7b96ed0 --- /dev/null +++ b/queue-4.14/x86kvm_Update_spectre-v1_mitigation.patch @@ -0,0 +1,69 @@ +Subject: x86/kvm: Update spectre-v1 mitigation +From: Dan Williams dan.j.williams@intel.com +Date: Wed Jan 31 17:47:03 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit 085331dfc6bbe3501fb936e657331ca943827600 + +Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup" +added a raw 'asm("lfence");' to prevent a bounds check bypass of +'vmcs_field_to_offset_table'. + +The lfence can be avoided in this path by using the array_index_nospec() +helper designed for these types of fixes. 
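In outline, the change below has roughly this shape (simplified; the real
vmcs_field_to_offset() also rejects a zero offset):

    /* before: open-coded barrier after the bounds check */
    if (field >= size)
            return -ENOENT;
    asm("lfence");
    return vmcs_field_to_offset_table[field];

    /* after: clamp the index via a data dependency, no serializing barrier */
    if (field >= size)
            return -ENOENT;
    field = array_index_nospec(field, size);
    return vmcs_field_to_offset_table[field];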
+ +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Acked-by: Paolo Bonzini +Cc: Andrew Honig +Cc: kvm@vger.kernel.org +Cc: Jim Mattson +Link: https://lkml.kernel.org/r/151744959670.6342.3001723920950249067.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kvm/vmx.c | 20 +++++++++----------- + 1 file changed, 9 insertions(+), 11 deletions(-) + +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -34,6 +34,7 @@ + #include + #include + #include ++#include + #include "kvm_cache_regs.h" + #include "x86.h" + +@@ -887,21 +888,18 @@ static const unsigned short vmcs_field_t + + static inline short vmcs_field_to_offset(unsigned long field) + { +- BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX); ++ const size_t size = ARRAY_SIZE(vmcs_field_to_offset_table); ++ unsigned short offset; + +- if (field >= ARRAY_SIZE(vmcs_field_to_offset_table)) ++ BUILD_BUG_ON(size > SHRT_MAX); ++ if (field >= size) + return -ENOENT; + +- /* +- * FIXME: Mitigation for CVE-2017-5753. To be replaced with a +- * generic mechanism. +- */ +- asm("lfence"); +- +- if (vmcs_field_to_offset_table[field] == 0) ++ field = array_index_nospec(field, size); ++ offset = vmcs_field_to_offset_table[field]; ++ if (offset == 0) + return -ENOENT; +- +- return vmcs_field_to_offset_table[field]; ++ return offset; + } + + static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu) diff --git a/queue-4.14/x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch b/queue-4.14/x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch new file mode 100644 index 00000000000..71f794c0bc0 --- /dev/null +++ b/queue-4.14/x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch @@ -0,0 +1,68 @@ +Subject: x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP +From: William Grant william.grant@canonical.com +Date: Tue Jan 30 22:22:55 2018 +1100 + +From: William Grant william.grant@canonical.com + +commit 55f49fcb879fbeebf2a8c1ac7c9e6d90df55f798 + +Since commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the +fixmap"), i386's CPU_ENTRY_AREA has been mapped to the memory area just +below FIXADDR_START. But already immediately before FIXADDR_START is the +FIX_BTMAP area, which means that early_ioremap can collide with the entry +area. + +It's especially bad on PAE where FIX_BTMAP_BEGIN gets aligned to exactly +match CPU_ENTRY_AREA_BASE, so the first early_ioremap slot clobbers the +IDT and causes interrupts during early boot to reset the system. + +The overlap wasn't a problem before the CPU entry area was introduced, +as the fixmap has classically been preceded by the pkmap or vmalloc +areas, neither of which is used until early_ioremap is out of the +picture. + +Relocate CPU_ENTRY_AREA to below FIX_BTMAP, not just below the permanent +fixmap area. 
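+
+For orientation, the resulting i386 layout (top of the address space
+downwards, not to scale) looks roughly like:
+
+    FIXADDR_TOP
+      permanent fixmap slots
+    FIXADDR_START
+      FIX_BTMAP_* slots used by early_ioremap()
+    FIXADDR_TOT_START
+      CPU_ENTRY_AREA   (previously placed directly below FIXADDR_START,
+                        where it overlapped the FIX_BTMAP slots)
+    PKMAP_BASE / vmalloc area further down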
+ +Fixes: commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") +Signed-off-by: William Grant +Signed-off-by: Thomas Gleixner +Cc: stable@vger.kernel.org +Link: https://lkml.kernel.org/r/7041d181-a019-e8b9-4e4e-48215f841e2c@canonical.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/fixmap.h | 6 ++++-- + arch/x86/include/asm/pgtable_32_types.h | 5 +++-- + 2 files changed, 7 insertions(+), 4 deletions(-) + +--- a/arch/x86/include/asm/fixmap.h ++++ b/arch/x86/include/asm/fixmap.h +@@ -137,8 +137,10 @@ enum fixed_addresses { + + extern void reserve_top_address(unsigned long reserve); + +-#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) +-#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) ++#define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) ++#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) ++#define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) ++#define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) + + extern int fixmaps_set; + +--- a/arch/x86/include/asm/pgtable_32_types.h ++++ b/arch/x86/include/asm/pgtable_32_types.h +@@ -44,8 +44,9 @@ extern bool __vmalloc_start_set; /* set + */ + #define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40) + +-#define CPU_ENTRY_AREA_BASE \ +- ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK) ++#define CPU_ENTRY_AREA_BASE \ ++ ((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) \ ++ & PMD_MASK) + + #define PKMAP_BASE \ + ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) diff --git a/queue-4.14/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch b/queue-4.14/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch new file mode 100644 index 00000000000..132d6ec2b42 --- /dev/null +++ b/queue-4.14/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch @@ -0,0 +1,90 @@ +Subject: x86/paravirt: Remove 'noreplace-paravirt' cmdline option +From: Josh Poimboeuf jpoimboe@redhat.com +Date: Tue Jan 30 22:13:33 2018 -0600 + +From: Josh Poimboeuf jpoimboe@redhat.com + +commit 12c69f1e94c89d40696e83804dd2f0965b5250cd + +The 'noreplace-paravirt' option disables paravirt patching, leaving the +original pv indirect calls in place. + +That's highly incompatible with retpolines, unless we want to uglify +paravirt even further and convert the paravirt calls to retpolines. + +As far as I can tell, the option doesn't seem to be useful for much +other than introducing surprising corner cases and making the kernel +vulnerable to Spectre v2. It was probably a debug option from the early +paravirt days. So just remove it. + +Signed-off-by: Josh Poimboeuf +Signed-off-by: Thomas Gleixner +Reviewed-by: Juergen Gross +Cc: Andrea Arcangeli +Cc: Peter Zijlstra +Cc: Andi Kleen +Cc: Ashok Raj +Cc: Greg KH +Cc: Jun Nakajima +Cc: Tim Chen +Cc: Rusty Russell +Cc: Dave Hansen +Cc: Asit Mallick +Cc: Andy Lutomirski +Cc: Linus Torvalds +Cc: Jason Baron +Cc: Paolo Bonzini +Cc: Alok Kataria +Cc: Arjan Van De Ven +Cc: David Woodhouse +Cc: Dan Williams +Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble +Signed-off-by: Greg Kroah-Hartman + + +--- + Documentation/admin-guide/kernel-parameters.txt | 2 -- + arch/x86/kernel/alternative.c | 14 -------------- + 2 files changed, 16 deletions(-) + +--- a/Documentation/admin-guide/kernel-parameters.txt ++++ b/Documentation/admin-guide/kernel-parameters.txt +@@ -2718,8 +2718,6 @@ + norandmaps Don't use address space randomization. 
Equivalent to + echo 0 > /proc/sys/kernel/randomize_va_space + +- noreplace-paravirt [X86,IA-64,PV_OPS] Don't patch paravirt_ops +- + noreplace-smp [X86-32,SMP] Don't replace SMP instructions + with UP alternatives + +--- a/arch/x86/kernel/alternative.c ++++ b/arch/x86/kernel/alternative.c +@@ -46,17 +46,6 @@ static int __init setup_noreplace_smp(ch + } + __setup("noreplace-smp", setup_noreplace_smp); + +-#ifdef CONFIG_PARAVIRT +-static int __initdata_or_module noreplace_paravirt = 0; +- +-static int __init setup_noreplace_paravirt(char *str) +-{ +- noreplace_paravirt = 1; +- return 1; +-} +-__setup("noreplace-paravirt", setup_noreplace_paravirt); +-#endif +- + #define DPRINTK(fmt, args...) \ + do { \ + if (debug_alternative) \ +@@ -599,9 +588,6 @@ void __init_or_module apply_paravirt(str + struct paravirt_patch_site *p; + char insnbuf[MAX_PATCH_LEN]; + +- if (noreplace_paravirt) +- return; +- + for (p = start; p < end; p++) { + unsigned int used; + diff --git a/queue-4.14/x86pti_Mark_constant_arrays_as___initconst.patch b/queue-4.14/x86pti_Mark_constant_arrays_as___initconst.patch new file mode 100644 index 00000000000..de25ad3e0ff --- /dev/null +++ b/queue-4.14/x86pti_Mark_constant_arrays_as___initconst.patch @@ -0,0 +1,52 @@ +Subject: x86/pti: Mark constant arrays as __initconst +From: Arnd Bergmann arnd@arndb.de +Date: Fri Feb 2 22:39:23 2018 +0100 + +From: Arnd Bergmann arnd@arndb.de + +commit 4bf5d56d429cbc96c23d809a08f63cd29e1a702e + +I'm seeing build failures from the two newly introduced arrays that +are marked 'const' and '__initdata', which are mutually exclusive: + +arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init' +arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init' + +The correct annotation is __initconst. 
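+
+For reference, the intended annotation for const init-time data (the
+table name below is made up) is:
+
+    static const struct x86_cpu_id cpu_no_foo[] __initconst = {
+            { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY },
+            {}
+    };
+
+__initconst places the object in .init.rodata, while __initdata targets
+.init.data; combining 'const' with __initdata is what produces the
+section type conflict quoted above.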
+ +Fixes: fec9434a12f3 ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown") +Signed-off-by: Arnd Bergmann +Signed-off-by: Thomas Gleixner +Cc: Ricardo Neri +Cc: Andy Lutomirski +Cc: Borislav Petkov +Cc: Thomas Garnier +Cc: David Woodhouse +Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kernel/cpu/common.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/arch/x86/kernel/cpu/common.c ++++ b/arch/x86/kernel/cpu/common.c +@@ -876,7 +876,7 @@ static void identify_cpu_without_cpuid(s + #endif + } + +-static const __initdata struct x86_cpu_id cpu_no_speculation[] = { ++static const __initconst struct x86_cpu_id cpu_no_speculation[] = { + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT, X86_FEATURE_ANY }, +@@ -889,7 +889,7 @@ static const __initdata struct x86_cpu_i + {} + }; + +-static const __initdata struct x86_cpu_id cpu_no_meltdown[] = { ++static const __initconst struct x86_cpu_id cpu_no_meltdown[] = { + { X86_VENDOR_AMD }, + {} + }; diff --git a/queue-4.14/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch b/queue-4.14/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch new file mode 100644 index 00000000000..7175fecdf3c --- /dev/null +++ b/queue-4.14/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch @@ -0,0 +1,50 @@ +Subject: x86/retpoline: Avoid retpolines for built-in __init functions +From: David Woodhouse dwmw@amazon.co.uk +Date: Thu Feb 1 11:27:20 2018 +0000 + +From: David Woodhouse dwmw@amazon.co.uk + +commit 66f793099a636862a71c59d4a6ba91387b155e0c + +There's no point in building init code with retpolines, since it runs before +any potentially hostile userspace does. And before the retpoline is actually +ALTERNATIVEd into place, for much of it. + +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Cc: karahmed@amazon.de +Cc: peterz@infradead.org +Cc: bp@alien8.de +Link: https://lkml.kernel.org/r/1517484441-1420-2-git-send-email-dwmw@amazon.co.uk +Signed-off-by: Greg Kroah-Hartman + + +--- + include/linux/init.h | 9 ++++++++- + 1 file changed, 8 insertions(+), 1 deletion(-) + +--- a/include/linux/init.h ++++ b/include/linux/init.h +@@ -5,6 +5,13 @@ + #include + #include + ++/* Built-in __init functions needn't be compiled with retpoline */ ++#if defined(RETPOLINE) && !defined(MODULE) ++#define __noretpoline __attribute__((indirect_branch("keep"))) ++#else ++#define __noretpoline ++#endif ++ + /* These macros are used to mark some functions or + * initialized data (doesn't apply to uninitialized data) + * as `initialization' functions. 
The kernel can take this +@@ -40,7 +47,7 @@ + + /* These are for everybody (although not all archs will actually + discard it in modules) */ +-#define __init __section(.init.text) __cold __inittrace __latent_entropy ++#define __init __section(.init.text) __cold __inittrace __latent_entropy __noretpoline + #define __initdata __section(.init.data) + #define __initconst __section(.init.rodata) + #define __exitdata __section(.exit.data) diff --git a/queue-4.14/x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch b/queue-4.14/x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch new file mode 100644 index 00000000000..9070a8c4b05 --- /dev/null +++ b/queue-4.14/x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch @@ -0,0 +1,49 @@ +Subject: x86/spectre: Check CONFIG_RETPOLINE in command line parser +From: Dou Liyang douly.fnst@cn.fujitsu.com +Date: Tue Jan 30 14:13:50 2018 +0800 + +From: Dou Liyang douly.fnst@cn.fujitsu.com + +commit 9471eee9186a46893726e22ebb54cade3f9bc043 + +The spectre_v2 option 'auto' does not check whether CONFIG_RETPOLINE is +enabled. As a consequence it fails to emit the appropriate warning and sets +feature flags which have no effect at all. + +Add the missing IS_ENABLED() check. + +Fixes: da285121560e ("x86/spectre: Add boot time option to select Spectre v2 mitigation") +Signed-off-by: Dou Liyang +Signed-off-by: Thomas Gleixner +Cc: ak@linux.intel.com +Cc: peterz@infradead.org +Cc: Tomohiro" +Cc: dave.hansen@intel.com +Cc: bp@alien8.de +Cc: arjan@linux.intel.com +Cc: dwmw@amazon.co.uk +Cc: stable@vger.kernel.org +Link: https://lkml.kernel.org/r/f5892721-7528-3647-08fb-f8d10e65ad87@cn.fujitsu.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kernel/cpu/bugs.c | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +--- a/arch/x86/kernel/cpu/bugs.c ++++ b/arch/x86/kernel/cpu/bugs.c +@@ -213,10 +213,10 @@ static void __init spectre_v2_select_mit + return; + + case SPECTRE_V2_CMD_FORCE: +- /* FALLTRHU */ + case SPECTRE_V2_CMD_AUTO: +- goto retpoline_auto; +- ++ if (IS_ENABLED(CONFIG_RETPOLINE)) ++ goto retpoline_auto; ++ break; + case SPECTRE_V2_CMD_RETPOLINE_AMD: + if (IS_ENABLED(CONFIG_RETPOLINE)) + goto retpoline_amd; diff --git a/queue-4.14/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch b/queue-4.14/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch new file mode 100644 index 00000000000..6a86c2b67b4 --- /dev/null +++ b/queue-4.14/x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch @@ -0,0 +1,36 @@ +Subject: x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" +From: Colin Ian King colin.king@canonical.com +Date: Tue Jan 30 19:32:18 2018 +0000 + +From: Colin Ian King colin.king@canonical.com + +commit e698dcdfcda41efd0984de539767b4cddd235f1e + +Trivial fix to spelling mistake in pr_err error message text. 
+ +Signed-off-by: Colin Ian King +Signed-off-by: Thomas Gleixner +Cc: Andi Kleen +Cc: Greg Kroah-Hartman +Cc: kernel-janitors@vger.kernel.org +Cc: Andy Lutomirski +Cc: Borislav Petkov +Cc: David Woodhouse +Link: https://lkml.kernel.org/r/20180130193218.9271-1-colin.king@canonical.com +Signed-off-by: Greg Kroah-Hartman + +--- + arch/x86/kernel/cpu/bugs.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/x86/kernel/cpu/bugs.c ++++ b/arch/x86/kernel/cpu/bugs.c +@@ -103,7 +103,7 @@ bool retpoline_module_ok(bool has_retpol + if (spectre_v2_enabled == SPECTRE_V2_NONE || has_retpoline) + return true; + +- pr_err("System may be vunerable to spectre v2\n"); ++ pr_err("System may be vulnerable to spectre v2\n"); + spectre_v2_bad_module = true; + return false; + } diff --git a/queue-4.14/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch b/queue-4.14/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch new file mode 100644 index 00000000000..e0e8d8b7de0 --- /dev/null +++ b/queue-4.14/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch @@ -0,0 +1,40 @@ +Subject: x86/spectre: Report get_user mitigation for spectre_v1 +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:03:21 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit edfbae53dab8348fca778531be9f4855d2ca0360 + +Reflect the presence of get_user(), __get_user(), and 'syscall' protections +in sysfs. The expectation is that new and better tooling will allow the +kernel to grow more usages of array_index_nospec(), for now, only claim +mitigation for __user pointer de-references. + +Reported-by: Jiri Slaby +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727420158.33451.11658324346540434635.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kernel/cpu/bugs.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/x86/kernel/cpu/bugs.c ++++ b/arch/x86/kernel/cpu/bugs.c +@@ -297,7 +297,7 @@ ssize_t cpu_show_spectre_v1(struct devic + { + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1)) + return sprintf(buf, "Not affected\n"); +- return sprintf(buf, "Vulnerable\n"); ++ return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + } + + ssize_t cpu_show_spectre_v2(struct device *dev, diff --git a/queue-4.14/x86spectre_Simplify_spectre_v2_command_line_parsing.patch b/queue-4.14/x86spectre_Simplify_spectre_v2_command_line_parsing.patch new file mode 100644 index 00000000000..764b4e18626 --- /dev/null +++ b/queue-4.14/x86spectre_Simplify_spectre_v2_command_line_parsing.patch @@ -0,0 +1,137 @@ +Subject: x86/spectre: Simplify spectre_v2 command line parsing +From: KarimAllah Ahmed karahmed@amazon.de +Date: Thu Feb 1 11:27:21 2018 +0000 + +From: KarimAllah Ahmed karahmed@amazon.de + +commit 9005c6834c0ffdfe46afa76656bd9276cca864f6 + +[dwmw2: Use ARRAY_SIZE] + +Signed-off-by: KarimAllah Ahmed +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Cc: peterz@infradead.org +Cc: bp@alien8.de +Link: https://lkml.kernel.org/r/1517484441-1420-3-git-send-email-dwmw@amazon.co.uk +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/kernel/cpu/bugs.c | 84 +++++++++++++++++++++++++++++---------------- + 1 file changed, 55 insertions(+), 29 deletions(-) + +--- a/arch/x86/kernel/cpu/bugs.c ++++ b/arch/x86/kernel/cpu/bugs.c +@@ 
-119,13 +119,13 @@ static inline const char *spectre_v2_mod + static void __init spec2_print_if_insecure(const char *reason) + { + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) +- pr_info("%s\n", reason); ++ pr_info("%s selected on command line.\n", reason); + } + + static void __init spec2_print_if_secure(const char *reason) + { + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) +- pr_info("%s\n", reason); ++ pr_info("%s selected on command line.\n", reason); + } + + static inline bool retp_compiler(void) +@@ -140,42 +140,68 @@ static inline bool match_option(const ch + return len == arglen && !strncmp(arg, opt, len); + } + ++static const struct { ++ const char *option; ++ enum spectre_v2_mitigation_cmd cmd; ++ bool secure; ++} mitigation_options[] = { ++ { "off", SPECTRE_V2_CMD_NONE, false }, ++ { "on", SPECTRE_V2_CMD_FORCE, true }, ++ { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, ++ { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, ++ { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, ++ { "auto", SPECTRE_V2_CMD_AUTO, false }, ++}; ++ + static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) + { + char arg[20]; +- int ret; ++ int ret, i; ++ enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO; ++ ++ if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) ++ return SPECTRE_V2_CMD_NONE; ++ else { ++ ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, ++ sizeof(arg)); ++ if (ret < 0) ++ return SPECTRE_V2_CMD_AUTO; + +- ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, +- sizeof(arg)); +- if (ret > 0) { +- if (match_option(arg, ret, "off")) { +- goto disable; +- } else if (match_option(arg, ret, "on")) { +- spec2_print_if_secure("force enabled on command line."); +- return SPECTRE_V2_CMD_FORCE; +- } else if (match_option(arg, ret, "retpoline")) { +- spec2_print_if_insecure("retpoline selected on command line."); +- return SPECTRE_V2_CMD_RETPOLINE; +- } else if (match_option(arg, ret, "retpoline,amd")) { +- if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) { +- pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n"); +- return SPECTRE_V2_CMD_AUTO; +- } +- spec2_print_if_insecure("AMD retpoline selected on command line."); +- return SPECTRE_V2_CMD_RETPOLINE_AMD; +- } else if (match_option(arg, ret, "retpoline,generic")) { +- spec2_print_if_insecure("generic retpoline selected on command line."); +- return SPECTRE_V2_CMD_RETPOLINE_GENERIC; +- } else if (match_option(arg, ret, "auto")) { ++ for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) { ++ if (!match_option(arg, ret, mitigation_options[i].option)) ++ continue; ++ cmd = mitigation_options[i].cmd; ++ break; ++ } ++ ++ if (i >= ARRAY_SIZE(mitigation_options)) { ++ pr_err("unknown option (%s). Switching to AUTO select\n", ++ mitigation_options[i].option); + return SPECTRE_V2_CMD_AUTO; + } + } + +- if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2")) ++ if ((cmd == SPECTRE_V2_CMD_RETPOLINE || ++ cmd == SPECTRE_V2_CMD_RETPOLINE_AMD || ++ cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) && ++ !IS_ENABLED(CONFIG_RETPOLINE)) { ++ pr_err("%s selected but not compiled in. Switching to AUTO select\n", ++ mitigation_options[i].option); + return SPECTRE_V2_CMD_AUTO; +-disable: +- spec2_print_if_insecure("disabled on command line."); +- return SPECTRE_V2_CMD_NONE; ++ } ++ ++ if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD && ++ boot_cpu_data.x86_vendor != X86_VENDOR_AMD) { ++ pr_err("retpoline,amd selected but CPU is not AMD. 
Switching to AUTO select\n"); ++ return SPECTRE_V2_CMD_AUTO; ++ } ++ ++ if (mitigation_options[i].secure) ++ spec2_print_if_secure(mitigation_options[i].option); ++ else ++ spec2_print_if_insecure(mitigation_options[i].option); ++ ++ return cmd; + } + + /* Check for Skylake-like CPUs (for RSB handling) */ diff --git a/queue-4.14/x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch b/queue-4.14/x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch new file mode 100644 index 00000000000..1a5586e8ea9 --- /dev/null +++ b/queue-4.14/x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch @@ -0,0 +1,37 @@ +Subject: x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL +From: Darren Kenny darren.kenny@oracle.com +Date: Fri Feb 2 19:12:20 2018 +0000 + +From: Darren Kenny darren.kenny@oracle.com + +commit af189c95a371b59f493dbe0f50c0a09724868881 + +Fixes: 117cc7a908c83 ("x86/retpoline: Fill return stack buffer on vmexit") +Signed-off-by: Darren Kenny +Signed-off-by: Thomas Gleixner +Reviewed-by: Konrad Rzeszutek Wilk +Cc: Tom Lendacky +Cc: Andi Kleen +Cc: Borislav Petkov +Cc: Masami Hiramatsu +Cc: Arjan van de Ven +Cc: David Woodhouse +Link: https://lkml.kernel.org/r/20180202191220.blvgkgutojecxr3b@starbug-vm.ie.oracle.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/nospec-branch.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/arch/x86/include/asm/nospec-branch.h ++++ b/arch/x86/include/asm/nospec-branch.h +@@ -150,7 +150,7 @@ extern char __indirect_thunk_end[]; + * On VMEXIT we must ensure that no RSB predictions learned in the guest + * can be followed in the host, by overwriting the RSB completely. Both + * retpoline and IBRS mitigations for Spectre v2 need this; only on future +- * CPUs with IBRS_ATT *might* it be avoided. ++ * CPUs with IBRS_ALL *might* it be avoided. + */ + static inline void vmexit_fill_RSB(void) + { diff --git a/queue-4.14/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch b/queue-4.14/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch new file mode 100644 index 00000000000..7565edf7b6d --- /dev/null +++ b/queue-4.14/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch @@ -0,0 +1,137 @@ +Subject: x86/speculation: Use Indirect Branch Prediction Barrier in context switch +From: Tim Chen tim.c.chen@linux.intel.com +Date: Mon Jan 29 22:04:47 2018 +0000 + +From: Tim Chen tim.c.chen@linux.intel.com + +commit 18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7 + +Flush indirect branches when switching into a process that marked itself +non dumpable. This protects high value processes like gpg better, +without having too high performance overhead. + +If done naïvely, we could switch to a kernel idle thread and then back +to the original process, such as: + + process A -> idle -> process A + +In such scenario, we do not have to do IBPB here even though the process +is non-dumpable, as we are switching back to the same process after a +hiatus. + +To avoid the redundant IBPB, which is expensive, we track the last mm +user context ID. The cost is to have an extra u64 mm context id to track +the last mm we were using before switching to the init_mm used by idle. +Avoiding the extra IBPB is probably worth the extra memory for this +common scenario. 
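+
+Condensed, the new check in switch_mm_irqs_off() (see the hunk below
+for the exact form) amounts to:
+
+    if (tsk && tsk->mm &&
+        tsk->mm->context.ctx_id != last_ctx_id &&
+        get_dumpable(tsk->mm) != SUID_DUMP_USER)
+            indirect_branch_prediction_barrier();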
+ +For those cases where tlb_defer_switch_to_init_mm() returns true (non +PCID), lazy tlb will defer switch to init_mm, so we will not be changing +the mm for the process A -> idle -> process A switch. So IBPB will be +skipped for this case. + +Thanks to the reviewers and Andy Lutomirski for the suggestion of +using ctx_id which got rid of the problem of mm pointer recycling. + +Signed-off-by: Tim Chen +Signed-off-by: David Woodhouse +Signed-off-by: Thomas Gleixner +Cc: ak@linux.intel.com +Cc: karahmed@amazon.de +Cc: arjan@linux.intel.com +Cc: torvalds@linux-foundation.org +Cc: linux@dominikbrodowski.net +Cc: peterz@infradead.org +Cc: bp@alien8.de +Cc: luto@kernel.org +Cc: pbonzini@redhat.com +Cc: gregkh@linux-foundation.org +Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/tlbflush.h | 2 ++ + arch/x86/mm/tlb.c | 33 ++++++++++++++++++++++++++++++++- + 2 files changed, 34 insertions(+), 1 deletion(-) + +--- a/arch/x86/include/asm/tlbflush.h ++++ b/arch/x86/include/asm/tlbflush.h +@@ -174,6 +174,8 @@ struct tlb_state { + struct mm_struct *loaded_mm; + u16 loaded_mm_asid; + u16 next_asid; ++ /* last user mm's ctx id */ ++ u64 last_ctx_id; + + /* + * We can be in one of several states: +--- a/arch/x86/mm/tlb.c ++++ b/arch/x86/mm/tlb.c +@@ -6,13 +6,14 @@ + #include + #include + #include ++#include + + #include + #include ++#include + #include + #include + #include +-#include + + /* + * TLB flushing, formerly SMP-only +@@ -247,6 +248,27 @@ void switch_mm_irqs_off(struct mm_struct + } else { + u16 new_asid; + bool need_flush; ++ u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id); ++ ++ /* ++ * Avoid user/user BTB poisoning by flushing the branch ++ * predictor when switching between processes. This stops ++ * one process from doing Spectre-v2 attacks on another. ++ * ++ * As an optimization, flush indirect branches only when ++ * switching into processes that disable dumping. This ++ * protects high value processes like gpg, without having ++ * too high performance overhead. IBPB is *expensive*! ++ * ++ * This will not flush branches when switching into kernel ++ * threads. It will also not flush if we switch to idle ++ * thread and back to the same process. It will flush if we ++ * switch to a different non-dumpable process. ++ */ ++ if (tsk && tsk->mm && ++ tsk->mm->context.ctx_id != last_ctx_id && ++ get_dumpable(tsk->mm) != SUID_DUMP_USER) ++ indirect_branch_prediction_barrier(); + + if (IS_ENABLED(CONFIG_VMAP_STACK)) { + /* +@@ -292,6 +314,14 @@ void switch_mm_irqs_off(struct mm_struct + trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); + } + ++ /* ++ * Record last user mm's context id, so we can avoid ++ * flushing branch buffer with IBPB if we switch back ++ * to the same user. ++ */ ++ if (next != &init_mm) ++ this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); ++ + this_cpu_write(cpu_tlbstate.loaded_mm, next); + this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid); + } +@@ -369,6 +399,7 @@ void initialize_tlbstate_and_flush(void) + write_cr3(build_cr3(mm->pgd, 0)); + + /* Reinitialize tlbstate. 
*/ ++ this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id); + this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0); + this_cpu_write(cpu_tlbstate.next_asid, 1); + this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); diff --git a/queue-4.14/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch b/queue-4.14/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch new file mode 100644 index 00000000000..dcc1025facd --- /dev/null +++ b/queue-4.14/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch @@ -0,0 +1,60 @@ +Subject: x86/syscall: Sanitize syscall table de-references under speculation +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:59 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit 2fbd7af5af8665d18bcefae3e9700be07e22b681 + +The syscall table base is a user controlled function pointer in kernel +space. Use array_index_nospec() to prevent any out of bounds speculation. + +While retpoline prevents speculating into a userspace directed target it +does not stop the pointer de-reference, the concern is leaking memory +relative to the syscall table base, by observing instruction cache +behavior. + +Reported-by: Linus Torvalds +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Andy Lutomirski +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/entry/common.c | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +--- a/arch/x86/entry/common.c ++++ b/arch/x86/entry/common.c +@@ -21,6 +21,7 @@ + #include + #include + #include ++#include + #include + #include + #include +@@ -284,7 +285,8 @@ __visible void do_syscall_64(struct pt_r + * regs->orig_ax, which changes the behavior of some syscalls. + */ + if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) { +- regs->ax = sys_call_table[nr & __SYSCALL_MASK]( ++ nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls); ++ regs->ax = sys_call_table[nr]( + regs->di, regs->si, regs->dx, + regs->r10, regs->r8, regs->r9); + } +@@ -320,6 +322,7 @@ static __always_inline void do_syscall_3 + } + + if (likely(nr < IA32_NR_syscalls)) { ++ nr = array_index_nospec(nr, IA32_NR_syscalls); + /* + * It's possible that a 32-bit syscall implementation + * takes a 64-bit parameter but nonetheless assumes that diff --git a/queue-4.14/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch b/queue-4.14/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch new file mode 100644 index 00000000000..1bdc2c073b4 --- /dev/null +++ b/queue-4.14/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch @@ -0,0 +1,169 @@ +Subject: x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:49 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit 304ec1b050310548db33063e567123fae8fd0301 + +Quoting Linus: + + I do think that it would be a good idea to very expressly document + the fact that it's not that the user access itself is unsafe. 
I do + agree that things like "get_user()" want to be protected, but not + because of any direct bugs or problems with get_user() and friends, + but simply because get_user() is an excellent source of a pointer + that is obviously controlled from a potentially attacking user + space. So it's a prime candidate for then finding _subsequent_ + accesses that can then be used to perturb the cache. + +__uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the +limit check is far away from the user pointer de-reference. In those cases +a barrier_nospec() prevents speculation with a potential pointer to +privileged memory. uaccess_try_nospec covers get_user_try. + +Suggested-by: Linus Torvalds +Suggested-by: Andi Kleen +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: Kees Cook +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/include/asm/uaccess.h | 6 +++--- + arch/x86/include/asm/uaccess_32.h | 6 +++--- + arch/x86/include/asm/uaccess_64.h | 12 ++++++------ + arch/x86/lib/usercopy_32.c | 4 ++-- + 4 files changed, 14 insertions(+), 14 deletions(-) + +--- a/arch/x86/include/asm/uaccess.h ++++ b/arch/x86/include/asm/uaccess.h +@@ -450,7 +450,7 @@ do { \ + ({ \ + int __gu_err; \ + __inttype(*(ptr)) __gu_val; \ +- __uaccess_begin(); \ ++ __uaccess_begin_nospec(); \ + __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT); \ + __uaccess_end(); \ + (x) = (__force __typeof__(*(ptr)))__gu_val; \ +@@ -557,7 +557,7 @@ struct __large_struct { unsigned long bu + * get_user_ex(...); + * } get_user_catch(err) + */ +-#define get_user_try uaccess_try ++#define get_user_try uaccess_try_nospec + #define get_user_catch(err) uaccess_catch(err) + + #define get_user_ex(x, ptr) do { \ +@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void) + __typeof__(ptr) __uval = (uval); \ + __typeof__(*(ptr)) __old = (old); \ + __typeof__(*(ptr)) __new = (new); \ +- __uaccess_begin(); \ ++ __uaccess_begin_nospec(); \ + switch (size) { \ + case 1: \ + { \ +--- a/arch/x86/include/asm/uaccess_32.h ++++ b/arch/x86/include/asm/uaccess_32.h +@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void + switch (n) { + case 1: + ret = 0; +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u8 *)to, from, ret, + "b", "b", "=q", 1); + __uaccess_end(); + return ret; + case 2: + ret = 0; +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u16 *)to, from, ret, + "w", "w", "=r", 2); + __uaccess_end(); + return ret; + case 4: + ret = 0; +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u32 *)to, from, ret, + "l", "k", "=r", 4); + __uaccess_end(); +--- a/arch/x86/include/asm/uaccess_64.h ++++ b/arch/x86/include/asm/uaccess_64.h +@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void + return copy_user_generic(dst, (__force void *)src, size); + switch (size) { + case 1: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src, + ret, "b", "b", "=q", 1); + __uaccess_end(); + return ret; + case 2: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src, + ret, "w", "w", "=r", 2); + __uaccess_end(); + return ret; + case 4: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + 
__get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src, + ret, "l", "k", "=r", 4); + __uaccess_end(); + return ret; + case 8: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, + ret, "q", "", "=r", 8); + __uaccess_end(); + return ret; + case 10: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, + ret, "q", "", "=r", 10); + if (likely(!ret)) +@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void + __uaccess_end(); + return ret; + case 16: +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src, + ret, "q", "", "=r", 16); + if (likely(!ret)) +--- a/arch/x86/lib/usercopy_32.c ++++ b/arch/x86/lib/usercopy_32.c +@@ -331,7 +331,7 @@ do { \ + + unsigned long __copy_user_ll(void *to, const void *from, unsigned long n) + { +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + if (movsl_is_ok(to, from, n)) + __copy_user(to, from, n); + else +@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll); + unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from, + unsigned long n) + { +- __uaccess_begin(); ++ __uaccess_begin_nospec(); + #ifdef CONFIG_X86_INTEL_USERCOPY + if (n > 64 && static_cpu_has(X86_FEATURE_XMM2)) + n = __copy_user_intel_nocache(to, from, n); diff --git a/queue-4.14/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch b/queue-4.14/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch new file mode 100644 index 00000000000..68a6fcec4ab --- /dev/null +++ b/queue-4.14/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch @@ -0,0 +1,69 @@ +Subject: x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end} +From: Dan Williams dan.j.williams@intel.com +Date: Mon Jan 29 17:02:44 2018 -0800 + +From: Dan Williams dan.j.williams@intel.com + +commit b5c4ae4f35325d520b230bab6eb3310613b72ac1 + +In preparation for converting some __uaccess_begin() instances to +__uacess_begin_nospec(), make sure all 'from user' uaccess paths are +using the _begin(), _end() helpers rather than open-coded stac() and +clac(). + +No functional changes. 
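+
+After this change __copy_user_ll() has the shape below (condensed from
+the hunk further down):
+
+    unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
+    {
+            __uaccess_begin();      /* previously: stac() */
+            if (movsl_is_ok(to, from, n))
+                    __copy_user(to, from, n);
+            else
+                    n = __copy_user_intel(to, from, n);
+            __uaccess_end();        /* previously: clac() */
+            return n;
+    }
+
+With the begin/end pair in place, the speculation fix only has to swap
+__uaccess_begin() for __uaccess_begin_nospec(), as done in the previous
+patch in this queue.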
+ +Suggested-by: Ingo Molnar +Signed-off-by: Dan Williams +Signed-off-by: Thomas Gleixner +Cc: linux-arch@vger.kernel.org +Cc: Tom Lendacky +Cc: Kees Cook +Cc: kernel-hardening@lists.openwall.com +Cc: gregkh@linuxfoundation.org +Cc: Al Viro +Cc: torvalds@linux-foundation.org +Cc: alan@linux.intel.com +Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com +Signed-off-by: Greg Kroah-Hartman + + +--- + arch/x86/lib/usercopy_32.c | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +--- a/arch/x86/lib/usercopy_32.c ++++ b/arch/x86/lib/usercopy_32.c +@@ -331,12 +331,12 @@ do { \ + + unsigned long __copy_user_ll(void *to, const void *from, unsigned long n) + { +- stac(); ++ __uaccess_begin(); + if (movsl_is_ok(to, from, n)) + __copy_user(to, from, n); + else + n = __copy_user_intel(to, from, n); +- clac(); ++ __uaccess_end(); + return n; + } + EXPORT_SYMBOL(__copy_user_ll); +@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll); + unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from, + unsigned long n) + { +- stac(); ++ __uaccess_begin(); + #ifdef CONFIG_X86_INTEL_USERCOPY + if (n > 64 && static_cpu_has(X86_FEATURE_XMM2)) + n = __copy_user_intel_nocache(to, from, n); +@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocach + #else + __copy_user(to, from, n); + #endif +- clac(); ++ __uaccess_end(); + return n; + } + EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);