--- /dev/null
+Subject: Documentation: Document array_index_nospec
+From: Mark Rutland <mark.rutland@arm.com>
+Date: Mon Jan 29 17:02:16 2018 -0800
+
+From: Mark Rutland <mark.rutland@arm.com>
+
+commit f84a56f73dddaeac1dba8045b007f742f61cd2da
+
+Document the rationale and usage of the new array_index_nospec() helper.
+
+Signed-off-by: Mark Rutland <mark.rutland@arm.com>
+Signed-off-by: Will Deacon <will.deacon@arm.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Kees Cook <keescook@chromium.org>
+Cc: linux-arch@vger.kernel.org
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: gregkh@linuxfoundation.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727413645.33451.15878817161436755393.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ Documentation/speculation.txt |   90 ++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 90 insertions(+)
+
+--- /dev/null
++++ b/Documentation/speculation.txt
+@@ -0,0 +1,90 @@
++This document explains potential effects of speculation, and how undesirable
++effects can be mitigated portably using common APIs.
++
++===========
++Speculation
++===========
++
++To improve performance and minimize average latencies, many contemporary CPUs
++employ speculative execution techniques such as branch prediction, performing
++work which may be discarded at a later stage.
++
++Typically speculative execution cannot be observed from architectural state,
++such as the contents of registers. However, in some cases it is possible to
++observe its impact on microarchitectural state, such as the presence or
++absence of data in caches. Such state may form side-channels which can be
++observed to extract secret information.
++
++For example, in the presence of branch prediction, it is possible for bounds
++checks to be ignored by code which is speculatively executed. Consider the
++following code:
++
++      int load_array(int *array, unsigned int index)
++      {
++              if (index >= MAX_ARRAY_ELEMS)
++                      return 0;
++              else
++                      return array[index];
++      }
++
++Which, on arm64, may be compiled to an assembly sequence such as:
++
++      CMP     <index>, #MAX_ARRAY_ELEMS
++      B.LT    less
++      MOV     <returnval>, #0
++      RET
++  less:
++      LDR     <returnval>, [<array>, <index>]
++      RET
++
++It is possible that a CPU mis-predicts the conditional branch, and
++speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
++value will subsequently be discarded, but the speculated load may affect
++microarchitectural state which can be subsequently measured.
++
++More complex sequences involving multiple dependent memory accesses may
++result in sensitive information being leaked. Consider the following
++code, building on the prior example:
++
++      int load_dependent_arrays(int *arr1, int *arr2, int index)
++      {
++              int val1, val2;
++
++              val1 = load_array(arr1, index);
++              val2 = load_array(arr2, val1);
++
++              return val2;
++      }
++
++Under speculation, the first call to load_array() may return the value
++of an out-of-bounds address, while the second call will influence
++microarchitectural state dependent on this value. This may provide an
++arbitrary read primitive.
++
++====================================
++Mitigating speculation side-channels
++====================================
++
++The kernel provides a generic API to ensure that bounds checks are
++respected even under speculation. Architectures which are affected by
++speculation-based side-channels are expected to implement these
++primitives.
++
++The array_index_nospec() helper in <linux/nospec.h> can be used to
++prevent information from being leaked via side-channels.
++
++A call to array_index_nospec(index, size) returns a sanitized index
++value that is bounded to [0, size) even under CPU speculation
++conditions.
++
++This can be used to protect the earlier load_array() example:
++
++      int load_array(int *array, unsigned int index)
++      {
++              if (index >= MAX_ARRAY_ELEMS)
++                      return 0;
++              else {
++                      index = array_index_nospec(index, MAX_ARRAY_ELEMS);
++                      return array[index];
++              }
++      }
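To illustrate the arithmetic behind a helper like array_index_nospec(), the following is a minimal userspace sketch of a branchless index mask. All sketch_* names are hypothetical and this is not the kernel's implementation: the real helper may be overridden per architecture and takes extra care to stop the compiler from optimizing the mask away, which plain C as written here does not guarantee. The sketch also assumes that index and size are at most LONG_MAX and that right-shifting a negative long is an arithmetic shift (true for GCC and Clang).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace model of the masking arithmetic only. */
#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Returns ~0UL when index < size and 0 otherwise, without a branch.
 * If index >= size, (size - 1 - index) wraps around and sets the top
 * bit; inverting and arithmetic-shifting then yields an all-zero mask.
 */
static unsigned long sketch_index_mask(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (SKETCH_BITS_PER_LONG - 1));
}

/* Clamp index to [0, size): out-of-bounds values collapse to 0. */
static unsigned long sketch_index_nospec(unsigned long index, unsigned long size)
{
    return index & sketch_index_mask(index, size);
}

/* Same shape as the load_array() example in the document. */
static int sketch_load_array(const int *array, unsigned long nelems,
                             unsigned long index)
{
    if (index >= nelems)
        return 0;

    index = sketch_index_nospec(index, nelems);
    return array[index];
}
```

The mask is all ones for an in-bounds index and zero otherwise, so a mis-speculated out-of-bounds index collapses to element 0 instead of reaching an attacker-chosen address.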
 
--- /dev/null
+Subject: KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
+From: KarimAllah Ahmed <karahmed@amazon.de>
+Date: Sat Feb  3 15:56:23 2018 +0100
+
+From: KarimAllah Ahmed <karahmed@amazon.de>
+
+commit b2ac58f90540e39324e7a29a7ad471407ae0bf48
+
+[ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
+
+Handle MSR_IA32_SPEC_CTRL on SVM exactly as is done for VMX:
+
+- Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
+- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
+  actually used it.
+
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Jun Nakajima <jun.nakajima@intel.com>
+Cc: kvm@vger.kernel.org
+Cc: Dave Hansen <dave.hansen@intel.com>
+Cc: Tim Chen <tim.c.chen@linux.intel.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Asit Mallick <asit.k.mallick@intel.com>
+Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
+Cc: Greg KH <gregkh@linuxfoundation.org>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Dan Williams <dan.j.williams@intel.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Ashok Raj <ashok.raj@intel.com>
+Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/svm.c |   88 +++++++++++++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 88 insertions(+)
+
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -184,6 +184,8 @@ struct vcpu_svm {
+               u64 gs_base;
+       } host;
+ 
++      u64 spec_ctrl;
++
+       u32 *msrpm;
+ 
+       ulong nmi_iret_rip;
+@@ -249,6 +251,7 @@ static const struct svm_direct_access_ms
+       { .index = MSR_CSTAR,                           .always = true  },
+       { .index = MSR_SYSCALL_MASK,                    .always = true  },
+ #endif
++      { .index = MSR_IA32_SPEC_CTRL,                  .always = false },
+       { .index = MSR_IA32_PRED_CMD,                   .always = false },
+       { .index = MSR_IA32_LASTBRANCHFROMIP,           .always = false },
+       { .index = MSR_IA32_LASTBRANCHTOIP,             .always = false },
+@@ -882,6 +885,25 @@ static bool valid_msr_intercept(u32 inde
+       return false;
+ }
+ 
++static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
++{
++      u8 bit_write;
++      unsigned long tmp;
++      u32 offset;
++      u32 *msrpm;
++
++      msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm :
++                                    to_svm(vcpu)->msrpm;
++
++      offset    = svm_msrpm_offset(msr);
++      bit_write = 2 * (msr & 0x0f) + 1;
++
++      BUG_ON(offset == MSR_INVALID);
++      tmp       = msrpm[offset];
++
++      return !!test_bit(bit_write, &tmp);
++}
++
+ static void set_msr_interception(u32 *msrpm, unsigned msr,
+                                int read, int write)
+ {
+@@ -1587,6 +1609,8 @@ static void svm_vcpu_reset(struct kvm_vc
+       u32 dummy;
+       u32 eax = 1;
+ 
++      svm->spec_ctrl = 0;
++
+       if (!init_event) {
+               svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+                                          MSR_IA32_APICBASE_ENABLE;
+@@ -3591,6 +3615,13 @@ static int svm_get_msr(struct kvm_vcpu *
+       case MSR_VM_CR:
+               msr_info->data = svm->nested.vm_cr_msr;
+               break;
++      case MSR_IA32_SPEC_CTRL:
++              if (!msr_info->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
++                      return 1;
++
++              msr_info->data = svm->spec_ctrl;
++              break;
+       case MSR_IA32_UCODE_REV:
+               msr_info->data = 0x01000065;
+               break;
+@@ -3682,6 +3713,33 @@ static int svm_set_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               kvm_write_tsc(vcpu, msr);
+               break;
++      case MSR_IA32_SPEC_CTRL:
++              if (!msr->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
++                      return 1;
++
++              /* The STIBP bit doesn't fault even if it's not advertised */
++              if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
++                      return 1;
++
++              svm->spec_ctrl = data;
++
++              if (!data)
++                      break;
++
++              /*
++               * For non-nested:
++               * When it's written (to non-zero) for the first time, pass
++               * it through.
++               *
++               * For nested:
++               * The handling of the MSR bitmap for L2 guests is done in
++               * nested_svm_vmrun_msrpm.
++               * We update the L1 MSR bit as well since it will end up
++               * touching the MSR anyway now.
++               */
++              set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
++              break;
+       case MSR_IA32_PRED_CMD:
+               if (!msr->host_initiated &&
+                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
+@@ -4950,6 +5008,15 @@ static void svm_vcpu_run(struct kvm_vcpu
+ 
+       local_irq_enable();
+ 
++      /*
++       * If this vCPU has touched SPEC_CTRL, restore the guest's value if
++       * it's non-zero. Since vmentry is serialising on affected CPUs, there
++       * is no need to worry about the conditional branch over the wrmsr
++       * being speculatively taken.
++       */
++      if (svm->spec_ctrl)
++              wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
++
+       asm volatile (
+               "push %%" _ASM_BP "; \n\t"
+               "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t"
+@@ -5042,6 +5109,27 @@ static void svm_vcpu_run(struct kvm_vcpu
+ #endif
+               );
+ 
++      /*
++       * We do not use IBRS in the kernel. If this vCPU has used the
++       * SPEC_CTRL MSR it may have left it on; save the value and
++       * turn it off. This is much more efficient than blindly adding
++       * it to the atomic save/restore list. Especially as the former
++       * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
++       *
++       * For non-nested case:
++       * If the L01 MSR bitmap does not intercept the MSR, then we need to
++       * save it.
++       *
++       * For nested case:
++       * If the L02 MSR bitmap does not intercept the MSR, then we need to
++       * save it.
++       */
++      if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
++              rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
++
++      if (svm->spec_ctrl)
++              wrmsrl(MSR_IA32_SPEC_CTRL, 0);
++
+       /* Eliminate branch target predictions from guest mode */
+       vmexit_fill_RSB();
+ 
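The lazy save/restore scheme described by this patch can be modelled in a few lines of plain C. The sketch below is a toy, assuming a single global variable in place of the physical MSR and a boolean in place of the MSR permission bitmap; every toy_* name is hypothetical and none of this is KVM code. It only aims to show the ordering: writes stay intercepted until the guest first writes a non-zero value, and the host value is forced back to zero on exit only when the guest left something set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only IBRS (bit 0) and STIBP (bit 1) are accepted, as in the patch. */
#define TOY_SPEC_CTRL_VALID UINT64_C(3)

struct toy_vcpu {
    uint64_t spec_ctrl;         /* guest value cached by the hypervisor */
    bool     write_intercepted; /* true until the first non-zero write */
};

/* Stands in for the physical SPEC_CTRL MSR (assumption of the sketch). */
static uint64_t toy_hw_spec_ctrl;

static void toy_init(struct toy_vcpu *v)
{
    v->spec_ctrl = 0;
    v->write_intercepted = true;
}

/* Models an intercepted guest write; returns 1 to signal a fault. */
static int toy_set_msr(struct toy_vcpu *v, uint64_t data)
{
    if (data & ~TOY_SPEC_CTRL_VALID)
        return 1;

    v->spec_ctrl = data;
    if (data)
        v->write_intercepted = false; /* pass through from now on */
    return 0;
}

/* Before entering the guest: restore its value only if non-zero. */
static void toy_enter(const struct toy_vcpu *v)
{
    if (v->spec_ctrl)
        toy_hw_spec_ctrl = v->spec_ctrl;
}

/* After leaving the guest: save if passed through, then clear. */
static void toy_exit(struct toy_vcpu *v)
{
    if (!v->write_intercepted)
        v->spec_ctrl = toy_hw_spec_ctrl;

    if (v->spec_ctrl)
        toy_hw_spec_ctrl = 0;
}
```

A guest that never writes a non-zero value never triggers an MSR access on entry or exit in this model, which is the overhead the patch is trying to avoid.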
 
--- /dev/null
+Subject: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
+From: KarimAllah Ahmed <karahmed@amazon.de>
+Date: Thu Feb  1 22:59:45 2018 +0100
+
+From: KarimAllah Ahmed <karahmed@amazon.de>
+
+commit d28b387fb74da95d69d2615732f50cceb38e9a4d
+
+[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
+
+Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
+guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
+be using a retpoline+IBPB based approach.
+
+To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
+guests that do not actually use the MSR, only start saving and restoring
+when a non-zero value is written to it.
+
+No attempt is made to handle STIBP here, intentionally. Filtering STIBP
+may be added in a future patch, which may require trapping all writes
+if we don't want to pass it through directly to the guest.
+
+[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
+
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Reviewed-by: Jim Mattson <jmattson@google.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Jun Nakajima <jun.nakajima@intel.com>
+Cc: kvm@vger.kernel.org
+Cc: Dave Hansen <dave.hansen@intel.com>
+Cc: Tim Chen <tim.c.chen@linux.intel.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Asit Mallick <asit.k.mallick@intel.com>
+Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
+Cc: Greg KH <gregkh@linuxfoundation.org>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Dan Williams <dan.j.williams@intel.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Ashok Raj <ashok.raj@intel.com>
+Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/cpuid.c |    9 ++--
+ arch/x86/kvm/vmx.c   |  105 ++++++++++++++++++++++++++++++++++++++++++++++++++-
+ arch/x86/kvm/x86.c   |    2 
+ 3 files changed, 110 insertions(+), 6 deletions(-)
+
+--- a/arch/x86/kvm/cpuid.c
++++ b/arch/x86/kvm/cpuid.c
+@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct
+ 
+       /* cpuid 0x80000008.ebx */
+       const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+-              F(IBPB);
++              F(IBPB) | F(IBRS);
+ 
+       /* cpuid 0xC0000001.edx */
+       const u32 kvm_cpuid_C000_0001_edx_x86_features =
+@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct
+ 
+       /* cpuid 7.0.edx*/
+       const u32 kvm_cpuid_7_0_edx_x86_features =
+-              F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
++              F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
++              F(ARCH_CAPABILITIES);
+ 
+       /* all calls to cpuid_count() should be made on the same cpu */
+       get_cpu();
+@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct
+                       g_phys_as = phys_as;
+               entry->eax = g_phys_as | (virt_as << 8);
+               entry->edx = 0;
+-              /* IBPB isn't necessarily present in hardware cpuid */
++              /* IBRS and IBPB aren't necessarily present in hardware cpuid */
+               if (boot_cpu_has(X86_FEATURE_IBPB))
+                       entry->ebx |= F(IBPB);
++              if (boot_cpu_has(X86_FEATURE_IBRS))
++                      entry->ebx |= F(IBRS);
+               entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+               cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
+               break;
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -584,6 +584,7 @@ struct vcpu_vmx {
+ #endif
+ 
+       u64                   arch_capabilities;
++      u64                   spec_ctrl;
+ 
+       u32 vm_entry_controls_shadow;
+       u32 vm_exit_controls_shadow;
+@@ -1906,6 +1907,29 @@ static void update_exception_bitmap(stru
+ }
+ 
+ /*
++ * Check if MSR is intercepted for currently loaded MSR bitmap.
++ */
++static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
++{
++      unsigned long *msr_bitmap;
++      int f = sizeof(unsigned long);
++
++      if (!cpu_has_vmx_msr_bitmap())
++              return true;
++
++      msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
++
++      if (msr <= 0x1fff) {
++              return !!test_bit(msr, msr_bitmap + 0x800 / f);
++      } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
++              msr &= 0x1fff;
++              return !!test_bit(msr, msr_bitmap + 0xc00 / f);
++      }
++
++      return true;
++}
++
++/*
+  * Check if MSR is intercepted for L01 MSR bitmap.
+  */
+ static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
+@@ -3259,6 +3283,14 @@ static int vmx_get_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               msr_info->data = guest_read_tsc(vcpu);
+               break;
++      case MSR_IA32_SPEC_CTRL:
++              if (!msr_info->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
++                      return 1;
++
++              msr_info->data = to_vmx(vcpu)->spec_ctrl;
++              break;
+       case MSR_IA32_ARCH_CAPABILITIES:
+               if (!msr_info->host_initiated &&
+                   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+@@ -3372,6 +3404,37 @@ static int vmx_set_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               kvm_write_tsc(vcpu, msr_info);
+               break;
++      case MSR_IA32_SPEC_CTRL:
++              if (!msr_info->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
++                      return 1;
++
++              /* The STIBP bit doesn't fault even if it's not advertised */
++              if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
++                      return 1;
++
++              vmx->spec_ctrl = data;
++
++              if (!data)
++                      break;
++
++              /*
++               * For non-nested:
++               * When it's written (to non-zero) for the first time, pass
++               * it through.
++               *
++               * For nested:
++               * The handling of the MSR bitmap for L2 guests is done in
++               * nested_vmx_merge_msr_bitmap. We should not touch the
++               * vmcs02.msr_bitmap here since it gets completely overwritten
++               * in the merging. We update the vmcs01 here for L1 as well
++               * since it will end up touching the MSR anyway now.
++               */
++              vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
++                                            MSR_IA32_SPEC_CTRL,
++                                            MSR_TYPE_RW);
++              break;
+       case MSR_IA32_PRED_CMD:
+               if (!msr_info->host_initiated &&
+                   !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
+@@ -5697,6 +5760,7 @@ static void vmx_vcpu_reset(struct kvm_vc
+       u64 cr0;
+ 
+       vmx->rmode.vm86_active = 0;
++      vmx->spec_ctrl = 0;
+ 
+       vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
+       kvm_set_cr8(vcpu, 0);
+@@ -9360,6 +9424,15 @@ static void __noclone vmx_vcpu_run(struc
+ 
+       vmx_arm_hv_timer(vcpu);
+ 
++      /*
++       * If this vCPU has touched SPEC_CTRL, restore the guest's value if
++       * it's non-zero. Since vmentry is serialising on affected CPUs, there
++       * is no need to worry about the conditional branch over the wrmsr
++       * being speculatively taken.
++       */
++      if (vmx->spec_ctrl)
++              wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
++
+       vmx->__launched = vmx->loaded_vmcs->launched;
+       asm(
+               /* Store host registers */
+@@ -9478,6 +9551,27 @@ static void __noclone vmx_vcpu_run(struc
+ #endif
+             );
+ 
++      /*
++       * We do not use IBRS in the kernel. If this vCPU has used the
++       * SPEC_CTRL MSR it may have left it on; save the value and
++       * turn it off. This is much more efficient than blindly adding
++       * it to the atomic save/restore list. Especially as the former
++       * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
++       *
++       * For non-nested case:
++       * If the L01 MSR bitmap does not intercept the MSR, then we need to
++       * save it.
++       *
++       * For nested case:
++       * If the L02 MSR bitmap does not intercept the MSR, then we need to
++       * save it.
++       */
++      if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))
++              rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
++
++      if (vmx->spec_ctrl)
++              wrmsrl(MSR_IA32_SPEC_CTRL, 0);
++
+       /* Eliminate branch target predictions from guest mode */
+       vmexit_fill_RSB();
+ 
+@@ -10109,7 +10203,7 @@ static inline bool nested_vmx_merge_msr_
+       unsigned long *msr_bitmap_l1;
+       unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+       /*
+-       * pred_cmd is trying to verify two things:
++       * pred_cmd & spec_ctrl are trying to verify two things:
+        *
+        * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
+        *    ensures that we do not accidentally generate an L02 MSR bitmap
+@@ -10122,9 +10216,10 @@ static inline bool nested_vmx_merge_msr_
+        *    the MSR.
+        */
+       bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
++      bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+ 
+       if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
+-          !pred_cmd)
++          !pred_cmd && !spec_ctrl)
+               return false;
+ 
+       page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
+@@ -10158,6 +10253,12 @@ static inline bool nested_vmx_merge_msr_
+               }
+       }
+ 
++      if (spec_ctrl)
++              nested_vmx_disable_intercept_for_msr(
++                                      msr_bitmap_l1, msr_bitmap_l0,
++                                      MSR_IA32_SPEC_CTRL,
++                                      MSR_TYPE_R | MSR_TYPE_W);
++
+       if (pred_cmd)
+               nested_vmx_disable_intercept_for_msr(
+                                       msr_bitmap_l1, msr_bitmap_l0,
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -1006,7 +1006,7 @@ static u32 msrs_to_save[] = {
+ #endif
+       MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+       MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+-      MSR_IA32_ARCH_CAPABILITIES
++      MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES
+ };
+ 
+ static unsigned num_msrs_to_save;
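The msr_write_intercepted() helper added to vmx.c above indexes into a 4 KiB MSR bitmap, reading the write-intercept bit at byte offset 0x800 for MSRs 0x0 to 0x1fff and at 0xc00 for MSRs 0xc0000000 to 0xc0001fff. The following userspace sketch mirrors that lookup on a plain byte array. The toy_* names are hypothetical, and addressing bits byte by byte is an assumption that matches test_bit() on a little-endian machine such as x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITMAP_BYTES   4096   /* one page */
#define TOY_WRITE_LOW_OFF  0x800  /* write bits, MSRs 0x0..0x1fff */
#define TOY_WRITE_HIGH_OFF 0xc00  /* write bits, MSRs 0xc0000000..0xc0001fff */
#define TOY_MSR_SPEC_CTRL  0x48u  /* architectural index of IA32_SPEC_CTRL */

static bool toy_test_bit(const uint8_t *base, uint32_t bit)
{
    return (base[bit / 8] >> (bit % 8)) & 1u;
}

static void toy_clear_bit(uint8_t *base, uint32_t bit)
{
    base[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/* Start from "intercept everything": every bit set. */
static void toy_intercept_all(uint8_t *bitmap)
{
    memset(bitmap, 0xff, TOY_BITMAP_BYTES);
}

/* A set bit means the write exits to the hypervisor. */
static bool toy_write_intercepted(const uint8_t *bitmap, uint32_t msr)
{
    if (msr <= 0x1fff)
        return toy_test_bit(bitmap + TOY_WRITE_LOW_OFF, msr);
    if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
        return toy_test_bit(bitmap + TOY_WRITE_HIGH_OFF, msr & 0x1fff);

    return true; /* MSRs outside both ranges always exit */
}

/* Clear the write bit so guest writes reach the hardware directly. */
static void toy_allow_write(uint8_t *bitmap, uint32_t msr)
{
    if (msr <= 0x1fff)
        toy_clear_bit(bitmap + TOY_WRITE_LOW_OFF, msr);
    else if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
        toy_clear_bit(bitmap + TOY_WRITE_HIGH_OFF, msr & 0x1fff);
}
```

Clearing one bit affects only that MSR, so a neighbouring MSR such as the one at index 0x49 stays intercepted until it is enabled separately.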
 
--- /dev/null
+Subject: KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
+From: KarimAllah Ahmed <karahmed@amazon.de>
+Date: Thu Feb  1 22:59:44 2018 +0100
+
+From: KarimAllah Ahmed <karahmed@amazon.de>
+
+commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
+
+Intel processors use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
+(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
+contents will come directly from the hardware, but user-space can still
+override it.
+
+[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
+
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
+Reviewed-by: Jim Mattson <jmattson@google.com>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Jun Nakajima <jun.nakajima@intel.com>
+Cc: kvm@vger.kernel.org
+Cc: Dave Hansen <dave.hansen@intel.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Asit Mallick <asit.k.mallick@intel.com>
+Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
+Cc: Greg KH <gregkh@linuxfoundation.org>
+Cc: Dan Williams <dan.j.williams@intel.com>
+Cc: Tim Chen <tim.c.chen@linux.intel.com>
+Cc: Ashok Raj <ashok.raj@intel.com>
+Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/cpuid.c |    2 +-
+ arch/x86/kvm/vmx.c   |   15 +++++++++++++++
+ arch/x86/kvm/x86.c   |    1 +
+ 3 files changed, 17 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/kvm/cpuid.c
++++ b/arch/x86/kvm/cpuid.c
+@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct
+ 
+       /* cpuid 7.0.edx*/
+       const u32 kvm_cpuid_7_0_edx_x86_features =
+-              F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
++              F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+ 
+       /* all calls to cpuid_count() should be made on the same cpu */
+       get_cpu();
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -583,6 +583,8 @@ struct vcpu_vmx {
+       u64                   msr_guest_kernel_gs_base;
+ #endif
+ 
++      u64                   arch_capabilities;
++
+       u32 vm_entry_controls_shadow;
+       u32 vm_exit_controls_shadow;
+       u32 secondary_exec_control;
+@@ -3257,6 +3259,12 @@ static int vmx_get_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               msr_info->data = guest_read_tsc(vcpu);
+               break;
++      case MSR_IA32_ARCH_CAPABILITIES:
++              if (!msr_info->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
++                      return 1;
++              msr_info->data = to_vmx(vcpu)->arch_capabilities;
++              break;
+       case MSR_IA32_SYSENTER_CS:
+               msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
+               break;
+@@ -3392,6 +3400,11 @@ static int vmx_set_msr(struct kvm_vcpu *
+               vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
+                                             MSR_TYPE_W);
+               break;
++      case MSR_IA32_ARCH_CAPABILITIES:
++              if (!msr_info->host_initiated)
++                      return 1;
++              vmx->arch_capabilities = data;
++              break;
+       case MSR_IA32_CR_PAT:
+               if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+                       if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+@@ -5652,6 +5665,8 @@ static int vmx_vcpu_setup(struct vcpu_vm
+               ++vmx->nmsrs;
+       }
+ 
++      if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
++              rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
+ 
+       vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
+ 
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -1006,6 +1006,7 @@ static u32 msrs_to_save[] = {
+ #endif
+       MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+       MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
++      MSR_IA32_ARCH_CAPABILITIES
+ };
+ 
+ static unsigned num_msrs_to_save;
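The access rules this patch gives MSR_IA32_ARCH_CAPABILITIES (readable by a guest only when its CPUID advertises the feature, writable only by the host side) can be summarised in a small model. This is a sketch under stated assumptions: the toy_* names and the struct are hypothetical, and the bit positions are taken from the commit message (RDCL_NO is bit 0, IBRS_ALL is bit 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ARCH_CAP_RDCL_NO  (UINT64_C(1) << 0)
#define TOY_ARCH_CAP_IBRS_ALL (UINT64_C(1) << 1)

struct toy_caps {
    uint64_t arch_capabilities;        /* value presented to the guest */
    bool     guest_cpuid_has_arch_cap; /* feature bit in guest CPUID */
};

/* Returns 0 on success, 1 to signal a fault to the caller. */
static int toy_get_arch_caps(const struct toy_caps *c, bool host_initiated,
                             uint64_t *out)
{
    if (!host_initiated && !c->guest_cpuid_has_arch_cap)
        return 1;

    *out = c->arch_capabilities;
    return 0;
}

/* The MSR is read-only for the guest; only the host may override it. */
static int toy_set_arch_caps(struct toy_caps *c, bool host_initiated,
                             uint64_t data)
{
    if (!host_initiated)
        return 1;

    c->arch_capabilities = data;
    return 0;
}
```

In this model user space can override the value seen by the guest, while a guest write always faults, which matches the read-only behaviour the commit message describes.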
 
--- /dev/null
+Subject: KVM: VMX: introduce alloc_loaded_vmcs
+From: Paolo Bonzini <pbonzini@redhat.com>
+Date: Thu Jan 11 12:16:15 2018 +0100
+
+From: Paolo Bonzini <pbonzini@redhat.com>
+
+commit f21f165ef922c2146cc5bdc620f542953c41714b
+
+Group together the calls to alloc_vmcs and loaded_vmcs_init.  Soon we'll also
+allocate an MSR bitmap there.
+
+Cc: stable@vger.kernel.org       # prereq for Spectre mitigation
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/x86/kvm/vmx.c |   36 ++++++++++++++++++++++--------------
+ 1 file changed, 22 insertions(+), 14 deletions(-)
+
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -3814,11 +3814,6 @@ static struct vmcs *alloc_vmcs_cpu(int c
+       return vmcs;
+ }
+ 
+-static struct vmcs *alloc_vmcs(void)
+-{
+-      return alloc_vmcs_cpu(raw_smp_processor_id());
+-}
+-
+ static void free_vmcs(struct vmcs *vmcs)
+ {
+       free_pages((unsigned long)vmcs, vmcs_config.order);
+@@ -3837,6 +3832,22 @@ static void free_loaded_vmcs(struct load
+       WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
+ }
+ 
++static struct vmcs *alloc_vmcs(void)
++{
++      return alloc_vmcs_cpu(raw_smp_processor_id());
++}
++
++static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
++{
++      loaded_vmcs->vmcs = alloc_vmcs();
++      if (!loaded_vmcs->vmcs)
++              return -ENOMEM;
++
++      loaded_vmcs->shadow_vmcs = NULL;
++      loaded_vmcs_init(loaded_vmcs);
++      return 0;
++}
++
+ static void free_kvm_area(void)
+ {
+       int cpu;
+@@ -7135,12 +7146,11 @@ static int enter_vmx_operation(struct kv
+ {
+       struct vcpu_vmx *vmx = to_vmx(vcpu);
+       struct vmcs *shadow_vmcs;
++      int r;
+ 
+-      vmx->nested.vmcs02.vmcs = alloc_vmcs();
+-      vmx->nested.vmcs02.shadow_vmcs = NULL;
+-      if (!vmx->nested.vmcs02.vmcs)
++      r = alloc_loaded_vmcs(&vmx->nested.vmcs02);
++      if (r < 0)
+               goto out_vmcs02;
+-      loaded_vmcs_init(&vmx->nested.vmcs02);
+ 
+       if (cpu_has_vmx_msr_bitmap()) {
+               vmx->nested.msr_bitmap =
+@@ -9535,13 +9545,11 @@ static struct kvm_vcpu *vmx_create_vcpu(
+       if (!vmx->guest_msrs)
+               goto free_pml;
+ 
+-      vmx->loaded_vmcs = &vmx->vmcs01;
+-      vmx->loaded_vmcs->vmcs = alloc_vmcs();
+-      vmx->loaded_vmcs->shadow_vmcs = NULL;
+-      if (!vmx->loaded_vmcs->vmcs)
++      err = alloc_loaded_vmcs(&vmx->vmcs01);
++      if (err < 0)
+               goto free_msrs;
+-      loaded_vmcs_init(vmx->loaded_vmcs);
+ 
++      vmx->loaded_vmcs = &vmx->vmcs01;
+       cpu = get_cpu();
+       vmx_vcpu_load(&vmx->vcpu, cpu);
+       vmx->vcpu.cpu = cpu;
 
--- /dev/null
+Subject: KVM: VMX: make MSR bitmaps per-VCPU
+From: Paolo Bonzini <pbonzini@redhat.com>
+Date: Tue Jan 16 16:51:18 2018 +0100
+
+From: Paolo Bonzini <pbonzini@redhat.com>
+
+commit 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6
+
+Place the MSR bitmap in struct loaded_vmcs, and update it in place
+every time the x2apic or APICv state can change.  This is rare and
+the loop can handle 64 MSRs per iteration, in a similar fashion to
+nested_vmx_prepare_msr_bitmap.
+
+This prepares for choosing, on a per-VM basis, whether to intercept
+the SPEC_CTRL and PRED_CMD MSRs.
+
+Cc: stable@vger.kernel.org       # prereq for Spectre mitigation
+Suggested-by: Jim Mattson <jmattson@google.com>
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/x86/kvm/vmx.c |  276 ++++++++++++++++++++++++++++-------------------------
+ 1 file changed, 150 insertions(+), 126 deletions(-)
+
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -108,6 +108,14 @@ static u64 __read_mostly host_xss;
+ static bool __read_mostly enable_pml = 1;
+ module_param_named(pml, enable_pml, bool, S_IRUGO);
+ 
++#define MSR_TYPE_R    1
++#define MSR_TYPE_W    2
++#define MSR_TYPE_RW   3
++
++#define MSR_BITMAP_MODE_X2APIC                1
++#define MSR_BITMAP_MODE_X2APIC_APICV  2
++#define MSR_BITMAP_MODE_LM            4
++
+ #define KVM_VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
+ 
+ /* Guest_tsc -> host_tsc conversion requires 64-bit division.  */
+@@ -206,6 +214,7 @@ struct loaded_vmcs {
+       int soft_vnmi_blocked;
+       ktime_t entry_time;
+       s64 vnmi_blocked_time;
++      unsigned long *msr_bitmap;
+       struct list_head loaded_vmcss_on_cpu_link;
+ };
+ 
+@@ -446,8 +455,6 @@ struct nested_vmx {
+       bool pi_pending;
+       u16 posted_intr_nv;
+ 
+-      unsigned long *msr_bitmap;
+-
+       struct hrtimer preemption_timer;
+       bool preemption_timer_expired;
+ 
+@@ -562,6 +569,7 @@ struct vcpu_vmx {
+       struct kvm_vcpu       vcpu;
+       unsigned long         host_rsp;
+       u8                    fail;
++      u8                    msr_bitmap_mode;
+       u32                   exit_intr_info;
+       u32                   idt_vectoring_info;
+       ulong                 rflags;
+@@ -919,6 +927,7 @@ static bool vmx_get_nmi_mask(struct kvm_
+ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
+ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
+                                           u16 error_code);
++static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
+ 
+ static DEFINE_PER_CPU(struct vmcs *, vmxarea);
+ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+@@ -938,12 +947,6 @@ static DEFINE_PER_CPU(spinlock_t, blocke
+ enum {
+       VMX_IO_BITMAP_A,
+       VMX_IO_BITMAP_B,
+-      VMX_MSR_BITMAP_LEGACY,
+-      VMX_MSR_BITMAP_LONGMODE,
+-      VMX_MSR_BITMAP_LEGACY_X2APIC_APICV,
+-      VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV,
+-      VMX_MSR_BITMAP_LEGACY_X2APIC,
+-      VMX_MSR_BITMAP_LONGMODE_X2APIC,
+       VMX_VMREAD_BITMAP,
+       VMX_VMWRITE_BITMAP,
+       VMX_BITMAP_NR
+@@ -953,12 +956,6 @@ static unsigned long *vmx_bitmap[VMX_BIT
+ 
+ #define vmx_io_bitmap_a                      (vmx_bitmap[VMX_IO_BITMAP_A])
+ #define vmx_io_bitmap_b                      (vmx_bitmap[VMX_IO_BITMAP_B])
+-#define vmx_msr_bitmap_legacy                (vmx_bitmap[VMX_MSR_BITMAP_LEGACY])
+-#define vmx_msr_bitmap_longmode              (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE])
+-#define vmx_msr_bitmap_legacy_x2apic_apicv   (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV])
+-#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV])
+-#define vmx_msr_bitmap_legacy_x2apic         (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC])
+-#define vmx_msr_bitmap_longmode_x2apic       (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC])
+ #define vmx_vmread_bitmap                    (vmx_bitmap[VMX_VMREAD_BITMAP])
+ #define vmx_vmwrite_bitmap                   (vmx_bitmap[VMX_VMWRITE_BITMAP])
+ 
+@@ -2559,36 +2556,6 @@ static void move_msr_up(struct vcpu_vmx
+       vmx->guest_msrs[from] = tmp;
+ }
+ 
+-static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
+-{
+-      unsigned long *msr_bitmap;
+-
+-      if (is_guest_mode(vcpu))
+-              msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap;
+-      else if (cpu_has_secondary_exec_ctrls() &&
+-               (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
+-                SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
+-              if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) {
+-                      if (is_long_mode(vcpu))
+-                              msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv;
+-                      else
+-                              msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv;
+-              } else {
+-                      if (is_long_mode(vcpu))
+-                              msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
+-                      else
+-                              msr_bitmap = vmx_msr_bitmap_legacy_x2apic;
+-              }
+-      } else {
+-              if (is_long_mode(vcpu))
+-                      msr_bitmap = vmx_msr_bitmap_longmode;
+-              else
+-                      msr_bitmap = vmx_msr_bitmap_legacy;
+-      }
+-
+-      vmcs_write64(MSR_BITMAP, __pa(msr_bitmap));
+-}
+-
+ /*
+  * Set up the vmcs to automatically save and restore system
+  * msrs.  Don't touch the 64-bit msrs if the guest is in legacy
+@@ -2629,7 +2596,7 @@ static void setup_msrs(struct vcpu_vmx *
+       vmx->save_nmsrs = save_nmsrs;
+ 
+       if (cpu_has_vmx_msr_bitmap())
+-              vmx_set_msr_bitmap(&vmx->vcpu);
++              vmx_update_msr_bitmap(&vmx->vcpu);
+ }
+ 
+ /*
+@@ -3829,6 +3796,8 @@ static void free_loaded_vmcs(struct load
+       loaded_vmcs_clear(loaded_vmcs);
+       free_vmcs(loaded_vmcs->vmcs);
+       loaded_vmcs->vmcs = NULL;
++      if (loaded_vmcs->msr_bitmap)
++              free_page((unsigned long)loaded_vmcs->msr_bitmap);
+       WARN_ON(loaded_vmcs->shadow_vmcs != NULL);
+ }
+ 
+@@ -3845,7 +3814,18 @@ static int alloc_loaded_vmcs(struct load
+ 
+       loaded_vmcs->shadow_vmcs = NULL;
+       loaded_vmcs_init(loaded_vmcs);
++
++      if (cpu_has_vmx_msr_bitmap()) {
++              loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL);
++              if (!loaded_vmcs->msr_bitmap)
++                      goto out_vmcs;
++              memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE);
++      }
+       return 0;
++
++out_vmcs:
++      free_loaded_vmcs(loaded_vmcs);
++      return -ENOMEM;
+ }
+ 
+ static void free_kvm_area(void)
+@@ -4920,10 +4900,8 @@ static void free_vpid(int vpid)
+       spin_unlock(&vmx_vpid_lock);
+ }
+ 
+-#define MSR_TYPE_R    1
+-#define MSR_TYPE_W    2
+-static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+-                                              u32 msr, int type)
++static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
++                                                        u32 msr, int type)
+ {
+       int f = sizeof(unsigned long);
+ 
+@@ -4957,6 +4935,50 @@ static void __vmx_disable_intercept_for_
+       }
+ }
+ 
++static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
++                                                       u32 msr, int type)
++{
++      int f = sizeof(unsigned long);
++
++      if (!cpu_has_vmx_msr_bitmap())
++              return;
++
++      /*
++       * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
++       * have the write-low and read-high bitmap offsets the wrong way round.
++       * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
++       */
++      if (msr <= 0x1fff) {
++              if (type & MSR_TYPE_R)
++                      /* read-low */
++                      __set_bit(msr, msr_bitmap + 0x000 / f);
++
++              if (type & MSR_TYPE_W)
++                      /* write-low */
++                      __set_bit(msr, msr_bitmap + 0x800 / f);
++
++      } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
++              msr &= 0x1fff;
++              if (type & MSR_TYPE_R)
++                      /* read-high */
++                      __set_bit(msr, msr_bitmap + 0x400 / f);
++
++              if (type & MSR_TYPE_W)
++                      /* write-high */
++                      __set_bit(msr, msr_bitmap + 0xc00 / f);
++
++      }
++}
++
++static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
++                                                    u32 msr, int type, bool value)
++{
++      if (value)
++              vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
++      else
++              vmx_disable_intercept_for_msr(msr_bitmap, msr, type);
++}
++
+ /*
+  * If a msr is allowed by L0, we should check whether it is allowed by L1.
+  * The corresponding bit will be cleared unless both of L0 and L1 allow it.
+@@ -5003,28 +5025,68 @@ static void nested_vmx_disable_intercept
+       }
+ }
+ 
+-static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
++static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
+ {
+-      if (!longmode_only)
+-              __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy,
+-                                              msr, MSR_TYPE_R | MSR_TYPE_W);
+-      __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode,
+-                                              msr, MSR_TYPE_R | MSR_TYPE_W);
+-}
+-
+-static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active)
+-{
+-      if (apicv_active) {
+-              __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv,
+-                              msr, type);
+-              __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv,
+-                              msr, type);
+-      } else {
+-              __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic,
+-                              msr, type);
+-              __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic,
+-                              msr, type);
++      u8 mode = 0;
++
++      if (cpu_has_secondary_exec_ctrls() &&
++          (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) &
++           SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) {
++              mode |= MSR_BITMAP_MODE_X2APIC;
++              if (enable_apicv && kvm_vcpu_apicv_active(vcpu))
++                      mode |= MSR_BITMAP_MODE_X2APIC_APICV;
++      }
++
++      if (is_long_mode(vcpu))
++              mode |= MSR_BITMAP_MODE_LM;
++
++      return mode;
++}
++
++#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4))
++
++static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
++                                       u8 mode)
++{
++      int msr;
++
++      for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
++              unsigned word = msr / BITS_PER_LONG;
++              msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
++              msr_bitmap[word + (0x800 / sizeof(long))] = ~0;
+       }
++
++      if (mode & MSR_BITMAP_MODE_X2APIC) {
++              /*
++               * TPR reads and writes can be virtualized even if virtual interrupt
++               * delivery is not in use.
++               */
++              vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
++              if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
++                      vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
++                      vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
++                      vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
++              }
++      }
++}
++
++static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
++{
++      struct vcpu_vmx *vmx = to_vmx(vcpu);
++      unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
++      u8 mode = vmx_msr_bitmap_mode(vcpu);
++      u8 changed = mode ^ vmx->msr_bitmap_mode;
++
++      if (!changed)
++              return;
++
++      vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW,
++                                !(mode & MSR_BITMAP_MODE_LM));
++
++      if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV))
++              vmx_update_msr_bitmap_x2apic(msr_bitmap, mode);
++
++      vmx->msr_bitmap_mode = mode;
+ }
+ 
+ static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
+@@ -5272,7 +5334,7 @@ static void vmx_refresh_apicv_exec_ctrl(
+       }
+ 
+       if (cpu_has_vmx_msr_bitmap())
+-              vmx_set_msr_bitmap(vcpu);
++              vmx_update_msr_bitmap(vcpu);
+ }
+ 
+ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
+@@ -5459,7 +5521,7 @@ static int vmx_vcpu_setup(struct vcpu_vm
+               vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap));
+       }
+       if (cpu_has_vmx_msr_bitmap())
+-              vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy));
++              vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap));
+ 
+       vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
+ 
+@@ -6742,7 +6804,7 @@ void vmx_enable_tdp(void)
+ 
+ static __init int hardware_setup(void)
+ {
+-      int r = -ENOMEM, i, msr;
++      int r = -ENOMEM, i;
+ 
+       rdmsrl_safe(MSR_EFER, &host_efer);
+ 
+@@ -6763,9 +6825,6 @@ static __init int hardware_setup(void)
+ 
+       memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE);
+ 
+-      memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
+-      memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);
+-
+       if (setup_vmcs_config(&vmcs_config) < 0) {
+               r = -EIO;
+               goto out;
+@@ -6828,42 +6887,8 @@ static __init int hardware_setup(void)
+               kvm_tsc_scaling_ratio_frac_bits = 48;
+       }
+ 
+-      vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
+-      vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
+-      vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
+-      vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
+-      vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
+-      vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
+-
+-      memcpy(vmx_msr_bitmap_legacy_x2apic_apicv,
+-                      vmx_msr_bitmap_legacy, PAGE_SIZE);
+-      memcpy(vmx_msr_bitmap_longmode_x2apic_apicv,
+-                      vmx_msr_bitmap_longmode, PAGE_SIZE);
+-      memcpy(vmx_msr_bitmap_legacy_x2apic,
+-                      vmx_msr_bitmap_legacy, PAGE_SIZE);
+-      memcpy(vmx_msr_bitmap_longmode_x2apic,
+-                      vmx_msr_bitmap_longmode, PAGE_SIZE);
+-
+       set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+ 
+-      for (msr = 0x800; msr <= 0x8ff; msr++) {
+-              if (msr == 0x839 /* TMCCT */)
+-                      continue;
+-              vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true);
+-      }
+-
+-      /*
+-       * TPR reads and writes can be virtualized even if virtual interrupt
+-       * delivery is not in use.
+-       */
+-      vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true);
+-      vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false);
+-
+-      /* EOI */
+-      vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true);
+-      /* SELF-IPI */
+-      vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true);
+-
+       if (enable_ept)
+               vmx_enable_tdp();
+       else
+@@ -7152,13 +7177,6 @@ static int enter_vmx_operation(struct kv
+       if (r < 0)
+               goto out_vmcs02;
+ 
+-      if (cpu_has_vmx_msr_bitmap()) {
+-              vmx->nested.msr_bitmap =
+-                              (unsigned long *)__get_free_page(GFP_KERNEL);
+-              if (!vmx->nested.msr_bitmap)
+-                      goto out_msr_bitmap;
+-      }
+-
+       vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
+       if (!vmx->nested.cached_vmcs12)
+               goto out_cached_vmcs12;
+@@ -7185,9 +7203,6 @@ out_shadow_vmcs:
+       kfree(vmx->nested.cached_vmcs12);
+ 
+ out_cached_vmcs12:
+-      free_page((unsigned long)vmx->nested.msr_bitmap);
+-
+-out_msr_bitmap:
+       free_loaded_vmcs(&vmx->nested.vmcs02);
+ 
+ out_vmcs02:
+@@ -7332,10 +7347,6 @@ static void free_nested(struct vcpu_vmx
+       free_vpid(vmx->nested.vpid02);
+       vmx->nested.posted_intr_nv = -1;
+       vmx->nested.current_vmptr = -1ull;
+-      if (vmx->nested.msr_bitmap) {
+-              free_page((unsigned long)vmx->nested.msr_bitmap);
+-              vmx->nested.msr_bitmap = NULL;
+-      }
+       if (enable_shadow_vmcs) {
+               vmx_disable_shadow_vmcs(vmx);
+               vmcs_clear(vmx->vmcs01.shadow_vmcs);
+@@ -8851,7 +8862,7 @@ static void vmx_set_virtual_x2apic_mode(
+       }
+       vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control);
+ 
+-      vmx_set_msr_bitmap(vcpu);
++      vmx_update_msr_bitmap(vcpu);
+ }
+ 
+ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa)
+@@ -9513,6 +9524,7 @@ static struct kvm_vcpu *vmx_create_vcpu(
+ {
+       int err;
+       struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
++      unsigned long *msr_bitmap;
+       int cpu;
+ 
+       if (!vmx)
+@@ -9549,6 +9561,15 @@ static struct kvm_vcpu *vmx_create_vcpu(
+       if (err < 0)
+               goto free_msrs;
+ 
++      msr_bitmap = vmx->vmcs01.msr_bitmap;
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
++      vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
++      vmx->msr_bitmap_mode = 0;
++
+       vmx->loaded_vmcs = &vmx->vmcs01;
+       cpu = get_cpu();
+       vmx_vcpu_load(&vmx->vcpu, cpu);
+@@ -10018,7 +10039,7 @@ static inline bool nested_vmx_merge_msr_
+       int msr;
+       struct page *page;
+       unsigned long *msr_bitmap_l1;
+-      unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap;
++      unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
+ 
+       /* This shortcut is ok because we support only x2APIC MSRs so far. */
+       if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
+@@ -10595,6 +10616,9 @@ static int prepare_vmcs02(struct kvm_vcp
+       if (kvm_has_tsc_control)
+               decache_tsc_multiplier(vmx);
+ 
++      if (cpu_has_vmx_msr_bitmap())
++              vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
++
+       if (enable_vpid) {
+               /*
+                * There is no direct mapping between vpid02 and vpid12, the
+@@ -11388,7 +11412,7 @@ static void load_vmcs12_host_state(struc
+       vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+ 
+       if (cpu_has_vmx_msr_bitmap())
+-              vmx_set_msr_bitmap(vcpu);
++              vmx_update_msr_bitmap(vcpu);
+ 
+       if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
+                               vmcs12->vm_exit_msr_load_count))
 
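Note on the MSR bitmap layout used by the per-vCPU bitmap patch above: the
helpers vmx_disable_intercept_for_msr() and vmx_enable_intercept_for_msr()
treat the 4 KiB page as four regions (read-low at 0x000, read-high at 0x400,
write-low at 0x800, write-high at 0xc00), where a set bit means "intercept".
The following is a minimal user-space sketch of that indexing, not the kernel
code itself. The names msr_bitmap_init(), set_msr_intercept() and
msr_intercepted() are hypothetical, and the sketch addresses the page by bytes
rather than by unsigned long as the kernel's __set_bit()/__clear_bit() do,
which is assumed to be equivalent on little-endian x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSR_TYPE_R  1
#define MSR_TYPE_W  2
#define MSR_TYPE_RW 3

#define MSR_BITMAP_BYTES 4096	/* one page, as allocated in alloc_loaded_vmcs() */

/* Start with every MSR intercepted, mirroring memset(..., 0xff, PAGE_SIZE). */
static void msr_bitmap_init(uint8_t *bitmap)
{
	memset(bitmap, 0xff, MSR_BITMAP_BYTES);
}

/*
 * Hypothetical helper: pick the read/write region offsets for an MSR and
 * reduce the MSR number to a bit index inside those regions.
 * Returns 0 on success, -1 if the bitmap cannot control this MSR.
 */
static int msr_regions(uint32_t *msr, size_t *rd, size_t *wr)
{
	if (*msr <= 0x1fff) {
		*rd = 0x000;	/* read-low */
		*wr = 0x800;	/* write-low */
	} else if (*msr >= 0xc0000000u && *msr <= 0xc0001fffu) {
		*rd = 0x400;	/* read-high */
		*wr = 0xc00;	/* write-high */
		*msr &= 0x1fff;
	} else {
		return -1;
	}
	return 0;
}

/* Set or clear a single bit at bit index 'bit' of the region at 'base'. */
static void put_bit(uint8_t *bitmap, size_t base, uint32_t bit, int intercept)
{
	uint8_t mask = (uint8_t)(1u << (bit % 8));

	assert(base + bit / 8 < MSR_BITMAP_BYTES);
	if (intercept)
		bitmap[base + bit / 8] |= mask;
	else
		bitmap[base + bit / 8] &= (uint8_t)~mask;
}

/* Enable (intercept != 0) or disable interception for the given access type. */
static int set_msr_intercept(uint8_t *bitmap, uint32_t msr, int type, int intercept)
{
	size_t rd, wr;

	if (msr_regions(&msr, &rd, &wr) < 0)
		return -1;
	if (type & MSR_TYPE_R)
		put_bit(bitmap, rd, msr, intercept);
	if (type & MSR_TYPE_W)
		put_bit(bitmap, wr, msr, intercept);
	return 0;
}

/*
 * Returns 1 if any access selected by 'type' is intercepted. MSRs outside
 * the two controllable ranges are reported as always intercepted.
 */
static int msr_intercepted(const uint8_t *bitmap, uint32_t msr, int type)
{
	size_t rd, wr;
	int hit = 0;

	if (msr_regions(&msr, &rd, &wr) < 0)
		return 1;
	if (type & MSR_TYPE_R)
		hit |= (bitmap[rd + msr / 8] >> (msr % 8)) & 1;
	if (type & MSR_TYPE_W)
		hit |= (bitmap[wr + msr / 8] >> (msr % 8)) & 1;
	return hit;
}
```

As a usage example under the same assumptions: after msr_bitmap_init(bm),
calling set_msr_intercept(bm, 0xc0000102, MSR_TYPE_RW, 0) clears one bit in
the read-high region and one in the write-high region for MSR_KERNEL_GS_BASE,
which corresponds to what vmx_update_msr_bitmap() does for a guest in long
mode.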
--- /dev/null
+Subject: KVM: nVMX: Eliminate vmcs02 pool
+From: Jim Mattson jmattson@google.com
+Date: Mon Nov 27 17:22:25 2017 -0600
+
+From: Jim Mattson jmattson@google.com
+
+commit de3a0021a60635de96aa92713c1a31a96747d72c
+
+The potential performance advantages of a vmcs02 pool have never been
+realized. To simplify the code, eliminate the pool. Instead, a single
+vmcs02 is allocated per VCPU when the VCPU enters VMX operation.
+
+Cc: stable@vger.kernel.org       # prereq for Spectre mitigation
+Signed-off-by: Jim Mattson <jmattson@google.com>
+Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
+Reviewed-by: Ameya More <ameya.more@oracle.com>
+Reviewed-by: David Hildenbrand <david@redhat.com>
+Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/x86/kvm/vmx.c |  146 ++++++++---------------------------------------------
+ 1 file changed, 23 insertions(+), 123 deletions(-)
+
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -182,7 +182,6 @@ module_param(ple_window_max, int, S_IRUG
+ extern const ulong vmx_return;
+ 
+ #define NR_AUTOLOAD_MSRS 8
+-#define VMCS02_POOL_SIZE 1
+ 
+ struct vmcs {
+       u32 revision_id;
+@@ -223,7 +222,7 @@ struct shared_msr_entry {
+  * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
+  * which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
+  * More than one of these structures may exist, if L1 runs multiple L2 guests.
+- * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
++ * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the
+  * underlying hardware which will be used to run L2.
+  * This structure is packed to ensure that its layout is identical across
+  * machines (necessary for live migration).
+@@ -406,13 +405,6 @@ struct __packed vmcs12 {
+  */
+ #define VMCS12_SIZE 0x1000
+ 
+-/* Used to remember the last vmcs02 used for some recently used vmcs12s */
+-struct vmcs02_list {
+-      struct list_head list;
+-      gpa_t vmptr;
+-      struct loaded_vmcs vmcs02;
+-};
+-
+ /*
+  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
+  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
+@@ -437,15 +429,15 @@ struct nested_vmx {
+        */
+       bool sync_shadow_vmcs;
+ 
+-      /* vmcs02_list cache of VMCSs recently used to run L2 guests */
+-      struct list_head vmcs02_pool;
+-      int vmcs02_num;
+       bool change_vmcs01_virtual_x2apic_mode;
+       /* L2 must run next, and mustn't decide to exit to L1. */
+       bool nested_run_pending;
++
++      struct loaded_vmcs vmcs02;
++
+       /*
+-       * Guest pages referred to in vmcs02 with host-physical pointers, so
+-       * we must keep them pinned while L2 runs.
++       * Guest pages referred to in the vmcs02 with host-physical
++       * pointers, so we must keep them pinned while L2 runs.
+        */
+       struct page *apic_access_page;
+       struct page *virtual_apic_page;
+@@ -6964,94 +6956,6 @@ static int handle_monitor(struct kvm_vcp
+ }
+ 
+ /*
+- * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
+- * We could reuse a single VMCS for all the L2 guests, but we also want the
+- * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
+- * allows keeping them loaded on the processor, and in the future will allow
+- * optimizations where prepare_vmcs02 doesn't need to set all the fields on
+- * every entry if they never change.
+- * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
+- * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
+- *
+- * The following functions allocate and free a vmcs02 in this pool.
+- */
+-
+-/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */
+-static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
+-{
+-      struct vmcs02_list *item;
+-      list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+-              if (item->vmptr == vmx->nested.current_vmptr) {
+-                      list_move(&item->list, &vmx->nested.vmcs02_pool);
+-                      return &item->vmcs02;
+-              }
+-
+-      if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
+-              /* Recycle the least recently used VMCS. */
+-              item = list_last_entry(&vmx->nested.vmcs02_pool,
+-                                     struct vmcs02_list, list);
+-              item->vmptr = vmx->nested.current_vmptr;
+-              list_move(&item->list, &vmx->nested.vmcs02_pool);
+-              return &item->vmcs02;
+-      }
+-
+-      /* Create a new VMCS */
+-      item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
+-      if (!item)
+-              return NULL;
+-      item->vmcs02.vmcs = alloc_vmcs();
+-      item->vmcs02.shadow_vmcs = NULL;
+-      if (!item->vmcs02.vmcs) {
+-              kfree(item);
+-              return NULL;
+-      }
+-      loaded_vmcs_init(&item->vmcs02);
+-      item->vmptr = vmx->nested.current_vmptr;
+-      list_add(&(item->list), &(vmx->nested.vmcs02_pool));
+-      vmx->nested.vmcs02_num++;
+-      return &item->vmcs02;
+-}
+-
+-/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
+-static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
+-{
+-      struct vmcs02_list *item;
+-      list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+-              if (item->vmptr == vmptr) {
+-                      free_loaded_vmcs(&item->vmcs02);
+-                      list_del(&item->list);
+-                      kfree(item);
+-                      vmx->nested.vmcs02_num--;
+-                      return;
+-              }
+-}
+-
+-/*
+- * Free all VMCSs saved for this vcpu, except the one pointed by
+- * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
+- * must be &vmx->vmcs01.
+- */
+-static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
+-{
+-      struct vmcs02_list *item, *n;
+-
+-      WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
+-      list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
+-              /*
+-               * Something will leak if the above WARN triggers.  Better than
+-               * a use-after-free.
+-               */
+-              if (vmx->loaded_vmcs == &item->vmcs02)
+-                      continue;
+-
+-              free_loaded_vmcs(&item->vmcs02);
+-              list_del(&item->list);
+-              kfree(item);
+-              vmx->nested.vmcs02_num--;
+-      }
+-}
+-
+-/*
+  * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(),
+  * set the success or error code of an emulated VMX instruction, as specified
+  * by Vol 2B, VMX Instruction Reference, "Conventions".
+@@ -7232,6 +7136,12 @@ static int enter_vmx_operation(struct kv
+       struct vcpu_vmx *vmx = to_vmx(vcpu);
+       struct vmcs *shadow_vmcs;
+ 
++      vmx->nested.vmcs02.vmcs = alloc_vmcs();
++      vmx->nested.vmcs02.shadow_vmcs = NULL;
++      if (!vmx->nested.vmcs02.vmcs)
++              goto out_vmcs02;
++      loaded_vmcs_init(&vmx->nested.vmcs02);
++
+       if (cpu_has_vmx_msr_bitmap()) {
+               vmx->nested.msr_bitmap =
+                               (unsigned long *)__get_free_page(GFP_KERNEL);
+@@ -7254,9 +7164,6 @@ static int enter_vmx_operation(struct kv
+               vmx->vmcs01.shadow_vmcs = shadow_vmcs;
+       }
+ 
+-      INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
+-      vmx->nested.vmcs02_num = 0;
+-
+       hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC,
+                    HRTIMER_MODE_REL_PINNED);
+       vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
+@@ -7271,6 +7178,9 @@ out_cached_vmcs12:
+       free_page((unsigned long)vmx->nested.msr_bitmap);
+ 
+ out_msr_bitmap:
++      free_loaded_vmcs(&vmx->nested.vmcs02);
++
++out_vmcs02:
+       return -ENOMEM;
+ }
+ 
+@@ -7423,7 +7333,7 @@ static void free_nested(struct vcpu_vmx
+               vmx->vmcs01.shadow_vmcs = NULL;
+       }
+       kfree(vmx->nested.cached_vmcs12);
+-      /* Unpin physical memory we referred to in current vmcs02 */
++      /* Unpin physical memory we referred to in the vmcs02 */
+       if (vmx->nested.apic_access_page) {
+               kvm_release_page_dirty(vmx->nested.apic_access_page);
+               vmx->nested.apic_access_page = NULL;
+@@ -7439,7 +7349,7 @@ static void free_nested(struct vcpu_vmx
+               vmx->nested.pi_desc = NULL;
+       }
+ 
+-      nested_free_all_saved_vmcss(vmx);
++      free_loaded_vmcs(&vmx->nested.vmcs02);
+ }
+ 
+ /* Emulate the VMXOFF instruction */
+@@ -7482,8 +7392,6 @@ static int handle_vmclear(struct kvm_vcp
+                       vmptr + offsetof(struct vmcs12, launch_state),
+                       &zero, sizeof(zero));
+ 
+-      nested_free_vmcs02(vmx, vmptr);
+-
+       nested_vmx_succeed(vcpu);
+       return kvm_skip_emulated_instruction(vcpu);
+ }
+@@ -8395,10 +8303,11 @@ static bool nested_vmx_exit_reflected(st
+ 
+       /*
+        * The host physical addresses of some pages of guest memory
+-       * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU
+-       * may write to these pages via their host physical address while
+-       * L2 is running, bypassing any address-translation-based dirty
+-       * tracking (e.g. EPT write protection).
++       * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
++       * Page). The CPU may write to these pages via their host
++       * physical address while L2 is running, bypassing any
++       * address-translation-based dirty tracking (e.g. EPT write
++       * protection).
+        *
+        * Mark them dirty on every exit from L2 to prevent them from
+        * getting out of sync with dirty tracking.
+@@ -10894,20 +10803,15 @@ static int enter_vmx_non_root_mode(struc
+ {
+       struct vcpu_vmx *vmx = to_vmx(vcpu);
+       struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+-      struct loaded_vmcs *vmcs02;
+       u32 msr_entry_idx;
+       u32 exit_qual;
+ 
+-      vmcs02 = nested_get_current_vmcs02(vmx);
+-      if (!vmcs02)
+-              return -ENOMEM;
+-
+       enter_guest_mode(vcpu);
+ 
+       if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
+               vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ 
+-      vmx_switch_vmcs(vcpu, vmcs02);
++      vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02);
+       vmx_segment_cache_clear(vmx);
+ 
+       if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) {
+@@ -11522,10 +11426,6 @@ static void nested_vmx_vmexit(struct kvm
+       vm_exit_controls_reset_shadow(vmx);
+       vmx_segment_cache_clear(vmx);
+ 
+-      /* if no vmcs02 cache requested, remove the one we used */
+-      if (VMCS02_POOL_SIZE == 0)
+-              nested_free_vmcs02(vmx, vmx->nested.current_vmptr);
+-
+       /* Update any VMCS fields that might have changed while L2 ran */
+       vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
+       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr);
 
--- /dev/null
+Subject: KVM/x86: Add IBPB support
+From: Ashok Raj ashok.raj@intel.com
+Date: Thu Feb  1 22:59:43 2018 +0100
+
+From: Ashok Raj ashok.raj@intel.com
+
+commit 15d45071523d89b3fb7372e2135fbd72f6af9506
+
+The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
+control mechanism. It keeps earlier branches from influencing
+later ones.
+
+Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
+It's a command that ensures predicted branch targets aren't used after
+the barrier. Although IBRS and IBPB are enumerated by the same CPUID
+enumeration, IBPB is very different.
+
+IBPB helps mitigate against three potential attacks:
+
+* Protect guests from being attacked by other guests.
+  - This is addressed by issuing IBPB when we do a guest switch.
+
+* Mitigate attacks from guest/ring3->host/ring3.
+  These would require an IBPB during context switch in host, or after
+  VMEXIT. The host process has two ways to mitigate:
+  - Either it can be compiled with retpoline
+  - If it's going through a context switch and has set !dumpable, then
+    there is an IBPB in that path.
+    (Tim's patch: https://patchwork.kernel.org/patch/10192871)
+  - The case where after a VMEXIT you return back to Qemu might make
+    Qemu attackable from guest when Qemu isn't compiled with retpoline.
+  There are issues reported when doing IBPB on every VMEXIT that resulted
+  in some tsc calibration woes in guest.
+
+* Mitigate guest/ring0->host/ring0 attacks.
+  When host kernel is using retpoline it is safe against these attacks.
+  If host kernel isn't using retpoline we might need to do an IBPB flush on
+  every VMEXIT.
+
+Even when using retpoline for indirect calls, in certain conditions 'ret'
+can use the BTB on Skylake-era CPUs. There are other mitigations
+available like RSB stuffing/clearing.
+
+* IBPB is issued only for SVM during svm_free_vcpu().
+  VMX has a vmclear and SVM doesn't.  Follow discussion here:
+  https://lkml.org/lkml/2018/1/15/146
+
+For more details on the enumeration and control, and for documentation
+about the mitigations, refer to:
+
+https://software.intel.com/en-us/side-channel-security-support
+
+[peterz: rebase and changelog rewrite]
+[karahmed: - rebase
+           - vmx: expose PRED_CMD if guest has it in CPUID
+           - svm: only pass through IBPB if guest has it in CPUID
+           - vmx: support !cpu_has_vmx_msr_bitmap()
+           - vmx: support nested]
+[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
+        PRED_CMD is a write-only MSR]
+
+Signed-off-by: Ashok Raj <ashok.raj@intel.com>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: kvm@vger.kernel.org
+Cc: Asit Mallick <asit.k.mallick@intel.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Dave Hansen <dave.hansen@intel.com>
+Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
+Cc: Greg KH <gregkh@linuxfoundation.org>
+Cc: Jun Nakajima <jun.nakajima@intel.com>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Dan Williams <dan.j.williams@intel.com>
+Cc: Tim Chen <tim.c.chen@linux.intel.com>
+Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
+Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/cpuid.c |   11 ++++++-
+ arch/x86/kvm/svm.c   |   28 +++++++++++++++++
+ arch/x86/kvm/vmx.c   |   80 +++++++++++++++++++++++++++++++++++++++++++++++++--
+ 3 files changed, 116 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/kvm/cpuid.c
++++ b/arch/x86/kvm/cpuid.c
+@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct
+               F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
+               0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
+ 
++      /* cpuid 0x80000008.ebx */
++      const u32 kvm_cpuid_8000_0008_ebx_x86_features =
++              F(IBPB);
++
+       /* cpuid 0xC0000001.edx */
+       const u32 kvm_cpuid_C000_0001_edx_x86_features =
+               F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
+@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct
+               if (!g_phys_as)
+                       g_phys_as = phys_as;
+               entry->eax = g_phys_as | (virt_as << 8);
+-              entry->ebx = entry->edx = 0;
++              entry->edx = 0;
++              /* IBPB isn't necessarily present in hardware cpuid */
++              if (boot_cpu_has(X86_FEATURE_IBPB))
++                      entry->ebx |= F(IBPB);
++              entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
++              cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
+               break;
+       }
+       case 0x80000019:
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -249,6 +249,7 @@ static const struct svm_direct_access_ms
+       { .index = MSR_CSTAR,                           .always = true  },
+       { .index = MSR_SYSCALL_MASK,                    .always = true  },
+ #endif
++      { .index = MSR_IA32_PRED_CMD,                   .always = false },
+       { .index = MSR_IA32_LASTBRANCHFROMIP,           .always = false },
+       { .index = MSR_IA32_LASTBRANCHTOIP,             .always = false },
+       { .index = MSR_IA32_LASTINTFROMIP,              .always = false },
+@@ -529,6 +530,7 @@ struct svm_cpu_data {
+       struct kvm_ldttss_desc *tss_desc;
+ 
+       struct page *save_area;
++      struct vmcb *current_vmcb;
+ };
+ 
+ static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
+@@ -1706,11 +1708,17 @@ static void svm_free_vcpu(struct kvm_vcp
+       __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
+       kvm_vcpu_uninit(vcpu);
+       kmem_cache_free(kvm_vcpu_cache, svm);
++      /*
++       * The vmcb page can be recycled, causing a false negative in
++       * svm_vcpu_load(). So do a full IBPB now.
++       */
++      indirect_branch_prediction_barrier();
+ }
+ 
+ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+ {
+       struct vcpu_svm *svm = to_svm(vcpu);
++      struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
+       int i;
+ 
+       if (unlikely(cpu != vcpu->cpu)) {
+@@ -1739,6 +1747,10 @@ static void svm_vcpu_load(struct kvm_vcp
+       if (static_cpu_has(X86_FEATURE_RDTSCP))
+               wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+ 
++      if (sd->current_vmcb != svm->vmcb) {
++              sd->current_vmcb = svm->vmcb;
++              indirect_branch_prediction_barrier();
++      }
+       avic_vcpu_load(vcpu, cpu);
+ }
+ 
+@@ -3670,6 +3682,22 @@ static int svm_set_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               kvm_write_tsc(vcpu, msr);
+               break;
++      case MSR_IA32_PRED_CMD:
++              if (!msr->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
++                      return 1;
++
++              if (data & ~PRED_CMD_IBPB)
++                      return 1;
++
++              if (!data)
++                      break;
++
++              wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
++              if (is_guest_mode(vcpu))
++                      break;
++              set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
++              break;
+       case MSR_STAR:
+               svm->vmcb->save.star = data;
+               break;
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -582,6 +582,7 @@ struct vcpu_vmx {
+       u64                   msr_host_kernel_gs_base;
+       u64                   msr_guest_kernel_gs_base;
+ #endif
++
+       u32 vm_entry_controls_shadow;
+       u32 vm_exit_controls_shadow;
+       u32 secondary_exec_control;
+@@ -926,6 +927,8 @@ static void vmx_set_nmi_mask(struct kvm_
+ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
+                                           u16 error_code);
+ static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
++static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
++                                                        u32 msr, int type);
+ 
+ static DEFINE_PER_CPU(struct vmcs *, vmxarea);
+ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+@@ -1900,6 +1903,29 @@ static void update_exception_bitmap(stru
+       vmcs_write32(EXCEPTION_BITMAP, eb);
+ }
+ 
++/*
++ * Check if MSR is intercepted for L01 MSR bitmap.
++ */
++static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
++{
++      unsigned long *msr_bitmap;
++      int f = sizeof(unsigned long);
++
++      if (!cpu_has_vmx_msr_bitmap())
++              return true;
++
++      msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
++
++      if (msr <= 0x1fff) {
++              return !!test_bit(msr, msr_bitmap + 0x800 / f);
++      } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
++              msr &= 0x1fff;
++              return !!test_bit(msr, msr_bitmap + 0xc00 / f);
++      }
++
++      return true;
++}
++
+ static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
+               unsigned long entry, unsigned long exit)
+ {
+@@ -2278,6 +2304,7 @@ static void vmx_vcpu_load(struct kvm_vcp
+       if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
+               per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
+               vmcs_load(vmx->loaded_vmcs->vmcs);
++              indirect_branch_prediction_barrier();
+       }
+ 
+       if (!already_loaded) {
+@@ -3337,6 +3364,34 @@ static int vmx_set_msr(struct kvm_vcpu *
+       case MSR_IA32_TSC:
+               kvm_write_tsc(vcpu, msr_info);
+               break;
++      case MSR_IA32_PRED_CMD:
++              if (!msr_info->host_initiated &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) &&
++                  !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
++                      return 1;
++
++              if (data & ~PRED_CMD_IBPB)
++                      return 1;
++
++              if (!data)
++                      break;
++
++              wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
++
++              /*
++               * For non-nested:
++               * When it's written (to non-zero) for the first time, pass
++               * it through.
++               *
++               * For nested:
++               * The handling of the MSR bitmap for L2 guests is done in
++               * nested_vmx_merge_msr_bitmap. We should not touch the
++               * vmcs02.msr_bitmap here since it gets completely overwritten
++               * in the merging.
++               */
++              vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
++                                            MSR_TYPE_W);
++              break;
+       case MSR_IA32_CR_PAT:
+               if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+                       if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+@@ -10038,9 +10093,23 @@ static inline bool nested_vmx_merge_msr_
+       struct page *page;
+       unsigned long *msr_bitmap_l1;
+       unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
++      /*
++       * pred_cmd is trying to verify two things:
++       *
++       * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
++       *    ensures that we do not accidentally generate an L02 MSR bitmap
++       *    from the L12 MSR bitmap that is too permissive.
++       * 2. That L1 or L2s have actually used the MSR. This avoids
++       *    unnecessarily merging of the bitmap if the MSR is unused. This
++       *    works properly because we only update the L01 MSR bitmap lazily.
++       *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
++       *    updated to reflect this when L1 (or its L2s) actually write to
++       *    the MSR.
++       */
++      bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+ 
+-      /* This shortcut is ok because we support only x2APIC MSRs so far. */
+-      if (!nested_cpu_has_virt_x2apic_mode(vmcs12))
++      if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
++          !pred_cmd)
+               return false;
+ 
+       page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
+@@ -10073,6 +10142,13 @@ static inline bool nested_vmx_merge_msr_
+                               MSR_TYPE_W);
+               }
+       }
++
++      if (pred_cmd)
++              nested_vmx_disable_intercept_for_msr(
++                                      msr_bitmap_l1, msr_bitmap_l0,
++                                      MSR_IA32_PRED_CMD,
++                                      MSR_TYPE_W);
++
+       kunmap(page);
+       kvm_release_page_clean(page);
+ 
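The write-intercept lookup that msr_write_intercepted_l01() performs can be sketched in userspace C. This is only an illustration under stated assumptions: the bitmap is modelled as a plain 4096-byte array, a set bit means "writes to this MSR are intercepted", and the helper names (bitmap_test_sketch, msr_write_intercepted_sketch, msr_allow_write_sketch) are hypothetical, not kernel APIs. The offsets 0x800 and 0xc00 and the two MSR ranges are taken from the hunk above; 0x49 is the MSR_IA32_PRED_CMD index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_BYTES   4096   /* one page, as in the patch */
#define WRITE_LOW_OFFSET   0x800  /* write bits for MSRs 0x0..0x1fff */
#define WRITE_HIGH_OFFSET  0xc00  /* write bits for MSRs 0xc0000000..0xc0001fff */
#define PRED_CMD_MSR_INDEX 0x49   /* MSR_IA32_PRED_CMD */

/* Byte-wise bit test; matches test_bit() on a little-endian machine. */
static bool bitmap_test_sketch(const uint8_t *base, uint32_t bit)
{
	return (base[bit / 8] >> (bit % 8)) & 1;
}

/* Hypothetical stand-in for msr_write_intercepted_l01(). */
static bool msr_write_intercepted_sketch(const uint8_t *bitmap, uint32_t msr)
{
	if (msr <= 0x1fff)
		return bitmap_test_sketch(bitmap + WRITE_LOW_OFFSET, msr);
	if (msr >= 0xc0000000 && msr <= 0xc0001fff)
		return bitmap_test_sketch(bitmap + WRITE_HIGH_OFFSET,
					  msr & 0x1fff);
	/* MSRs outside both ranges are always intercepted. */
	return true;
}

/* Hypothetical helper: clear the write-intercept bit (pass the MSR through). */
static void msr_allow_write_sketch(uint8_t *bitmap, uint32_t msr)
{
	uint8_t *base;

	if (msr <= 0x1fff) {
		base = bitmap + WRITE_LOW_OFFSET;
	} else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
		base = bitmap + WRITE_HIGH_OFFSET;
		msr &= 0x1fff;
	} else {
		return;
	}
	assert(msr / 8 < 0x400);
	base[msr / 8] &= (uint8_t)~(1u << (msr % 8));
}
```

Starting from an all-ones bitmap (everything intercepted), clearing the bit for 0x49 mirrors what the first non-zero guest write to PRED_CMD triggers in the non-nested case; every other MSR stays intercepted.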
 
--- /dev/null
+Subject: KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
+From: KarimAllah Ahmed karahmed@amazon.de
+Date: Thu Feb  1 22:59:42 2018 +0100
+
+From: KarimAllah Ahmed karahmed@amazon.de
+
+commit b7b27aa011a1df42728d1768fc181d9ce69e6911
+
+[dwmw2: Stop using KF() for bits in it, too]
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Reviewed-by: Jim Mattson <jmattson@google.com>
+Cc: kvm@vger.kernel.org
+Cc: Radim Krčmář <rkrcmar@redhat.com>
+Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/cpuid.c |    8 +++-----
+ arch/x86/kvm/cpuid.h |    1 +
+ 2 files changed, 4 insertions(+), 5 deletions(-)
+
+--- a/arch/x86/kvm/cpuid.c
++++ b/arch/x86/kvm/cpuid.c
+@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
+ 
+ #define F(x) bit(X86_FEATURE_##x)
+ 
+-/* These are scattered features in cpufeatures.h. */
+-#define KVM_CPUID_BIT_AVX512_4VNNIW     2
+-#define KVM_CPUID_BIT_AVX512_4FMAPS     3
++/* For scattered features from cpufeatures.h; we currently expose none */
+ #define KF(x) bit(KVM_CPUID_BIT_##x)
+ 
+ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
+@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct
+ 
+       /* cpuid 7.0.edx*/
+       const u32 kvm_cpuid_7_0_edx_x86_features =
+-              KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
++              F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+ 
+       /* all calls to cpuid_count() should be made on the same cpu */
+       get_cpu();
+@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct
+                       if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
+                               entry->ecx &= ~F(PKU);
+                       entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+-                      entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
++                      cpuid_mask(&entry->edx, CPUID_7_EDX);
+               } else {
+                       entry->ebx = 0;
+                       entry->ecx = 0;
+--- a/arch/x86/kvm/cpuid.h
++++ b/arch/x86/kvm/cpuid.h
+@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cp
+       [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
+       [CPUID_7_ECX]         = {         7, 0, CPUID_ECX},
+       [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
++      [CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
+ };
+ 
+ static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)
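The idea behind the reverse_cpuid table can be sketched as follows: a feature number encodes a word index (feature / 32) and a bit position (feature % 32), and the table maps the word index back to the CPUID leaf, subleaf, and register that hold it. This is a minimal userspace sketch; the enum numbering, the FEATURE_SKETCH macro, and the function names are hypothetical and do not match the kernel's actual word indices. The bit positions 2 and 3 for AVX512_4VNNIW and AVX512_4FMAPS come from the KVM_CPUID_BIT_* defines removed by the patch above.

```c
#include <stdint.h>

enum reg_sketch { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

struct cpuid_reg_sketch {
	uint32_t function;	/* CPUID leaf */
	uint32_t index;		/* CPUID subleaf */
	int reg;		/* register holding the feature word */
};

/* Hypothetical word numbering, for illustration only. */
enum { WORD_7_ECX, WORD_8000_0007_EBX, WORD_7_EDX, NR_WORDS };

static const struct cpuid_reg_sketch reverse_cpuid_sketch[NR_WORDS] = {
	[WORD_7_ECX]         = {         7, 0, REG_ECX },
	[WORD_8000_0007_EBX] = { 0x80000007, 0, REG_EBX },
	[WORD_7_EDX]         = {         7, 0, REG_EDX },
};

#define FEATURE_SKETCH(word, bit) ((word) * 32 + (bit))
#define FEAT_AVX512_4VNNIW FEATURE_SKETCH(WORD_7_EDX, 2)
#define FEAT_AVX512_4FMAPS FEATURE_SKETCH(WORD_7_EDX, 3)

/* Map a feature number back to the CPUID location of its word. */
static struct cpuid_reg_sketch feature_to_reg_sketch(unsigned int feature)
{
	return reverse_cpuid_sketch[feature / 32];
}

/* Mask of the feature's bit within its 32-bit word. */
static uint32_t feature_bit_sketch(unsigned int feature)
{
	return 1u << (feature & 31);
}
```

With an entry for the 7.0.EDX word in the table, a plain F()-style macro can compute the bit directly from the feature number, which is why the separate KF() bit definitions are no longer needed for these two features.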
 
--- /dev/null
+Subject: array_index_nospec: Sanitize speculative array de-references
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:22 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit f3804203306e098dae9ca51540fcd5eb700d7f40
+
+array_index_nospec() is proposed as a generic mechanism to mitigate
+against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary
+checks via speculative execution. The array_index_nospec()
+implementation is expected to be safe for current generation CPUs across
+multiple architectures (ARM, x86).
+
+Based on an original implementation by Linus Torvalds, tweaked to remove
+speculative flows by Alexei Starovoitov, and tweaked again by Linus to
+introduce an x86 assembly implementation for the mask generation.
+
+Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
+Co-developed-by: Alexei Starovoitov <ast@kernel.org>
+Suggested-by: Cyril Novikov <cnovikov@lynx.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Catalin Marinas <catalin.marinas@arm.com>
+Cc: Will Deacon <will.deacon@arm.com>
+Cc: Russell King <linux@armlinux.org.uk>
+Cc: gregkh@linuxfoundation.org
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ include/linux/nospec.h |   72 +++++++++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 72 insertions(+)
+
+--- /dev/null
++++ b/include/linux/nospec.h
+@@ -0,0 +1,72 @@
++// SPDX-License-Identifier: GPL-2.0
++// Copyright(c) 2018 Linus Torvalds. All rights reserved.
++// Copyright(c) 2018 Alexei Starovoitov. All rights reserved.
++// Copyright(c) 2018 Intel Corporation. All rights reserved.
++
++#ifndef _LINUX_NOSPEC_H
++#define _LINUX_NOSPEC_H
++
++/**
++ * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise
++ * @index: array element index
++ * @size: number of elements in array
++ *
++ * When @index is out of bounds (@index >= @size), the sign bit will be
++ * set.  Extend the sign bit to all bits and invert, giving a result of
++ * zero for an out of bounds index, or ~0 if within bounds [0, @size).
++ */
++#ifndef array_index_mask_nospec
++static inline unsigned long array_index_mask_nospec(unsigned long index,
++                                                  unsigned long size)
++{
++      /*
++       * Warn developers about inappropriate array_index_nospec() usage.
++       *
++       * Even if the CPU speculates past the WARN_ONCE branch, the
++       * sign bit of @index is taken into account when generating the
++       * mask.
++       *
++       * This warning is compiled out when the compiler can infer that
++       * @index and @size are less than LONG_MAX.
++       */
++      if (WARN_ONCE(index > LONG_MAX || size > LONG_MAX,
++                      "array_index_nospec() limited to range of [0, LONG_MAX]\n"))
++              return 0;
++
++      /*
++       * Always calculate and emit the mask even if the compiler
++       * thinks the mask is not needed. The compiler does not take
++       * into account the value of @index under speculation.
++       */
++      OPTIMIZER_HIDE_VAR(index);
++      return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
++}
++#endif
++
++/*
++ * array_index_nospec - sanitize an array index after a bounds check
++ *
++ * For a code sequence like:
++ *
++ *     if (index < size) {
++ *         index = array_index_nospec(index, size);
++ *         val = array[index];
++ *     }
++ *
++ * ...if the CPU speculates past the bounds check then
++ * array_index_nospec() will clamp the index within the range of [0,
++ * size).
++ */
++#define array_index_nospec(index, size)                                       \
++({                                                                    \
++      typeof(index) _i = (index);                                     \
++      typeof(size) _s = (size);                                       \
++      unsigned long _mask = array_index_mask_nospec(_i, _s);          \
++                                                                      \
++      BUILD_BUG_ON(sizeof(_i) > sizeof(long));                        \
++      BUILD_BUG_ON(sizeof(_s) > sizeof(long));                        \
++                                                                      \
++      _i &= _mask;                                                    \
++      _i;                                                             \
++})
++#endif /* _LINUX_NOSPEC_H */
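The mask arithmetic in the generic array_index_mask_nospec() can be tried out in userspace with a small sketch. The function names below are hypothetical stand-ins, not the kernel helpers. The sketch assumes that converting an out-of-range unsigned long to long wraps and that right-shifting a negative long is an arithmetic shift (true for GCC and Clang, implementation-defined in ISO C). It also omits OPTIMIZER_HIDE_VAR(), so a userspace compiler is free to optimize the mask away; the code shows the arithmetic only and is not a working Spectre mitigation.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH ((int)(sizeof(long) * CHAR_BIT))

/* ~0UL when index < size, 0 otherwise (both limited to [0, LONG_MAX]). */
static unsigned long index_mask_sketch(unsigned long index, unsigned long size)
{
	/* The sign bit of v is set exactly when index is out of bounds. */
	long v = (long)(index | (size - 1UL - index));

	/* Invert, then smear the sign bit across the whole word. */
	return (unsigned long)(~v >> (BITS_PER_LONG_SKETCH - 1));
}

/* Clamp: in-bounds indices are unchanged, out-of-bounds ones become 0. */
static unsigned long clamp_index_sketch(unsigned long index, unsigned long size)
{
	assert(size <= (unsigned long)LONG_MAX);
	return index & index_mask_sketch(index, size);
}

/* Bounds check followed by the clamp, as in the header's usage comment. */
static int table_lookup_sketch(const int *table, unsigned long size,
			       unsigned long index, int *out)
{
	if (index >= size)
		return -1;
	index = clamp_index_sketch(index, size);
	*out = table[index];
	return 0;
}
```

The clamp is branch-free by design: even if the CPU mispredicts the bounds check, the AND with the mask forces a speculatively used out-of-bounds index to 0 through a data dependency instead of another predicted branch.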
 
--- /dev/null
+Subject: nl80211: Sanitize array index in parse_txq_params
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:03:15 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit 259d8c1e984318497c84eef547bbb6b1d9f4eb05
+
+Wireless drivers rely on parse_txq_params to validate that txq_params->ac
+is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx()
+handler is called. Use a new helper, array_index_nospec(), to sanitize
+txq_params->ac with respect to speculation. I.e. ensure that any
+speculation into ->conf_tx() handlers is done with a value of
+txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).
+
+Reported-by: Christian Lamparter <chunkeey@gmail.com>
+Reported-by: Elena Reshetova <elena.reshetova@intel.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Acked-by: Johannes Berg <johannes@sipsolutions.net>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: linux-wireless@vger.kernel.org
+Cc: torvalds@linux-foundation.org
+Cc: "David S. Miller" <davem@davemloft.net>
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ net/wireless/nl80211.c |    9 ++++++---
+ 1 file changed, 6 insertions(+), 3 deletions(-)
+
+--- a/net/wireless/nl80211.c
++++ b/net/wireless/nl80211.c
+@@ -16,6 +16,7 @@
+ #include <linux/nl80211.h>
+ #include <linux/rtnetlink.h>
+ #include <linux/netlink.h>
++#include <linux/nospec.h>
+ #include <linux/etherdevice.h>
+ #include <net/net_namespace.h>
+ #include <net/genetlink.h>
+@@ -2056,20 +2057,22 @@ static const struct nla_policy txq_param
+ static int parse_txq_params(struct nlattr *tb[],
+                           struct ieee80211_txq_params *txq_params)
+ {
++      u8 ac;
++
+       if (!tb[NL80211_TXQ_ATTR_AC] || !tb[NL80211_TXQ_ATTR_TXOP] ||
+           !tb[NL80211_TXQ_ATTR_CWMIN] || !tb[NL80211_TXQ_ATTR_CWMAX] ||
+           !tb[NL80211_TXQ_ATTR_AIFS])
+               return -EINVAL;
+ 
+-      txq_params->ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
++      ac = nla_get_u8(tb[NL80211_TXQ_ATTR_AC]);
+       txq_params->txop = nla_get_u16(tb[NL80211_TXQ_ATTR_TXOP]);
+       txq_params->cwmin = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMIN]);
+       txq_params->cwmax = nla_get_u16(tb[NL80211_TXQ_ATTR_CWMAX]);
+       txq_params->aifs = nla_get_u8(tb[NL80211_TXQ_ATTR_AIFS]);
+ 
+-      if (txq_params->ac >= NL80211_NUM_ACS)
++      if (ac >= NL80211_NUM_ACS)
+               return -EINVAL;
+-
++      txq_params->ac = array_index_nospec(ac, NL80211_NUM_ACS);
+       return 0;
+ }
+ 
 
--- /dev/null
+Subject: objtool: Add support for alternatives at the end of a section
+From: Josh Poimboeuf jpoimboe@redhat.com
+Date: Mon Jan 29 22:00:40 2018 -0600
+
+From: Josh Poimboeuf jpoimboe@redhat.com
+
+commit 17bc33914bcc98ba3c6b426fd1c49587a25c0597
+
+Now that the previous patch gave objtool the ability to read retpoline
+alternatives, it shows a new warning:
+
+  arch/x86/entry/entry_64.o: warning: objtool: .entry_trampoline: don't know how to handle alternatives at end of section
+
+This is due to the JMP_NOSPEC in entry_SYSCALL_64_trampoline().
+
+Previously, objtool ignored this situation because it wasn't needed, and
+it would have required a bit of extra code.  Now that this case exists,
+add proper support for it.
+
+Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Dave Hansen <dave.hansen@linux.intel.com>
+Cc: David Woodhouse <dwmw2@infradead.org>
+Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: Guenter Roeck <linux@roeck-us.net>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Juergen Gross <jgross@suse.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Link: http://lkml.kernel.org/r/2a30a3c2158af47d891a76e69bb1ef347e0443fd.1517284349.git.jpoimboe@redhat.com
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ tools/objtool/check.c |   53 +++++++++++++++++++++++++++++---------------------
+ 1 file changed, 31 insertions(+), 22 deletions(-)
+
+--- a/tools/objtool/check.c
++++ b/tools/objtool/check.c
+@@ -594,7 +594,7 @@ static int handle_group_alt(struct objto
+                           struct instruction *orig_insn,
+                           struct instruction **new_insn)
+ {
+-      struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump;
++      struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
+       unsigned long dest_off;
+ 
+       last_orig_insn = NULL;
+@@ -610,28 +610,30 @@ static int handle_group_alt(struct objto
+               last_orig_insn = insn;
+       }
+ 
+-      if (!next_insn_same_sec(file, last_orig_insn)) {
+-              WARN("%s: don't know how to handle alternatives at end of section",
+-                   special_alt->orig_sec->name);
+-              return -1;
+-      }
+-
+-      fake_jump = malloc(sizeof(*fake_jump));
+-      if (!fake_jump) {
+-              WARN("malloc failed");
+-              return -1;
++      if (next_insn_same_sec(file, last_orig_insn)) {
++              fake_jump = malloc(sizeof(*fake_jump));
++              if (!fake_jump) {
++                      WARN("malloc failed");
++                      return -1;
++              }
++              memset(fake_jump, 0, sizeof(*fake_jump));
++              INIT_LIST_HEAD(&fake_jump->alts);
++              clear_insn_state(&fake_jump->state);
++
++              fake_jump->sec = special_alt->new_sec;
++              fake_jump->offset = -1;
++              fake_jump->type = INSN_JUMP_UNCONDITIONAL;
++              fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
++              fake_jump->ignore = true;
+       }
+-      memset(fake_jump, 0, sizeof(*fake_jump));
+-      INIT_LIST_HEAD(&fake_jump->alts);
+-      clear_insn_state(&fake_jump->state);
+-
+-      fake_jump->sec = special_alt->new_sec;
+-      fake_jump->offset = -1;
+-      fake_jump->type = INSN_JUMP_UNCONDITIONAL;
+-      fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
+-      fake_jump->ignore = true;
+ 
+       if (!special_alt->new_len) {
++              if (!fake_jump) {
++                      WARN("%s: empty alternative at end of section",
++                           special_alt->orig_sec->name);
++                      return -1;
++              }
++
+               *new_insn = fake_jump;
+               return 0;
+       }
+@@ -654,8 +656,14 @@ static int handle_group_alt(struct objto
+                       continue;
+ 
+               dest_off = insn->offset + insn->len + insn->immediate;
+-              if (dest_off == special_alt->new_off + special_alt->new_len)
++              if (dest_off == special_alt->new_off + special_alt->new_len) {
++                      if (!fake_jump) {
++                              WARN("%s: alternative jump to end of section",
++                                   special_alt->orig_sec->name);
++                              return -1;
++                      }
+                       insn->jump_dest = fake_jump;
++              }
+ 
+               if (!insn->jump_dest) {
+                       WARN_FUNC("can't find alternative jump destination",
+@@ -670,7 +678,8 @@ static int handle_group_alt(struct objto
+               return -1;
+       }
+ 
+-      list_add(&fake_jump->list, &last_new_insn->list);
++      if (fake_jump)
++              list_add(&fake_jump->list, &last_new_insn->list);
+ 
+       return 0;
+ }
 
--- /dev/null
+Subject: objtool: Improve retpoline alternative handling
+From: Josh Poimboeuf jpoimboe@redhat.com
+Date: Mon Jan 29 22:00:39 2018 -0600
+
+From: Josh Poimboeuf jpoimboe@redhat.com
+
+commit a845c7cf4b4cb5e9e3b2823867892b27646f3a98
+
+Currently objtool requires all retpolines to be:
+
+  a) patched in with alternatives; and
+
+  b) annotated with ANNOTATE_NOSPEC_ALTERNATIVE.
+
+If you forget to do either of the above, objtool segfaults trying to
+dereference a NULL 'insn->call_dest' pointer.
+
+Avoid that situation and print a more helpful error message:
+
+  quirks.o: warning: objtool: efi_delete_dummy_variable()+0x99: unsupported intra-function call
+  quirks.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
+
+Future improvements can be made to make objtool smarter with respect to
+retpolines, but this is a good incremental improvement for now.
+
+Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
+Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Dave Hansen <dave.hansen@linux.intel.com>
+Cc: David Woodhouse <dwmw2@infradead.org>
+Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Juergen Gross <jgross@suse.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Link: http://lkml.kernel.org/r/819e50b6d9c2e1a22e34c1a636c0b2057cc8c6e5.1517284349.git.jpoimboe@redhat.com
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ tools/objtool/check.c |   36 ++++++++++++++++--------------------
+ 1 file changed, 16 insertions(+), 20 deletions(-)
+
+--- a/tools/objtool/check.c
++++ b/tools/objtool/check.c
+@@ -543,18 +543,14 @@ static int add_call_destinations(struct
+                       dest_off = insn->offset + insn->len + insn->immediate;
+                       insn->call_dest = find_symbol_by_offset(insn->sec,
+                                                               dest_off);
+-                      /*
+-                       * FIXME: Thanks to retpolines, it's now considered
+-                       * normal for a function to call within itself.  So
+-                       * disable this warning for now.
+-                       */
+-#if 0
+-                      if (!insn->call_dest) {
+-                              WARN_FUNC("can't find call dest symbol at offset 0x%lx",
+-                                        insn->sec, insn->offset, dest_off);
++
++                      if (!insn->call_dest && !insn->ignore) {
++                              WARN_FUNC("unsupported intra-function call",
++                                        insn->sec, insn->offset);
++                              WARN("If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.");
+                               return -1;
+                       }
+-#endif
++
+               } else if (rela->sym->type == STT_SECTION) {
+                       insn->call_dest = find_symbol_by_offset(rela->sym->sec,
+                                                               rela->addend+4);
+@@ -648,6 +644,8 @@ static int handle_group_alt(struct objto
+ 
+               last_new_insn = insn;
+ 
++              insn->ignore = orig_insn->ignore_alts;
++
+               if (insn->type != INSN_JUMP_CONDITIONAL &&
+                   insn->type != INSN_JUMP_UNCONDITIONAL)
+                       continue;
+@@ -729,10 +727,6 @@ static int add_special_section_alts(stru
+                       goto out;
+               }
+ 
+-              /* Ignore retpoline alternatives. */
+-              if (orig_insn->ignore_alts)
+-                      continue;
+-
+               new_insn = NULL;
+               if (!special_alt->group || special_alt->new_len) {
+                       new_insn = find_insn(file, special_alt->new_sec,
+@@ -1089,11 +1083,11 @@ static int decode_sections(struct objtoo
+       if (ret)
+               return ret;
+ 
+-      ret = add_call_destinations(file);
++      ret = add_special_section_alts(file);
+       if (ret)
+               return ret;
+ 
+-      ret = add_special_section_alts(file);
++      ret = add_call_destinations(file);
+       if (ret)
+               return ret;
+ 
+@@ -1720,10 +1714,12 @@ static int validate_branch(struct objtoo
+ 
+               insn->visited = true;
+ 
+-              list_for_each_entry(alt, &insn->alts, list) {
+-                      ret = validate_branch(file, alt->insn, state);
+-                      if (ret)
+-                              return 1;
++              if (!insn->ignore_alts) {
++                      list_for_each_entry(alt, &insn->alts, list) {
++                              ret = validate_branch(file, alt->insn, state);
++                              if (ret)
++                                      return 1;
++                      }
+               }
+ 
+               switch (insn->type) {
 
--- /dev/null
+Subject: objtool: Warn on stripped section symbol
+From: Josh Poimboeuf jpoimboe@redhat.com
+Date: Mon Jan 29 22:00:41 2018 -0600
+
+From: Josh Poimboeuf jpoimboe@redhat.com
+
+commit 830c1e3d16b2c1733cd1ec9c8f4d47a398ae31bc
+
+With the following fix:
+
+  2a0098d70640 ("objtool: Fix seg fault with gold linker")
+
+... a seg fault was avoided, but the original seg fault condition in
+objtool wasn't fixed.  Replace the seg fault with an error message.
+
+Suggested-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Dave Hansen <dave.hansen@linux.intel.com>
+Cc: David Woodhouse <dwmw2@infradead.org>
+Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: Guenter Roeck <linux@roeck-us.net>
+Cc: H. Peter Anvin <hpa@zytor.com>
+Cc: Juergen Gross <jgross@suse.com>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Thomas Gleixner <tglx@linutronix.de>
+Link: http://lkml.kernel.org/r/dc4585a70d6b975c99fc51d1957ccdde7bd52f3a.1517284349.git.jpoimboe@redhat.com
+Signed-off-by: Ingo Molnar <mingo@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ tools/objtool/orc_gen.c |    5 +++++
+ 1 file changed, 5 insertions(+)
+
+--- a/tools/objtool/orc_gen.c
++++ b/tools/objtool/orc_gen.c
+@@ -98,6 +98,11 @@ static int create_orc_entry(struct secti
+       struct orc_entry *orc;
+       struct rela *rela;
+ 
++      if (!insn_sec->sym) {
++              WARN("missing symbol for section %s", insn_sec->name);
++              return -1;
++      }
++
+       /* populate ORC data */
+       orc = (struct orc_entry *)u_sec->data->d_buf + idx;
+       memcpy(orc, o, sizeof(*orc));
 
 iio-adc-accel-fix-up-module-licenses.patch
 pinctrl-pxa-pxa2xx-add-missing-module_description-author-license.patch
 asoc-pcm512x-add-missing-module_description-author-license.patch
+KVM_nVMX_Eliminate_vmcs02_pool.patch
+KVM_VMX_introduce_alloc_loaded_vmcs.patch
+objtool_Improve_retpoline_alternative_handling.patch
+objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch
+objtool_Warn_on_stripped_section_symbol.patch
+x86mm_Fix_overlap_of_i386_CPU_ENTRY_AREA_with_FIX_BTMAP.patch
+x86spectre_Check_CONFIG_RETPOLINE_in_command_line_parser.patch
+x86entry64_Remove_the_SYSCALL64_fast_path.patch
+x86entry64_Push_extra_regs_right_away.patch
+x86asm_Move_status_from_thread_struct_to_thread_info.patch
+Documentation_Document_array_index_nospec.patch
+array_index_nospec_Sanitize_speculative_array_de-references.patch
+x86_Implement_array_index_mask_nospec.patch
+x86_Introduce_barrier_nospec.patch
+x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
+x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch
+x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch
+x86get_user_Use_pointer_masking_to_limit_speculation.patch
+x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch
+vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch
+nl80211_Sanitize_array_index_in_parse_txq_params.patch
+x86spectre_Report_get_user_mitigation_for_spectre_v1.patch
+x86spectre_Fix_spelling_mistake_vunerable-_vulnerable.patch
+x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch
+x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch
+x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch
+KVM_VMX_make_MSR_bitmaps_per-VCPU.patch
+x86kvm_Update_spectre-v1_mitigation.patch
+x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch
+x86spectre_Simplify_spectre_v2_command_line_parsing.patch
+x86pti_Mark_constant_arrays_as___initconst.patch
+x86speculation_Fix_typo_IBRS_ATT_which_should_be_IBRS_ALL.patch
+KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch
+KVMx86_Add_IBPB_support.patch
+KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch
+KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
+KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch
 
--- /dev/null
+Subject: vfs, fdtable: Prevent bounds-check bypass via speculative execution
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:03:05 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit 56c30ba7b348b90484969054d561f711ba196507
+
+'fd' is a user controlled value that is used as a data dependency to
+read from the 'fdt->fd' array.  In order to avoid potential leaks of
+kernel memory values, block speculative execution of the instruction
+stream that could issue reads based on an invalid 'file *' returned from
+__fcheck_files.
+
+Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727418500.33451.17392199002892248656.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ include/linux/fdtable.h |    5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/include/linux/fdtable.h
++++ b/include/linux/fdtable.h
+@@ -10,6 +10,7 @@
+ #include <linux/compiler.h>
+ #include <linux/spinlock.h>
+ #include <linux/rcupdate.h>
++#include <linux/nospec.h>
+ #include <linux/types.h>
+ #include <linux/init.h>
+ #include <linux/fs.h>
+@@ -82,8 +83,10 @@ static inline struct file *__fcheck_file
+ {
+       struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ 
+-      if (fd < fdt->max_fds)
++      if (fd < fdt->max_fds) {
++              fd = array_index_nospec(fd, fdt->max_fds);
+               return rcu_dereference_raw(fdt->fd[fd]);
++      }
+       return NULL;
+ }
+ 
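
A minimal userspace sketch of the pattern the fdtable hunk above applies:
check the bound first, then clamp the index before it is used in a load.
The names clamp_index_sketch(), table_sketch and lookup_sketch() are
hypothetical, and the mask expression is only a stand-in for the kernel's
array_index_nospec(). A compiler is free to turn this comparison into a
branch, which the kernel helper is written to avoid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for array_index_nospec(): returns index when it is
 * in bounds and 0 otherwise, so a mispredicted bounds check can at worst
 * reach element 0. */
static inline size_t clamp_index_sketch(size_t index, size_t size)
{
	/* All ones when index < size, zero otherwise. */
	size_t mask = (size_t)0 - (size_t)(index < size);

	return index & mask;
}

/* Hypothetical table type, loosely modelled on the max_fds/fd pair. */
struct table_sketch {
	size_t max;
	const int *slots;
};

static inline const int *lookup_sketch(const struct table_sketch *t, size_t i)
{
	assert(t != NULL);

	if (i < t->max) {
		/* Clamp after the check, before the dependent access. */
		i = clamp_index_sketch(i, t->max);
		return &t->slots[i];
	}
	return NULL;
}
```

The clamp is a no-op on the architectural path, since the index has already
passed the check there. It only changes the value seen by a load that runs
under a mispredicted branch.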
 
--- /dev/null
+Subject: x86: Implement array_index_mask_nospec
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:28 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit babdde2698d482b6c0de1eab4f697cf5856c5859
+
+array_index_nospec() uses a mask to sanitize user controllable array
+indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask
+otherwise. The default array_index_mask_nospec() handles the
+carry-bit from the (index - size) result in software.
+
+The x86 array_index_mask_nospec() does the same, but the carry-bit is
+handled in the processor CF flag without conditional instructions in the
+control flow.
+
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/barrier.h |   24 ++++++++++++++++++++++++
+ 1 file changed, 24 insertions(+)
+
+--- a/arch/x86/include/asm/barrier.h
++++ b/arch/x86/include/asm/barrier.h
+@@ -24,6 +24,30 @@
+ #define wmb() asm volatile("sfence" ::: "memory")
+ #endif
+ 
++/**
++ * array_index_mask_nospec() - generate a mask that is ~0UL when the
++ *    bounds check succeeds and 0 otherwise
++ * @index: array element index
++ * @size: number of elements in array
++ *
++ * Returns:
++ *     0 - (index < size)
++ */
++static inline unsigned long array_index_mask_nospec(unsigned long index,
++              unsigned long size)
++{
++      unsigned long mask;
++
++      asm ("cmp %1,%2; sbb %0,%0;"
++                      :"=r" (mask)
++                      :"r"(size),"r" (index)
++                      :"cc");
++      return mask;
++}
++
++/* Override the default implementation from linux/nospec.h. */
++#define array_index_mask_nospec array_index_mask_nospec
++
+ #ifdef CONFIG_X86_PPRO_FENCE
+ #define dma_rmb()     rmb()
+ #else
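
The mask described in the commit message above can be sketched in portable
C without a comparison. This is an illustration under stated assumptions,
not the kernel source: index_mask_sketch() is a hypothetical name, and the
sketch assumes 'size' fits in a signed long so that the top bit of
(size - 1 - index) can stand in for the borrow. The x86 patch gets the same
result from the CF flag with cmp/sbb.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sketch: ~0UL when index < size, 0 otherwise. */
static inline unsigned long index_mask_sketch(unsigned long index,
					      unsigned long size)
{
	const unsigned int top_bit = sizeof(unsigned long) * CHAR_BIT - 1;
	unsigned long out_of_bounds;

	/* The top-bit trick needs size to fit in a signed long. */
	assert(size <= LONG_MAX);

	/*
	 * (size - 1 - index) wraps and sets the top bit when index >= size.
	 * OR-ing in index also covers indexes above LONG_MAX.
	 */
	out_of_bounds = (index | (size - 1UL - index)) >> top_bit;

	/* 0 - 1 gives all ones (in bounds), 1 - 1 gives zero. */
	return out_of_bounds - 1UL;
}
```

AND-ing an index with this mask leaves an in-bounds index unchanged and
forces an out-of-bounds index to 0.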
 
--- /dev/null
+Subject: x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:39 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit b3bbfb3fb5d25776b8e3f361d2eedaabb0b496cd
+
+For __get_user() paths, do not allow the kernel to speculate on the value
+of a user controlled pointer. In addition to the 'stac' instruction for
+Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the
+access_ok() result to resolve in the pipeline before the CPU might take any
+speculative action on the pointer value. Given the cost of 'stac' the
+speculation barrier is placed after 'stac' to hopefully overlap the cost of
+disabling SMAP with the cost of flushing the instruction pipeline.
+
+Since __get_user is a major kernel interface that deals with user
+controlled pointers, the __uaccess_begin_nospec() mechanism will prevent
+speculative execution past an access_ok() permission check. While
+speculative execution past access_ok() is not enough to lead to a kernel
+memory leak, it is a necessary precondition.
+
+To be clear, __uaccess_begin_nospec() is addressing a class of potential
+problems near __get_user() usages.
+
+Note that while the barrier_nospec() in __uaccess_begin_nospec() is used
+to protect __get_user(), pointer masking similar to array_index_nospec()
+will be used for get_user() since it incorporates a bounds check near the
+usage.
+
+uaccess_try_nospec provides the same mechanism for get_user_try.
+
+No functional changes.
+
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Suggested-by: Andi Kleen <ak@linux.intel.com>
+Suggested-by: Ingo Molnar <mingo@redhat.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: Tom Lendacky <thomas.lendacky@amd.com>
+Cc: Kees Cook <keescook@chromium.org>
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/uaccess.h |    9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+--- a/arch/x86/include/asm/uaccess.h
++++ b/arch/x86/include/asm/uaccess.h
+@@ -124,6 +124,11 @@ extern int __get_user_bad(void);
+ 
+ #define __uaccess_begin() stac()
+ #define __uaccess_end()   clac()
++#define __uaccess_begin_nospec()      \
++({                                    \
++      stac();                         \
++      barrier_nospec();               \
++})
+ 
+ /*
+  * This is a type: either unsigned long, if the argument fits into
+@@ -487,6 +492,10 @@ struct __large_struct { unsigned long bu
+       __uaccess_begin();                                              \
+       barrier();
+ 
++#define uaccess_try_nospec do {                                               \
++      current->thread.uaccess_err = 0;                                \
++      __uaccess_begin_nospec();                                       \
++
+ #define uaccess_catch(err)                                            \
+       __uaccess_end();                                                \
+       (err) |= (current->thread.uaccess_err ? -EFAULT : 0);           \
 
--- /dev/null
+Subject: x86: Introduce barrier_nospec
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:33 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit b3d7ad85b80bbc404635dca80f5b129f6242bc7a
+
+Rename the open coded form of this instruction sequence from
+rdtsc_ordered() into a generic barrier primitive, barrier_nospec().
+
+One of the mitigations for Spectre variant 1 vulnerabilities is to fence
+speculative execution after successfully validating a bounds check. I.e.
+force the result of a bounds check to resolve in the instruction pipeline
+to ensure speculative execution honors that result before potentially
+operating on out-of-bounds data.
+
+No functional changes.
+
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Suggested-by: Andi Kleen <ak@linux.intel.com>
+Suggested-by: Ingo Molnar <mingo@redhat.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: Tom Lendacky <thomas.lendacky@amd.com>
+Cc: Kees Cook <keescook@chromium.org>
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/barrier.h |    4 ++++
+ arch/x86/include/asm/msr.h     |    3 +--
+ 2 files changed, 5 insertions(+), 2 deletions(-)
+
+--- a/arch/x86/include/asm/barrier.h
++++ b/arch/x86/include/asm/barrier.h
+@@ -48,6 +48,10 @@ static inline unsigned long array_index_
+ /* Override the default implementation from linux/nospec.h. */
+ #define array_index_mask_nospec array_index_mask_nospec
+ 
++/* Prevent speculative execution past this barrier. */
++#define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
++                                         "lfence", X86_FEATURE_LFENCE_RDTSC)
++
+ #ifdef CONFIG_X86_PPRO_FENCE
+ #define dma_rmb()     rmb()
+ #else
+--- a/arch/x86/include/asm/msr.h
++++ b/arch/x86/include/asm/msr.h
+@@ -214,8 +214,7 @@ static __always_inline unsigned long lon
+        * that some other imaginary CPU is updating continuously with a
+        * time stamp.
+        */
+-      alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
+-                        "lfence", X86_FEATURE_LFENCE_RDTSC);
++      barrier_nospec();
+       return rdtsc();
+ }
+ 
 
--- /dev/null
+Subject: x86/asm: Move 'status' from thread_struct to thread_info
+From: Andy Lutomirski luto@kernel.org
+Date: Sun Jan 28 10:38:50 2018 -0800
+
+From: Andy Lutomirski luto@kernel.org
+
+commit 37a8f7c38339b22b69876d6f5a0ab851565284e3
+
+The TS_COMPAT bit is very hot and is accessed from code paths that mostly
+also touch thread_info::flags.  Move it into struct thread_info to improve
+cache locality.
+
+The only reason it was in thread_struct is that there was a brief period
+during which arch-specific fields were not allowed in struct thread_info.
+
+Linus suggested further changing:
+
+  ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
+
+to:
+
+  if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))
+          ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
+
+on the theory that frequently dirtying the cacheline even in pure 64-bit
+code that never needs to modify status hurts performance.  That could be a
+reasonable followup patch, but I suspect it matters less on top of this
+patch.
+
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Andy Lutomirski <luto@kernel.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Ingo Molnar <mingo@kernel.org>
+Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
+Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/entry/common.c            |    4 ++--
+ arch/x86/include/asm/processor.h   |    2 --
+ arch/x86/include/asm/syscall.h     |    6 +++---
+ arch/x86/include/asm/thread_info.h |    3 ++-
+ arch/x86/kernel/process_64.c       |    4 ++--
+ arch/x86/kernel/ptrace.c           |    2 +-
+ arch/x86/kernel/signal.c           |    2 +-
+ 7 files changed, 11 insertions(+), 12 deletions(-)
+
+--- a/arch/x86/entry/common.c
++++ b/arch/x86/entry/common.c
+@@ -208,7 +208,7 @@ __visible inline void prepare_exit_to_us
+        * special case only applies after poking regs and before the
+        * very next return to user mode.
+        */
+-      current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
++      ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
+ #endif
+ 
+       user_enter_irqoff();
+@@ -306,7 +306,7 @@ static __always_inline void do_syscall_3
+       unsigned int nr = (unsigned int)regs->orig_ax;
+ 
+ #ifdef CONFIG_IA32_EMULATION
+-      current->thread.status |= TS_COMPAT;
++      ti->status |= TS_COMPAT;
+ #endif
+ 
+       if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
+--- a/arch/x86/include/asm/processor.h
++++ b/arch/x86/include/asm/processor.h
+@@ -459,8 +459,6 @@ struct thread_struct {
+       unsigned short          gsindex;
+ #endif
+ 
+-      u32                     status;         /* thread synchronous flags */
+-
+ #ifdef CONFIG_X86_64
+       unsigned long           fsbase;
+       unsigned long           gsbase;
+--- a/arch/x86/include/asm/syscall.h
++++ b/arch/x86/include/asm/syscall.h
+@@ -60,7 +60,7 @@ static inline long syscall_get_error(str
+        * TS_COMPAT is set for 32-bit syscall entries and then
+        * remains set until we return to user mode.
+        */
+-      if (task->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
++      if (task->thread_info.status & (TS_COMPAT|TS_I386_REGS_POKED))
+               /*
+                * Sign-extend the value so (int)-EFOO becomes (long)-EFOO
+                * and will match correctly in comparisons.
+@@ -116,7 +116,7 @@ static inline void syscall_get_arguments
+                                        unsigned long *args)
+ {
+ # ifdef CONFIG_IA32_EMULATION
+-      if (task->thread.status & TS_COMPAT)
++      if (task->thread_info.status & TS_COMPAT)
+               switch (i) {
+               case 0:
+                       if (!n--) break;
+@@ -177,7 +177,7 @@ static inline void syscall_set_arguments
+                                        const unsigned long *args)
+ {
+ # ifdef CONFIG_IA32_EMULATION
+-      if (task->thread.status & TS_COMPAT)
++      if (task->thread_info.status & TS_COMPAT)
+               switch (i) {
+               case 0:
+                       if (!n--) break;
+--- a/arch/x86/include/asm/thread_info.h
++++ b/arch/x86/include/asm/thread_info.h
+@@ -55,6 +55,7 @@ struct task_struct;
+ 
+ struct thread_info {
+       unsigned long           flags;          /* low level flags */
++      u32                     status;         /* thread synchronous flags */
+ };
+ 
+ #define INIT_THREAD_INFO(tsk)                 \
+@@ -221,7 +222,7 @@ static inline int arch_within_stack_fram
+ #define in_ia32_syscall() true
+ #else
+ #define in_ia32_syscall() (IS_ENABLED(CONFIG_IA32_EMULATION) && \
+-                         current->thread.status & TS_COMPAT)
++                         current_thread_info()->status & TS_COMPAT)
+ #endif
+ 
+ /*
+--- a/arch/x86/kernel/process_64.c
++++ b/arch/x86/kernel/process_64.c
+@@ -557,7 +557,7 @@ static void __set_personality_x32(void)
+        * Pretend to come from a x32 execve.
+        */
+       task_pt_regs(current)->orig_ax = __NR_x32_execve | __X32_SYSCALL_BIT;
+-      current->thread.status &= ~TS_COMPAT;
++      current_thread_info()->status &= ~TS_COMPAT;
+ #endif
+ }
+ 
+@@ -571,7 +571,7 @@ static void __set_personality_ia32(void)
+       current->personality |= force_personality32;
+       /* Prepare the first "return" to user space */
+       task_pt_regs(current)->orig_ax = __NR_ia32_execve;
+-      current->thread.status |= TS_COMPAT;
++      current_thread_info()->status |= TS_COMPAT;
+ #endif
+ }
+ 
+--- a/arch/x86/kernel/ptrace.c
++++ b/arch/x86/kernel/ptrace.c
+@@ -935,7 +935,7 @@ static int putreg32(struct task_struct *
+                */
+               regs->orig_ax = value;
+               if (syscall_get_nr(child, regs) >= 0)
+-                      child->thread.status |= TS_I386_REGS_POKED;
++                      child->thread_info.status |= TS_I386_REGS_POKED;
+               break;
+ 
+       case offsetof(struct user32, regs.eflags):
+--- a/arch/x86/kernel/signal.c
++++ b/arch/x86/kernel/signal.c
+@@ -787,7 +787,7 @@ static inline unsigned long get_nr_resta
+        * than the tracee.
+        */
+ #ifdef CONFIG_IA32_EMULATION
+-      if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
++      if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED))
+               return __NR_ia32_restart_syscall;
+ #endif
+ #ifdef CONFIG_X86_X32_ABI
 
--- /dev/null
+Subject: x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
+From: David Woodhouse dwmw@amazon.co.uk
+Date: Tue Jan 30 14:30:23 2018 +0000
+
+From: David Woodhouse dwmw@amazon.co.uk
+
+commit 7fcae1118f5fd44a862aa5c3525248e35ee67c3b
+
+Despite the fact that all the other code there seems to be doing it, just
+using set_cpu_cap() in early_init_intel() doesn't actually work.
+
+For CPUs with PKU support, setup_pku() calls get_cpu_cap() after
+c->c_init() has set those feature bits. That resets those bits back to what
+was queried from the hardware.
+
+Turning the bits off for bad microcode is easy to fix. That can just use
+setup_clear_cpu_cap() to force them off for all CPUs.
+
+I was less keen on forcing the feature bits *on* that way, just in case
+of inconsistencies. I appreciate that the kernel is going to get this
+utterly wrong if CPU features are not consistent, because it has already
+applied alternatives by the time secondary CPUs are brought up.
+
+But at least if setup_force_cpu_cap() isn't being used, we might have a
+chance of *detecting* the lack of the corresponding bit and either
+panicking or refusing to bring the offending CPU online.
+
+So ensure that the appropriate feature bits are set within get_cpu_cap()
+regardless of how many extra times it's called.
+
+Fixes: 2961298e ("x86/cpufeatures: Clean up Spectre v2 related CPUID flags")
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: karahmed@amazon.de
+Cc: peterz@infradead.org
+Cc: bp@alien8.de
+Link: https://lkml.kernel.org/r/1517322623-15261-1-git-send-email-dwmw@amazon.co.uk
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kernel/cpu/common.c |   21 +++++++++++++++++++++
+ arch/x86/kernel/cpu/intel.c  |   27 ++++++++-------------------
+ 2 files changed, 29 insertions(+), 19 deletions(-)
+
+--- a/arch/x86/kernel/cpu/common.c
++++ b/arch/x86/kernel/cpu/common.c
+@@ -726,6 +726,26 @@ static void apply_forced_caps(struct cpu
+       }
+ }
+ 
++static void init_speculation_control(struct cpuinfo_x86 *c)
++{
++      /*
++       * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
++       * and they also have a different bit for STIBP support. Also,
++       * a hypervisor might have set the individual AMD bits even on
++       * Intel CPUs, for finer-grained selection of what's available.
++       *
++       * We use the AMD bits in 0x8000_0008 EBX as the generic hardware
++       * features, which are visible in /proc/cpuinfo and used by the
++       * kernel. So set those accordingly from the Intel bits.
++       */
++      if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
++              set_cpu_cap(c, X86_FEATURE_IBRS);
++              set_cpu_cap(c, X86_FEATURE_IBPB);
++      }
++      if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
++              set_cpu_cap(c, X86_FEATURE_STIBP);
++}
++
+ void get_cpu_cap(struct cpuinfo_x86 *c)
+ {
+       u32 eax, ebx, ecx, edx;
+@@ -820,6 +840,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
+               c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
+ 
+       init_scattered_cpuid_features(c);
++      init_speculation_control(c);
+ 
+       /*
+        * Clear/Set all flags overridden by options, after probe.
+--- a/arch/x86/kernel/cpu/intel.c
++++ b/arch/x86/kernel/cpu/intel.c
+@@ -175,28 +175,17 @@ static void early_init_intel(struct cpui
+       if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
+               c->microcode = intel_get_microcode_revision();
+ 
+-      /*
+-       * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support,
+-       * and they also have a different bit for STIBP support. Also,
+-       * a hypervisor might have set the individual AMD bits even on
+-       * Intel CPUs, for finer-grained selection of what's available.
+-       */
+-      if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) {
+-              set_cpu_cap(c, X86_FEATURE_IBRS);
+-              set_cpu_cap(c, X86_FEATURE_IBPB);
+-      }
+-      if (cpu_has(c, X86_FEATURE_INTEL_STIBP))
+-              set_cpu_cap(c, X86_FEATURE_STIBP);
+-
+       /* Now if any of them are set, check the blacklist and clear the lot */
+-      if ((cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
++      if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
++           cpu_has(c, X86_FEATURE_INTEL_STIBP) ||
++           cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) ||
+            cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) {
+               pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n");
+-              clear_cpu_cap(c, X86_FEATURE_IBRS);
+-              clear_cpu_cap(c, X86_FEATURE_IBPB);
+-              clear_cpu_cap(c, X86_FEATURE_STIBP);
+-              clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
+-              clear_cpu_cap(c, X86_FEATURE_INTEL_STIBP);
++              setup_clear_cpu_cap(X86_FEATURE_IBRS);
++              setup_clear_cpu_cap(X86_FEATURE_IBPB);
++              setup_clear_cpu_cap(X86_FEATURE_STIBP);
++              setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL);
++              setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP);
+       }
+ 
+       /*
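
The reasoning in the commit message above (derived feature bits are lost
whenever the capability words are re-read from hardware, unless they are
derived inside the function that does the re-read) can be modelled with a
small userspace sketch. Everything here is hypothetical: cpu_sketch,
reload_caps() and the bit names only mimic the roles of cpuinfo_x86,
get_cpu_cap() and the X86_FEATURE_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits: one reported by hardware, two derived. */
enum {
	HW_SPEC_CTRL = 1u << 0,
	GEN_IBRS     = 1u << 1,
	GEN_IBPB     = 1u << 2,
};

struct cpu_sketch {
	unsigned int hw_bits;	/* what the hardware query reports */
	unsigned int caps;	/* what the kernel-side model believes */
};

/* Plays the role of init_speculation_control(). */
static inline void derive_generic_bits(struct cpu_sketch *c)
{
	if (c->caps & HW_SPEC_CTRL)
		c->caps |= GEN_IBRS | GEN_IBPB;
}

/* Plays the role of get_cpu_cap(): overwrites caps from hardware. */
static inline void reload_caps(struct cpu_sketch *c, bool derive_inside)
{
	assert(c != NULL);

	c->caps = c->hw_bits;
	if (derive_inside)
		derive_generic_bits(c);
}
```

With derive_inside false, a second reload_caps() call drops the derived
bits, which is the failure the patch describes. With it true, the bits
survive any number of reloads.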
 
--- /dev/null
+Subject: x86/entry/64: Push extra regs right away
+From: Andy Lutomirski luto@kernel.org
+Date: Sun Jan 28 10:38:49 2018 -0800
+
+From: Andy Lutomirski luto@kernel.org
+
+commit d1f7732009e0549eedf8ea1db948dc37be77fd46
+
+With the fast path removed there is no point in splitting the push of the
+normal and the extra register set. Just push the extra regs right away.
+
+[ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ]
+
+Signed-off-by: Andy Lutomirski <luto@kernel.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Acked-by: Ingo Molnar <mingo@kernel.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
+Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/entry/entry_64.S |   10 +++++++---
+ 1 file changed, 7 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/entry/entry_64.S
++++ b/arch/x86/entry/entry_64.S
+@@ -232,13 +232,17 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
+       pushq   %r9                             /* pt_regs->r9 */
+       pushq   %r10                            /* pt_regs->r10 */
+       pushq   %r11                            /* pt_regs->r11 */
+-      sub     $(6*8), %rsp                    /* pt_regs->bp, bx, r12-15 not saved */
+-      UNWIND_HINT_REGS extra=0
++      pushq   %rbx                            /* pt_regs->rbx */
++      pushq   %rbp                            /* pt_regs->rbp */
++      pushq   %r12                            /* pt_regs->r12 */
++      pushq   %r13                            /* pt_regs->r13 */
++      pushq   %r14                            /* pt_regs->r14 */
++      pushq   %r15                            /* pt_regs->r15 */
++      UNWIND_HINT_REGS
+ 
+       TRACE_IRQS_OFF
+ 
+       /* IRQs are off. */
+-      SAVE_EXTRA_REGS
+       movq    %rsp, %rdi
+       call    do_syscall_64           /* returns with IRQs disabled */
+ 
 
--- /dev/null
+Subject: x86/entry/64: Remove the SYSCALL64 fast path
+From: Andy Lutomirski luto@kernel.org
+Date: Sun Jan 28 10:38:49 2018 -0800
+
+From: Andy Lutomirski luto@kernel.org
+
+commit 21d375b6b34ff511a507de27bf316b3dde6938d9
+
+The SYSCALL64 fast path was a nice, if small, optimization back in the good
+old days when syscalls were actually reasonably fast.  Now there is PTI to
+slow everything down, and indirect branches are verboten, making everything
+messier.  The retpoline code in the fast path is particularly nasty.
+
+Just get rid of the fast path. The slow path is barely slower.
+
+[ tglx: Split out the 'push all extra regs' part ]
+
+Signed-off-by: Andy Lutomirski <luto@kernel.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Acked-by: Ingo Molnar <mingo@kernel.org>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
+Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/entry/entry_64.S   |  117 --------------------------------------------
+ arch/x86/entry/syscall_64.c |    7 --
+ 2 files changed, 2 insertions(+), 122 deletions(-)
+
+--- a/arch/x86/entry/entry_64.S
++++ b/arch/x86/entry/entry_64.S
+@@ -237,86 +237,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
+ 
+       TRACE_IRQS_OFF
+ 
+-      /*
+-       * If we need to do entry work or if we guess we'll need to do
+-       * exit work, go straight to the slow path.
+-       */
+-      movq    PER_CPU_VAR(current_task), %r11
+-      testl   $_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
+-      jnz     entry_SYSCALL64_slow_path
+-
+-entry_SYSCALL_64_fastpath:
+-      /*
+-       * Easy case: enable interrupts and issue the syscall.  If the syscall
+-       * needs pt_regs, we'll call a stub that disables interrupts again
+-       * and jumps to the slow path.
+-       */
+-      TRACE_IRQS_ON
+-      ENABLE_INTERRUPTS(CLBR_NONE)
+-#if __SYSCALL_MASK == ~0
+-      cmpq    $__NR_syscall_max, %rax
+-#else
+-      andl    $__SYSCALL_MASK, %eax
+-      cmpl    $__NR_syscall_max, %eax
+-#endif
+-      ja      1f                              /* return -ENOSYS (already in pt_regs->ax) */
+-      movq    %r10, %rcx
+-
+-      /*
+-       * This call instruction is handled specially in stub_ptregs_64.
+-       * It might end up jumping to the slow path.  If it jumps, RAX
+-       * and all argument registers are clobbered.
+-       */
+-#ifdef CONFIG_RETPOLINE
+-      movq    sys_call_table(, %rax, 8), %rax
+-      call    __x86_indirect_thunk_rax
+-#else
+-      call    *sys_call_table(, %rax, 8)
+-#endif
+-.Lentry_SYSCALL_64_after_fastpath_call:
+-
+-      movq    %rax, RAX(%rsp)
+-1:
+-
+-      /*
+-       * If we get here, then we know that pt_regs is clean for SYSRET64.
+-       * If we see that no exit work is required (which we are required
+-       * to check with IRQs off), then we can go straight to SYSRET64.
+-       */
+-      DISABLE_INTERRUPTS(CLBR_ANY)
+-      TRACE_IRQS_OFF
+-      movq    PER_CPU_VAR(current_task), %r11
+-      testl   $_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
+-      jnz     1f
+-
+-      LOCKDEP_SYS_EXIT
+-      TRACE_IRQS_ON           /* user mode is traced as IRQs on */
+-      movq    RIP(%rsp), %rcx
+-      movq    EFLAGS(%rsp), %r11
+-      addq    $6*8, %rsp      /* skip extra regs -- they were preserved */
+-      UNWIND_HINT_EMPTY
+-      jmp     .Lpop_c_regs_except_rcx_r11_and_sysret
+-
+-1:
+-      /*
+-       * The fast path looked good when we started, but something changed
+-       * along the way and we need to switch to the slow path.  Calling
+-       * raise(3) will trigger this, for example.  IRQs are off.
+-       */
+-      TRACE_IRQS_ON
+-      ENABLE_INTERRUPTS(CLBR_ANY)
+-      SAVE_EXTRA_REGS
+-      movq    %rsp, %rdi
+-      call    syscall_return_slowpath /* returns with IRQs disabled */
+-      jmp     return_from_SYSCALL_64
+-
+-entry_SYSCALL64_slow_path:
+       /* IRQs are off. */
+       SAVE_EXTRA_REGS
+       movq    %rsp, %rdi
+       call    do_syscall_64           /* returns with IRQs disabled */
+ 
+-return_from_SYSCALL_64:
+       TRACE_IRQS_IRETQ                /* we're about to change IF */
+ 
+       /*
+@@ -389,7 +314,6 @@ syscall_return_via_sysret:
+       /* rcx and r11 are already restored (see code above) */
+       UNWIND_HINT_EMPTY
+       POP_EXTRA_REGS
+-.Lpop_c_regs_except_rcx_r11_and_sysret:
+       popq    %rsi    /* skip r11 */
+       popq    %r10
+       popq    %r9
+@@ -420,47 +344,6 @@ syscall_return_via_sysret:
+       USERGS_SYSRET64
+ END(entry_SYSCALL_64)
+ 
+-ENTRY(stub_ptregs_64)
+-      /*
+-       * Syscalls marked as needing ptregs land here.
+-       * If we are on the fast path, we need to save the extra regs,
+-       * which we achieve by trying again on the slow path.  If we are on
+-       * the slow path, the extra regs are already saved.
+-       *
+-       * RAX stores a pointer to the C function implementing the syscall.
+-       * IRQs are on.
+-       */
+-      cmpq    $.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+-      jne     1f
+-
+-      /*
+-       * Called from fast path -- disable IRQs again, pop return address
+-       * and jump to slow path
+-       */
+-      DISABLE_INTERRUPTS(CLBR_ANY)
+-      TRACE_IRQS_OFF
+-      popq    %rax
+-      UNWIND_HINT_REGS extra=0
+-      jmp     entry_SYSCALL64_slow_path
+-
+-1:
+-      JMP_NOSPEC %rax                         /* Called from C */
+-END(stub_ptregs_64)
+-
+-.macro ptregs_stub func
+-ENTRY(ptregs_\func)
+-      UNWIND_HINT_FUNC
+-      leaq    \func(%rip), %rax
+-      jmp     stub_ptregs_64
+-END(ptregs_\func)
+-.endm
+-
+-/* Instantiate ptregs_stub for each ptregs-using syscall */
+-#define __SYSCALL_64_QUAL_(sym)
+-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
+-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
+-#include <asm/syscalls_64.h>
+-
+ /*
+  * %rdi: prev task
+  * %rsi: next task
+--- a/arch/x86/entry/syscall_64.c
++++ b/arch/x86/entry/syscall_64.c
+@@ -7,14 +7,11 @@
+ #include <asm/asm-offsets.h>
+ #include <asm/syscall.h>
+ 
+-#define __SYSCALL_64_QUAL_(sym) sym
+-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
+-
+-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
++#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+ #include <asm/syscalls_64.h>
+ #undef __SYSCALL_64
+ 
+-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
++#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
+ 
+ extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+ 
 
--- /dev/null
+Subject: x86/get_user: Use pointer masking to limit speculation
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:54 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit c7f631cb07e7da06ac1d231ca178452339e32a94
+
+Quoting Linus:
+
+    I do think that it would be a good idea to very expressly document
+    the fact that it's not that the user access itself is unsafe. I do
+    agree that things like "get_user()" want to be protected, but not
+    because of any direct bugs or problems with get_user() and friends,
+    but simply because get_user() is an excellent source of a pointer
+    that is obviously controlled from a potentially attacking user
+    space. So it's a prime candidate for then finding _subsequent_
+    accesses that can then be used to perturb the cache.
+
+Unlike the __get_user() case, get_user() includes the address limit check
+near the pointer de-reference. With that locality the speculation can be
+mitigated with pointer narrowing rather than a barrier, i.e.
+array_index_nospec(), where the narrowing is performed by:
+
+       cmp %limit, %ptr
+       sbb %mask, %mask
+       and %mask, %ptr
+
+With respect to speculation the value of %ptr is either less than %limit
+or NULL.
+
+Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: Kees Cook <keescook@chromium.org>
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727417469.33451.11804043010080838495.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/lib/getuser.S |   10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+--- a/arch/x86/lib/getuser.S
++++ b/arch/x86/lib/getuser.S
+@@ -40,6 +40,8 @@ ENTRY(__get_user_1)
+       mov PER_CPU_VAR(current_task), %_ASM_DX
+       cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+       jae bad_get_user
++      sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
++      and %_ASM_DX, %_ASM_AX
+       ASM_STAC
+ 1:    movzbl (%_ASM_AX),%edx
+       xor %eax,%eax
+@@ -54,6 +56,8 @@ ENTRY(__get_user_2)
+       mov PER_CPU_VAR(current_task), %_ASM_DX
+       cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+       jae bad_get_user
++      sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
++      and %_ASM_DX, %_ASM_AX
+       ASM_STAC
+ 2:    movzwl -1(%_ASM_AX),%edx
+       xor %eax,%eax
+@@ -68,6 +72,8 @@ ENTRY(__get_user_4)
+       mov PER_CPU_VAR(current_task), %_ASM_DX
+       cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+       jae bad_get_user
++      sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
++      and %_ASM_DX, %_ASM_AX
+       ASM_STAC
+ 3:    movl -3(%_ASM_AX),%edx
+       xor %eax,%eax
+@@ -83,6 +89,8 @@ ENTRY(__get_user_8)
+       mov PER_CPU_VAR(current_task), %_ASM_DX
+       cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+       jae bad_get_user
++      sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
++      and %_ASM_DX, %_ASM_AX
+       ASM_STAC
+ 4:    movq -7(%_ASM_AX),%rdx
+       xor %eax,%eax
+@@ -94,6 +102,8 @@ ENTRY(__get_user_8)
+       mov PER_CPU_VAR(current_task), %_ASM_DX
+       cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
+       jae bad_get_user_8
++      sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
++      and %_ASM_DX, %_ASM_AX
+       ASM_STAC
+ 4:    movl -7(%_ASM_AX),%edx
+ 5:    movl -3(%_ASM_AX),%ecx
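The cmp/sbb/and narrowing described in the get_user commit message above can be modeled in plain C. This is a minimal sketch of the arithmetic only, not kernel code: mask_ptr_nospec() is a hypothetical name, and a C compiler is free to turn the comparison into a branch, which is why the real patch does this in assembly.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of:
 *     cmp %limit, %ptr
 *     sbb %mask, %mask
 *     and %mask, %ptr
 */
static uintptr_t mask_ptr_nospec(uintptr_t ptr, uintptr_t limit)
{
	/* cmp sets the carry flag when ptr < limit (unsigned borrow). */
	uintptr_t carry = (ptr < limit);

	/* sbb %mask,%mask yields mask - mask - carry: 0 or all ones. */
	uintptr_t mask = (uintptr_t)0 - carry;

	/* In-range pointers pass through; anything else collapses to 0. */
	return ptr & mask;
}
```

With a limit of 0x1000, a pointer of 0x800 is returned unchanged, while 0x2000 (or the limit itself) becomes 0, matching the "either less than %limit or NULL" statement in the commit message.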
 
--- /dev/null
+Subject: x86/kvm: Update spectre-v1 mitigation
+From: Dan Williams dan.j.williams@intel.com
+Date: Wed Jan 31 17:47:03 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit 085331dfc6bbe3501fb936e657331ca943827600
+
+Commit 75f139aaf896 "KVM: x86: Add memory barrier on vmcs field lookup"
+added a raw 'asm("lfence");' to prevent a bounds check bypass of
+'vmcs_field_to_offset_table'.
+
+The lfence can be avoided in this path by using the array_index_nospec()
+helper designed for these types of fixes.
+
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Acked-by: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Andrew Honig <ahonig@google.com>
+Cc: kvm@vger.kernel.org
+Cc: Jim Mattson <jmattson@google.com>
+Link: https://lkml.kernel.org/r/151744959670.6342.3001723920950249067.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kvm/vmx.c |   20 +++++++++-----------
+ 1 file changed, 9 insertions(+), 11 deletions(-)
+
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -34,6 +34,7 @@
+ #include <linux/tboot.h>
+ #include <linux/hrtimer.h>
+ #include <linux/frame.h>
++#include <linux/nospec.h>
+ #include "kvm_cache_regs.h"
+ #include "x86.h"
+ 
+@@ -887,21 +888,18 @@ static const unsigned short vmcs_field_t
+ 
+ static inline short vmcs_field_to_offset(unsigned long field)
+ {
+-      BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) > SHRT_MAX);
++      const size_t size = ARRAY_SIZE(vmcs_field_to_offset_table);
++      unsigned short offset;
+ 
+-      if (field >= ARRAY_SIZE(vmcs_field_to_offset_table))
++      BUILD_BUG_ON(size > SHRT_MAX);
++      if (field >= size)
+               return -ENOENT;
+ 
+-      /*
+-       * FIXME: Mitigation for CVE-2017-5753.  To be replaced with a
+-       * generic mechanism.
+-       */
+-      asm("lfence");
+-
+-      if (vmcs_field_to_offset_table[field] == 0)
++      field = array_index_nospec(field, size);
++      offset = vmcs_field_to_offset_table[field];
++      if (offset == 0)
+               return -ENOENT;
+-
+-      return vmcs_field_to_offset_table[field];
++      return offset;
+ }
+ 
+ static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
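The check-then-clamp pattern used in vmcs_field_to_offset() above can be sketched in userspace as follows. The helper and table names (index_mask_nospec, index_nospec, demo_table, demo_lookup) are illustrative stand-ins, not the kernel's API. The mask expression assumes an arithmetic right shift of negative signed values, which GCC and Clang provide, and a size no larger than LONG_MAX. The sketch models the arithmetic only and makes no claim about speculation behaviour of a userspace build.

```c
#include <limits.h>
#include <stddef.h>

/* Returns ~0UL when index < size, 0 otherwise, without a comparison. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	const unsigned int bits = sizeof(long) * CHAR_BIT;

	/* The sign bit of (index | (size - 1 - index)) is set iff index >= size. */
	return ~(long)(index | (size - 1UL - index)) >> (bits - 1);
}

/* Clamp an index: unchanged when in bounds, 0 when out of bounds. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Hypothetical table; 0 marks an unused slot, as in the patch. */
static const unsigned short demo_table[] = { 0, 8, 16, 24 };

/* Same shape as the patched function: bounds check, clamp, single load. */
static int demo_lookup(unsigned long field)
{
	const size_t size = sizeof(demo_table) / sizeof(demo_table[0]);
	unsigned short offset;

	if (field >= size)
		return -1;

	field = index_nospec(field, size);
	offset = demo_table[field];
	if (offset == 0)
		return -1;
	return offset;
}
```

Loading the table entry once into a local, as the patch does, also avoids a second indexed access after the zero check.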
 
--- /dev/null
+Subject: x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP
+From: William Grant william.grant@canonical.com
+Date: Tue Jan 30 22:22:55 2018 +1100
+
+From: William Grant william.grant@canonical.com
+
+commit 55f49fcb879fbeebf2a8c1ac7c9e6d90df55f798
+
+Since commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the
+fixmap"), i386's CPU_ENTRY_AREA has been mapped to the memory area just
+below FIXADDR_START. But the FIX_BTMAP area already sits immediately
+below FIXADDR_START, which means that early_ioremap can collide with the
+entry area.
+
+It's especially bad on PAE where FIX_BTMAP_BEGIN gets aligned to exactly
+match CPU_ENTRY_AREA_BASE, so the first early_ioremap slot clobbers the
+IDT and causes interrupts during early boot to reset the system.
+
+The overlap wasn't a problem before the CPU entry area was introduced,
+as the fixmap has classically been preceded by the pkmap or vmalloc
+areas, neither of which is used until early_ioremap is out of the
+picture.
+
+Relocate CPU_ENTRY_AREA to below FIX_BTMAP, not just below the permanent
+fixmap area.
+
+Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
+Signed-off-by: William Grant <william.grant@canonical.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: stable@vger.kernel.org
+Link: https://lkml.kernel.org/r/7041d181-a019-e8b9-4e4e-48215f841e2c@canonical.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/fixmap.h           |    6 ++++--
+ arch/x86/include/asm/pgtable_32_types.h |    5 +++--
+ 2 files changed, 7 insertions(+), 4 deletions(-)
+
+--- a/arch/x86/include/asm/fixmap.h
++++ b/arch/x86/include/asm/fixmap.h
+@@ -137,8 +137,10 @@ enum fixed_addresses {
+ 
+ extern void reserve_top_address(unsigned long reserve);
+ 
+-#define FIXADDR_SIZE  (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
+-#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
++#define FIXADDR_SIZE          (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
++#define FIXADDR_START         (FIXADDR_TOP - FIXADDR_SIZE)
++#define FIXADDR_TOT_SIZE      (__end_of_fixed_addresses << PAGE_SHIFT)
++#define FIXADDR_TOT_START     (FIXADDR_TOP - FIXADDR_TOT_SIZE)
+ 
+ extern int fixmaps_set;
+ 
+--- a/arch/x86/include/asm/pgtable_32_types.h
++++ b/arch/x86/include/asm/pgtable_32_types.h
+@@ -44,8 +44,9 @@ extern bool __vmalloc_start_set; /* set
+  */
+ #define CPU_ENTRY_AREA_PAGES  (NR_CPUS * 40)
+ 
+-#define CPU_ENTRY_AREA_BASE                           \
+-      ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK)
++#define CPU_ENTRY_AREA_BASE                                           \
++      ((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))   \
++       & PMD_MASK)
+ 
+ #define PKMAP_BASE            \
+       ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
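The address arithmetic behind the CPU_ENTRY_AREA relocation above can be checked with a small model. Every constant here is an assumption picked for illustration (100 permanent fixmap slots, 512 FIX_BTMAP slots, NR_CPUS = 8, 2 MiB PMDs as on PAE), and the model ignores any extra alignment the kernel applies to the btmap range. It only shows why basing the area on FIXADDR_START can land it inside the boot-time mapping range, while basing it on FIXADDR_TOT_START cannot.

```c
#include <stdint.h>

/* Hypothetical values modelling an i386 PAE-like layout. */
#define PAGE_SHIFT            12
#define PAGE_SIZE             (UINT32_C(1) << PAGE_SHIFT)
#define PMD_MASK              (~((UINT32_C(1) << 21) - 1))
#define FIXADDR_TOP           UINT32_C(0xfffff000)
#define PERMANENT_SLOTS       UINT32_C(100)            /* __end_of_permanent_fixed_addresses */
#define TOTAL_SLOTS           (PERMANENT_SLOTS + 512)  /* __end_of_fixed_addresses */
#define CPU_ENTRY_AREA_PAGES  (UINT32_C(8) * 40)       /* NR_CPUS * 40 */

#define FIXADDR_START         (FIXADDR_TOP - (PERMANENT_SLOTS << PAGE_SHIFT))
#define FIXADDR_TOT_START     (FIXADDR_TOP - (TOTAL_SLOTS << PAGE_SHIFT))

/* Same formula as CPU_ENTRY_AREA_BASE, parameterised on the upper bound. */
static uint32_t entry_area_base(uint32_t below)
{
	return (below - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK;
}

static uint32_t entry_area_end(uint32_t base)
{
	return base + PAGE_SIZE * CPU_ENTRY_AREA_PAGES;
}

/* Nonzero when the area intersects [FIXADDR_TOT_START, FIXADDR_START). */
static int overlaps_btmap(uint32_t base)
{
	return base < FIXADDR_START && entry_area_end(base) > FIXADDR_TOT_START;
}
```

Under these assumed numbers the old formula gives a base of 0xffe00000, which sits inside the boot-time mapping range, and the new formula gives 0xffc00000, whose end (0xffd40000) stays below FIXADDR_TOT_START (0xffd9b000).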
 
--- /dev/null
+Subject: x86/paravirt: Remove 'noreplace-paravirt' cmdline option
+From: Josh Poimboeuf jpoimboe@redhat.com
+Date: Tue Jan 30 22:13:33 2018 -0600
+
+From: Josh Poimboeuf jpoimboe@redhat.com
+
+commit 12c69f1e94c89d40696e83804dd2f0965b5250cd
+
+The 'noreplace-paravirt' option disables paravirt patching, leaving the
+original pv indirect calls in place.
+
+That's highly incompatible with retpolines, unless we want to uglify
+paravirt even further and convert the paravirt calls to retpolines.
+
+As far as I can tell, the option doesn't seem to be useful for much
+other than introducing surprising corner cases and making the kernel
+vulnerable to Spectre v2.  It was probably a debug option from the early
+paravirt days.  So just remove it.
+
+Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Cc: Andrea Arcangeli <aarcange@redhat.com>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Ashok Raj <ashok.raj@intel.com>
+Cc: Greg KH <gregkh@linuxfoundation.org>
+Cc: Jun Nakajima <jun.nakajima@intel.com>
+Cc: Tim Chen <tim.c.chen@linux.intel.com>
+Cc: Rusty Russell <rusty@rustcorp.com.au>
+Cc: Dave Hansen <dave.hansen@intel.com>
+Cc: Asit Mallick <asit.k.mallick@intel.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Linus Torvalds <torvalds@linux-foundation.org>
+Cc: Jason Baron <jbaron@akamai.com>
+Cc: Paolo Bonzini <pbonzini@redhat.com>
+Cc: Alok Kataria <akataria@vmware.com>
+Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
+Cc: David Woodhouse <dwmw2@infradead.org>
+Cc: Dan Williams <dan.j.williams@intel.com>
+Link: https://lkml.kernel.org/r/20180131041333.2x6blhxirc2kclrq@treble
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ Documentation/admin-guide/kernel-parameters.txt |    2 --
+ arch/x86/kernel/alternative.c                   |   14 --------------
+ 2 files changed, 16 deletions(-)
+
+--- a/Documentation/admin-guide/kernel-parameters.txt
++++ b/Documentation/admin-guide/kernel-parameters.txt
+@@ -2718,8 +2718,6 @@
+       norandmaps      Don't use address space randomization.  Equivalent to
+                       echo 0 > /proc/sys/kernel/randomize_va_space
+ 
+-      noreplace-paravirt      [X86,IA-64,PV_OPS] Don't patch paravirt_ops
+-
+       noreplace-smp   [X86-32,SMP] Don't replace SMP instructions
+                       with UP alternatives
+ 
+--- a/arch/x86/kernel/alternative.c
++++ b/arch/x86/kernel/alternative.c
+@@ -46,17 +46,6 @@ static int __init setup_noreplace_smp(ch
+ }
+ __setup("noreplace-smp", setup_noreplace_smp);
+ 
+-#ifdef CONFIG_PARAVIRT
+-static int __initdata_or_module noreplace_paravirt = 0;
+-
+-static int __init setup_noreplace_paravirt(char *str)
+-{
+-      noreplace_paravirt = 1;
+-      return 1;
+-}
+-__setup("noreplace-paravirt", setup_noreplace_paravirt);
+-#endif
+-
+ #define DPRINTK(fmt, args...)                                         \
+ do {                                                                  \
+       if (debug_alternative)                                          \
+@@ -599,9 +588,6 @@ void __init_or_module apply_paravirt(str
+       struct paravirt_patch_site *p;
+       char insnbuf[MAX_PATCH_LEN];
+ 
+-      if (noreplace_paravirt)
+-              return;
+-
+       for (p = start; p < end; p++) {
+               unsigned int used;
+ 
 
--- /dev/null
+Subject: x86/pti: Mark constant arrays as __initconst
+From: Arnd Bergmann arnd@arndb.de
+Date: Fri Feb  2 22:39:23 2018 +0100
+
+From: Arnd Bergmann arnd@arndb.de
+
+commit 4bf5d56d429cbc96c23d809a08f63cd29e1a702e
+
+I'm seeing build failures from the two newly introduced arrays that
+are marked 'const' and '__initdata', which are mutually exclusive:
+
+arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init'
+arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init'
+
+The correct annotation is __initconst.
+
+Fixes: fec9434a12f3 ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown")
+Signed-off-by: Arnd Bergmann <arnd@arndb.de>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Borislav Petkov <bp@suse.de>
+Cc: Thomas Garnier <thgarnie@google.com>
+Cc: David Woodhouse <dwmw@amazon.co.uk>
+Link: https://lkml.kernel.org/r/20180202213959.611210-1-arnd@arndb.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kernel/cpu/common.c |    4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/arch/x86/kernel/cpu/common.c
++++ b/arch/x86/kernel/cpu/common.c
+@@ -876,7 +876,7 @@ static void identify_cpu_without_cpuid(s
+ #endif
+ }
+ 
+-static const __initdata struct x86_cpu_id cpu_no_speculation[] = {
++static const __initconst struct x86_cpu_id cpu_no_speculation[] = {
+       { X86_VENDOR_INTEL,     6, INTEL_FAM6_ATOM_CEDARVIEW,   X86_FEATURE_ANY },
+       { X86_VENDOR_INTEL,     6, INTEL_FAM6_ATOM_CLOVERVIEW,  X86_FEATURE_ANY },
+       { X86_VENDOR_INTEL,     6, INTEL_FAM6_ATOM_LINCROFT,    X86_FEATURE_ANY },
+@@ -889,7 +889,7 @@ static const __initdata struct x86_cpu_i
+       {}
+ };
+ 
+-static const __initdata struct x86_cpu_id cpu_no_meltdown[] = {
++static const __initconst struct x86_cpu_id cpu_no_meltdown[] = {
+       { X86_VENDOR_AMD },
+       {}
+ };
 
--- /dev/null
+Subject: x86/retpoline: Avoid retpolines for built-in __init functions
+From: David Woodhouse dwmw@amazon.co.uk
+Date: Thu Feb  1 11:27:20 2018 +0000
+
+From: David Woodhouse dwmw@amazon.co.uk
+
+commit 66f793099a636862a71c59d4a6ba91387b155e0c
+
+There's no point in building init code with retpolines, since it runs before
+any potentially hostile userspace does, and much of it runs before the
+retpoline is actually ALTERNATIVEd into place.
+
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: karahmed@amazon.de
+Cc: peterz@infradead.org
+Cc: bp@alien8.de
+Link: https://lkml.kernel.org/r/1517484441-1420-2-git-send-email-dwmw@amazon.co.uk
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ include/linux/init.h |    9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+--- a/include/linux/init.h
++++ b/include/linux/init.h
+@@ -5,6 +5,13 @@
+ #include <linux/compiler.h>
+ #include <linux/types.h>
+ 
++/* Built-in __init functions needn't be compiled with retpoline */
++#if defined(RETPOLINE) && !defined(MODULE)
++#define __noretpoline __attribute__((indirect_branch("keep")))
++#else
++#define __noretpoline
++#endif
++
+ /* These macros are used to mark some functions or 
+  * initialized data (doesn't apply to uninitialized data)
+  * as `initialization' functions. The kernel can take this
+@@ -40,7 +47,7 @@
+ 
+ /* These are for everybody (although not all archs will actually
+    discard it in modules) */
+-#define __init                __section(.init.text) __cold __inittrace __latent_entropy
++#define __init                __section(.init.text) __cold __inittrace __latent_entropy __noretpoline
+ #define __initdata    __section(.init.data)
+ #define __initconst   __section(.init.rodata)
+ #define __exitdata    __section(.exit.data)
 
--- /dev/null
+Subject: x86/spectre: Check CONFIG_RETPOLINE in command line parser
+From: Dou Liyang douly.fnst@cn.fujitsu.com
+Date: Tue Jan 30 14:13:50 2018 +0800
+
+From: Dou Liyang douly.fnst@cn.fujitsu.com
+
+commit 9471eee9186a46893726e22ebb54cade3f9bc043
+
+The spectre_v2 option 'auto' does not check whether CONFIG_RETPOLINE is
+enabled. As a consequence it fails to emit the appropriate warning and sets
+feature flags which have no effect at all.
+
+Add the missing IS_ENABLED() check.
+
+Fixes: da285121560e ("x86/spectre: Add boot time option to select Spectre v2 mitigation")
+Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: ak@linux.intel.com
+Cc: peterz@infradead.org
+Cc: Tomohiro <misono.tomohiro@jp.fujitsu.com>
+Cc: dave.hansen@intel.com
+Cc: bp@alien8.de
+Cc: arjan@linux.intel.com
+Cc: dwmw@amazon.co.uk
+Cc: stable@vger.kernel.org
+Link: https://lkml.kernel.org/r/f5892721-7528-3647-08fb-f8d10e65ad87@cn.fujitsu.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kernel/cpu/bugs.c |    6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/arch/x86/kernel/cpu/bugs.c
++++ b/arch/x86/kernel/cpu/bugs.c
+@@ -213,10 +213,10 @@ static void __init spectre_v2_select_mit
+               return;
+ 
+       case SPECTRE_V2_CMD_FORCE:
+-              /* FALLTRHU */
+       case SPECTRE_V2_CMD_AUTO:
+-              goto retpoline_auto;
+-
++              if (IS_ENABLED(CONFIG_RETPOLINE))
++                      goto retpoline_auto;
++              break;
+       case SPECTRE_V2_CMD_RETPOLINE_AMD:
+               if (IS_ENABLED(CONFIG_RETPOLINE))
+                       goto retpoline_amd;
 
--- /dev/null
+Subject: x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
+From: Colin Ian King colin.king@canonical.com
+Date: Tue Jan 30 19:32:18 2018 +0000
+
+From: Colin Ian King colin.king@canonical.com
+
+commit e698dcdfcda41efd0984de539767b4cddd235f1e
+
+Trivial fix to spelling mistake in pr_err error message text.
+
+Signed-off-by: Colin Ian King <colin.king@canonical.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Cc: kernel-janitors@vger.kernel.org
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: Borislav Petkov <bp@suse.de>
+Cc: David Woodhouse <dwmw@amazon.co.uk>
+Link: https://lkml.kernel.org/r/20180130193218.9271-1-colin.king@canonical.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ arch/x86/kernel/cpu/bugs.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/kernel/cpu/bugs.c
++++ b/arch/x86/kernel/cpu/bugs.c
+@@ -103,7 +103,7 @@ bool retpoline_module_ok(bool has_retpol
+       if (spectre_v2_enabled == SPECTRE_V2_NONE || has_retpoline)
+               return true;
+ 
+-      pr_err("System may be vunerable to spectre v2\n");
++      pr_err("System may be vulnerable to spectre v2\n");
+       spectre_v2_bad_module = true;
+       return false;
+ }
 
--- /dev/null
+Subject: x86/spectre: Report get_user mitigation for spectre_v1
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:03:21 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit edfbae53dab8348fca778531be9f4855d2ca0360
+
+Reflect the presence of get_user(), __get_user(), and 'syscall' protections
+in sysfs. The expectation is that new and better tooling will allow the
+kernel to grow more usages of array_index_nospec(). For now, only claim
+mitigation for __user pointer de-references.
+
+Reported-by: Jiri Slaby <jslaby@suse.cz>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727420158.33451.11658324346540434635.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kernel/cpu/bugs.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/kernel/cpu/bugs.c
++++ b/arch/x86/kernel/cpu/bugs.c
+@@ -297,7 +297,7 @@ ssize_t cpu_show_spectre_v1(struct devic
+ {
+       if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
+               return sprintf(buf, "Not affected\n");
+-      return sprintf(buf, "Vulnerable\n");
++      return sprintf(buf, "Mitigation: __user pointer sanitization\n");
+ }
+ 
+ ssize_t cpu_show_spectre_v2(struct device *dev,
 
--- /dev/null
+Subject: x86/spectre: Simplify spectre_v2 command line parsing
+From: KarimAllah Ahmed karahmed@amazon.de
+Date: Thu Feb  1 11:27:21 2018 +0000
+
+From: KarimAllah Ahmed karahmed@amazon.de
+
+commit 9005c6834c0ffdfe46afa76656bd9276cca864f6
+
+[dwmw2: Use ARRAY_SIZE]
+
+Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: peterz@infradead.org
+Cc: bp@alien8.de
+Link: https://lkml.kernel.org/r/1517484441-1420-3-git-send-email-dwmw@amazon.co.uk
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/kernel/cpu/bugs.c |   84 +++++++++++++++++++++++++++++----------------
+ 1 file changed, 55 insertions(+), 29 deletions(-)
+
+--- a/arch/x86/kernel/cpu/bugs.c
++++ b/arch/x86/kernel/cpu/bugs.c
+@@ -119,13 +119,13 @@ static inline const char *spectre_v2_mod
+ static void __init spec2_print_if_insecure(const char *reason)
+ {
+       if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+-              pr_info("%s\n", reason);
++              pr_info("%s selected on command line.\n", reason);
+ }
+ 
+ static void __init spec2_print_if_secure(const char *reason)
+ {
+       if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+-              pr_info("%s\n", reason);
++              pr_info("%s selected on command line.\n", reason);
+ }
+ 
+ static inline bool retp_compiler(void)
+@@ -140,42 +140,68 @@ static inline bool match_option(const ch
+       return len == arglen && !strncmp(arg, opt, len);
+ }
+ 
++static const struct {
++      const char *option;
++      enum spectre_v2_mitigation_cmd cmd;
++      bool secure;
++} mitigation_options[] = {
++      { "off",               SPECTRE_V2_CMD_NONE,              false },
++      { "on",                SPECTRE_V2_CMD_FORCE,             true },
++      { "retpoline",         SPECTRE_V2_CMD_RETPOLINE,         false },
++      { "retpoline,amd",     SPECTRE_V2_CMD_RETPOLINE_AMD,     false },
++      { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false },
++      { "auto",              SPECTRE_V2_CMD_AUTO,              false },
++};
++
+ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
+ {
+       char arg[20];
+-      int ret;
++      int ret, i;
++      enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
++
++      if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
++              return SPECTRE_V2_CMD_NONE;
++      else {
++              ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
++                                        sizeof(arg));
++              if (ret < 0)
++                      return SPECTRE_V2_CMD_AUTO;
+ 
+-      ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
+-                                sizeof(arg));
+-      if (ret > 0)  {
+-              if (match_option(arg, ret, "off")) {
+-                      goto disable;
+-              } else if (match_option(arg, ret, "on")) {
+-                      spec2_print_if_secure("force enabled on command line.");
+-                      return SPECTRE_V2_CMD_FORCE;
+-              } else if (match_option(arg, ret, "retpoline")) {
+-                      spec2_print_if_insecure("retpoline selected on command line.");
+-                      return SPECTRE_V2_CMD_RETPOLINE;
+-              } else if (match_option(arg, ret, "retpoline,amd")) {
+-                      if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
+-                              pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
+-                              return SPECTRE_V2_CMD_AUTO;
+-                      }
+-                      spec2_print_if_insecure("AMD retpoline selected on command line.");
+-                      return SPECTRE_V2_CMD_RETPOLINE_AMD;
+-              } else if (match_option(arg, ret, "retpoline,generic")) {
+-                      spec2_print_if_insecure("generic retpoline selected on command line.");
+-                      return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
+-              } else if (match_option(arg, ret, "auto")) {
++              for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) {
++                      if (!match_option(arg, ret, mitigation_options[i].option))
++                              continue;
++                      cmd = mitigation_options[i].cmd;
++                      break;
++              }
++
++              if (i >= ARRAY_SIZE(mitigation_options)) {
++                      pr_err("unknown option (%s). Switching to AUTO select\n",
++                             mitigation_options[i].option);
+                       return SPECTRE_V2_CMD_AUTO;
+               }
+       }
+ 
+-      if (!cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
++      if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
++           cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
++           cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
++          !IS_ENABLED(CONFIG_RETPOLINE)) {
++              pr_err("%s selected but not compiled in. Switching to AUTO select\n",
++                     mitigation_options[i].option);
+               return SPECTRE_V2_CMD_AUTO;
+-disable:
+-      spec2_print_if_insecure("disabled on command line.");
+-      return SPECTRE_V2_CMD_NONE;
++      }
++
++      if (cmd == SPECTRE_V2_CMD_RETPOLINE_AMD &&
++          boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
++              pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
++              return SPECTRE_V2_CMD_AUTO;
++      }
++
++      if (mitigation_options[i].secure)
++              spec2_print_if_secure(mitigation_options[i].option);
++      else
++              spec2_print_if_insecure(mitigation_options[i].option);
++
++      return cmd;
+ }
+ 
+ /* Check for Skylake-like CPUs (for RSB handling) */
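The table-driven lookup that replaces the if/else chain in the spectre_v2 parsing patch above can be sketched outside the kernel as follows. The enum, table, and parse_option() names are simplified stand-ins rather than the kernel's definitions. One deliberate difference: when no entry matches, the loop index is one past the end of the table, so this sketch reports failure through a flag and leaves the caller to print the user-supplied string instead of reading a table entry.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-in for enum spectre_v2_mitigation_cmd. */
enum mitigation_cmd {
	CMD_NONE,
	CMD_AUTO,
	CMD_FORCE,
	CMD_RETPOLINE,
	CMD_RETPOLINE_GENERIC,
	CMD_RETPOLINE_AMD,
};

static const struct {
	const char *option;
	enum mitigation_cmd cmd;
} options[] = {
	{ "off",               CMD_NONE },
	{ "on",                CMD_FORCE },
	{ "retpoline",         CMD_RETPOLINE },
	{ "retpoline,amd",     CMD_RETPOLINE_AMD },
	{ "retpoline,generic", CMD_RETPOLINE_GENERIC },
	{ "auto",              CMD_AUTO },
};

/* Exact match: lengths must agree, so "retpoline" never matches "retpoline,amd". */
static int match_option(const char *arg, size_t arglen, const char *opt)
{
	size_t len = strlen(opt);

	return len == arglen && !strncmp(arg, opt, len);
}

/*
 * Returns the command for arg, or CMD_AUTO for unknown input.
 * *known tells the caller whether arg matched a table entry.
 */
static enum mitigation_cmd parse_option(const char *arg, int *known)
{
	size_t arglen = strlen(arg);
	size_t i;

	for (i = 0; i < ARRAY_SIZE(options); i++) {
		if (match_option(arg, arglen, options[i].option)) {
			*known = 1;
			return options[i].cmd;
		}
	}

	*known = 0;
	return CMD_AUTO;
}
```

Adding an option then only means adding a table row, which is the simplification the patch is after.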
 
--- /dev/null
+Subject: x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
+From: Darren Kenny darren.kenny@oracle.com
+Date: Fri Feb  2 19:12:20 2018 +0000
+
+From: Darren Kenny darren.kenny@oracle.com
+
+commit af189c95a371b59f493dbe0f50c0a09724868881
+
+Fixes: 117cc7a908c83 ("x86/retpoline: Fill return stack buffer on vmexit")
+Signed-off-by: Darren Kenny <darren.kenny@oracle.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+Cc: Tom Lendacky <thomas.lendacky@amd.com>
+Cc: Andi Kleen <ak@linux.intel.com>
+Cc: Borislav Petkov <bp@alien8.de>
+Cc: Masami Hiramatsu <mhiramat@kernel.org>
+Cc: Arjan van de Ven <arjan@linux.intel.com>
+Cc: David Woodhouse <dwmw@amazon.co.uk>
+Link: https://lkml.kernel.org/r/20180202191220.blvgkgutojecxr3b@starbug-vm.ie.oracle.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/nospec-branch.h |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/include/asm/nospec-branch.h
++++ b/arch/x86/include/asm/nospec-branch.h
+@@ -150,7 +150,7 @@ extern char __indirect_thunk_end[];
+  * On VMEXIT we must ensure that no RSB predictions learned in the guest
+  * can be followed in the host, by overwriting the RSB completely. Both
+  * retpoline and IBRS mitigations for Spectre v2 need this; only on future
+- * CPUs with IBRS_ATT *might* it be avoided.
++ * CPUs with IBRS_ALL *might* it be avoided.
+  */
+ static inline void vmexit_fill_RSB(void)
+ {
 
--- /dev/null
+Subject: x86/speculation: Use Indirect Branch Prediction Barrier in context switch
+From: Tim Chen tim.c.chen@linux.intel.com
+Date: Mon Jan 29 22:04:47 2018 +0000
+
+From: Tim Chen tim.c.chen@linux.intel.com
+
+commit 18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
+
+Flush indirect branches when switching into a process that marked itself
+non-dumpable. This protects high-value processes like gpg better,
+without incurring too much performance overhead.
+
+If done naïvely, we could switch to a kernel idle thread and then back
+to the original process, such as:
+
+    process A -> idle -> process A
+
+In such a scenario, we do not have to do IBPB here even though the process
+is non-dumpable, as we are switching back to the same process after a
+hiatus.
+
+To avoid the redundant IBPB, which is expensive, we track the last mm
+user context ID. The cost is to have an extra u64 mm context id to track
+the last mm we were using before switching to the init_mm used by idle.
+Avoiding the extra IBPB is probably worth the extra memory for this
+common scenario.
+
+For those cases where tlb_defer_switch_to_init_mm() returns true (non
+PCID), lazy tlb will defer switch to init_mm, so we will not be changing
+the mm for the process A -> idle -> process A switch. So IBPB will be
+skipped for this case.
+
+Thanks to the reviewers and Andy Lutomirski for the suggestion of
+using ctx_id which got rid of the problem of mm pointer recycling.
+
+Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: ak@linux.intel.com
+Cc: karahmed@amazon.de
+Cc: arjan@linux.intel.com
+Cc: torvalds@linux-foundation.org
+Cc: linux@dominikbrodowski.net
+Cc: peterz@infradead.org
+Cc: bp@alien8.de
+Cc: luto@kernel.org
+Cc: pbonzini@redhat.com
+Cc: gregkh@linux-foundation.org
+Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/tlbflush.h |    2 ++
+ arch/x86/mm/tlb.c               |   33 ++++++++++++++++++++++++++++++++-
+ 2 files changed, 34 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/include/asm/tlbflush.h
++++ b/arch/x86/include/asm/tlbflush.h
+@@ -174,6 +174,8 @@ struct tlb_state {
+       struct mm_struct *loaded_mm;
+       u16 loaded_mm_asid;
+       u16 next_asid;
++      /* last user mm's ctx id */
++      u64 last_ctx_id;
+ 
+       /*
+        * We can be in one of several states:
+--- a/arch/x86/mm/tlb.c
++++ b/arch/x86/mm/tlb.c
+@@ -6,13 +6,14 @@
+ #include <linux/interrupt.h>
+ #include <linux/export.h>
+ #include <linux/cpu.h>
++#include <linux/debugfs.h>
+ 
+ #include <asm/tlbflush.h>
+ #include <asm/mmu_context.h>
++#include <asm/nospec-branch.h>
+ #include <asm/cache.h>
+ #include <asm/apic.h>
+ #include <asm/uv/uv.h>
+-#include <linux/debugfs.h>
+ 
+ /*
+  *    TLB flushing, formerly SMP-only
+@@ -247,6 +248,27 @@ void switch_mm_irqs_off(struct mm_struct
+       } else {
+               u16 new_asid;
+               bool need_flush;
++              u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
++
++              /*
++               * Avoid user/user BTB poisoning by flushing the branch
++               * predictor when switching between processes. This stops
++               * one process from doing Spectre-v2 attacks on another.
++               *
++               * As an optimization, flush indirect branches only when
++               * switching into processes that disable dumping. This
++               * protects high-value processes like gpg without incurring
++               * too much performance overhead. IBPB is *expensive*!
++               *
++               * This will not flush branches when switching into kernel
++               * threads. It will also not flush if we switch to idle
++               * thread and back to the same process. It will flush if we
++               * switch to a different non-dumpable process.
++               */
++              if (tsk && tsk->mm &&
++                  tsk->mm->context.ctx_id != last_ctx_id &&
++                  get_dumpable(tsk->mm) != SUID_DUMP_USER)
++                      indirect_branch_prediction_barrier();
+ 
+               if (IS_ENABLED(CONFIG_VMAP_STACK)) {
+                       /*
+@@ -292,6 +314,14 @@ void switch_mm_irqs_off(struct mm_struct
+                       trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
+               }
+ 
++              /*
++               * Record the last user mm's context id, so we can avoid
++               * flushing the branch predictor with IBPB if we switch back
++               * to the same user mm.
++               */
++              if (next != &init_mm)
++                      this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
++
+               this_cpu_write(cpu_tlbstate.loaded_mm, next);
+               this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
+       }
+@@ -369,6 +399,7 @@ void initialize_tlbstate_and_flush(void)
+       write_cr3(build_cr3(mm->pgd, 0));
+ 
+       /* Reinitialize tlbstate. */
++      this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
+       this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
+       this_cpu_write(cpu_tlbstate.next_asid, 1);
+       this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
 
--- /dev/null
+Subject: x86/syscall: Sanitize syscall table de-references under speculation
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:59 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit 2fbd7af5af8665d18bcefae3e9700be07e22b681
+
+The syscall table base is a user-controlled function pointer in kernel
+space. Use array_index_nospec() to prevent any out-of-bounds speculation.
+
+While retpoline prevents speculating into a userspace-directed target, it
+does not stop the pointer de-reference. The concern is leaking memory
+relative to the syscall table base by observing instruction cache
+behavior.
+
+Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Andy Lutomirski <luto@kernel.org>
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/entry/common.c |    5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+--- a/arch/x86/entry/common.c
++++ b/arch/x86/entry/common.c
+@@ -21,6 +21,7 @@
+ #include <linux/export.h>
+ #include <linux/context_tracking.h>
+ #include <linux/user-return-notifier.h>
++#include <linux/nospec.h>
+ #include <linux/uprobes.h>
+ #include <linux/livepatch.h>
+ #include <linux/syscalls.h>
+@@ -284,7 +285,8 @@ __visible void do_syscall_64(struct pt_r
+        * regs->orig_ax, which changes the behavior of some syscalls.
+        */
+       if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
+-              regs->ax = sys_call_table[nr & __SYSCALL_MASK](
++              nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
++              regs->ax = sys_call_table[nr](
+                       regs->di, regs->si, regs->dx,
+                       regs->r10, regs->r8, regs->r9);
+       }
+@@ -320,6 +322,7 @@ static __always_inline void do_syscall_3
+       }
+ 
+       if (likely(nr < IA32_NR_syscalls)) {
++              nr = array_index_nospec(nr, IA32_NR_syscalls);
+               /*
+                * It's possible that a 32-bit syscall implementation
+                * takes a 64-bit parameter but nonetheless assumes that
 
--- /dev/null
+Subject: x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:49 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit 304ec1b050310548db33063e567123fae8fd0301
+
+Quoting Linus:
+
+    I do think that it would be a good idea to very expressly document
+    the fact that it's not that the user access itself is unsafe. I do
+    agree that things like "get_user()" want to be protected, but not
+    because of any direct bugs or problems with get_user() and friends,
+    but simply because get_user() is an excellent source of a pointer
+    that is obviously controlled from a potentially attacking user
+    space. So it's a prime candidate for then finding _subsequent_
+    accesses that can then be used to perturb the cache.
+
+__uaccess_begin_nospec() covers __get_user() and copy_from_iter(), where the
+limit check is far away from the user pointer de-reference. In those cases,
+a barrier_nospec() prevents speculation with a potential pointer to
+privileged memory. uaccess_try_nospec covers get_user_try.
+
+Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
+Suggested-by: Andi Kleen <ak@linux.intel.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: Kees Cook <keescook@chromium.org>
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/include/asm/uaccess.h    |    6 +++---
+ arch/x86/include/asm/uaccess_32.h |    6 +++---
+ arch/x86/include/asm/uaccess_64.h |   12 ++++++------
+ arch/x86/lib/usercopy_32.c        |    4 ++--
+ 4 files changed, 14 insertions(+), 14 deletions(-)
+
+--- a/arch/x86/include/asm/uaccess.h
++++ b/arch/x86/include/asm/uaccess.h
+@@ -450,7 +450,7 @@ do {                                                                       \
+ ({                                                                    \
+       int __gu_err;                                                   \
+       __inttype(*(ptr)) __gu_val;                                     \
+-      __uaccess_begin();                                              \
++      __uaccess_begin_nospec();                                       \
+       __get_user_size(__gu_val, (ptr), (size), __gu_err, -EFAULT);    \
+       __uaccess_end();                                                \
+       (x) = (__force __typeof__(*(ptr)))__gu_val;                     \
+@@ -557,7 +557,7 @@ struct __large_struct { unsigned long bu
+  *    get_user_ex(...);
+  * } get_user_catch(err)
+  */
+-#define get_user_try          uaccess_try
++#define get_user_try          uaccess_try_nospec
+ #define get_user_catch(err)   uaccess_catch(err)
+ 
+ #define get_user_ex(x, ptr)   do {                                    \
+@@ -591,7 +591,7 @@ extern void __cmpxchg_wrong_size(void)
+       __typeof__(ptr) __uval = (uval);                                \
+       __typeof__(*(ptr)) __old = (old);                               \
+       __typeof__(*(ptr)) __new = (new);                               \
+-      __uaccess_begin();                                              \
++      __uaccess_begin_nospec();                                       \
+       switch (size) {                                                 \
+       case 1:                                                         \
+       {                                                               \
+--- a/arch/x86/include/asm/uaccess_32.h
++++ b/arch/x86/include/asm/uaccess_32.h
+@@ -29,21 +29,21 @@ raw_copy_from_user(void *to, const void
+               switch (n) {
+               case 1:
+                       ret = 0;
+-                      __uaccess_begin();
++                      __uaccess_begin_nospec();
+                       __get_user_asm_nozero(*(u8 *)to, from, ret,
+                                             "b", "b", "=q", 1);
+                       __uaccess_end();
+                       return ret;
+               case 2:
+                       ret = 0;
+-                      __uaccess_begin();
++                      __uaccess_begin_nospec();
+                       __get_user_asm_nozero(*(u16 *)to, from, ret,
+                                             "w", "w", "=r", 2);
+                       __uaccess_end();
+                       return ret;
+               case 4:
+                       ret = 0;
+-                      __uaccess_begin();
++                      __uaccess_begin_nospec();
+                       __get_user_asm_nozero(*(u32 *)to, from, ret,
+                                             "l", "k", "=r", 4);
+                       __uaccess_end();
+--- a/arch/x86/include/asm/uaccess_64.h
++++ b/arch/x86/include/asm/uaccess_64.h
+@@ -55,31 +55,31 @@ raw_copy_from_user(void *dst, const void
+               return copy_user_generic(dst, (__force void *)src, size);
+       switch (size) {
+       case 1:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u8 *)dst, (u8 __user *)src,
+                             ret, "b", "b", "=q", 1);
+               __uaccess_end();
+               return ret;
+       case 2:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u16 *)dst, (u16 __user *)src,
+                             ret, "w", "w", "=r", 2);
+               __uaccess_end();
+               return ret;
+       case 4:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u32 *)dst, (u32 __user *)src,
+                             ret, "l", "k", "=r", 4);
+               __uaccess_end();
+               return ret;
+       case 8:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
+                             ret, "q", "", "=r", 8);
+               __uaccess_end();
+               return ret;
+       case 10:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
+                              ret, "q", "", "=r", 10);
+               if (likely(!ret))
+@@ -89,7 +89,7 @@ raw_copy_from_user(void *dst, const void
+               __uaccess_end();
+               return ret;
+       case 16:
+-              __uaccess_begin();
++              __uaccess_begin_nospec();
+               __get_user_asm_nozero(*(u64 *)dst, (u64 __user *)src,
+                              ret, "q", "", "=r", 16);
+               if (likely(!ret))
+--- a/arch/x86/lib/usercopy_32.c
++++ b/arch/x86/lib/usercopy_32.c
+@@ -331,7 +331,7 @@ do {                                                                       \
+ 
+ unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
+ {
+-      __uaccess_begin();
++      __uaccess_begin_nospec();
+       if (movsl_is_ok(to, from, n))
+               __copy_user(to, from, n);
+       else
+@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
+ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
+                                       unsigned long n)
+ {
+-      __uaccess_begin();
++      __uaccess_begin_nospec();
+ #ifdef CONFIG_X86_INTEL_USERCOPY
+       if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
+               n = __copy_user_intel_nocache(to, from, n);
 
--- /dev/null
+Subject: x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
+From: Dan Williams dan.j.williams@intel.com
+Date: Mon Jan 29 17:02:44 2018 -0800
+
+From: Dan Williams dan.j.williams@intel.com
+
+commit b5c4ae4f35325d520b230bab6eb3310613b72ac1
+
+In preparation for converting some __uaccess_begin() instances to
+__uaccess_begin_nospec(), make sure all 'from user' uaccess paths are
+using the _begin(), _end() helpers rather than open-coded stac() and
+clac().
+
+No functional changes.
+
+Suggested-by: Ingo Molnar <mingo@redhat.com>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: linux-arch@vger.kernel.org
+Cc: Tom Lendacky <thomas.lendacky@amd.com>
+Cc: Kees Cook <keescook@chromium.org>
+Cc: kernel-hardening@lists.openwall.com
+Cc: gregkh@linuxfoundation.org
+Cc: Al Viro <viro@zeniv.linux.org.uk>
+Cc: torvalds@linux-foundation.org
+Cc: alan@linux.intel.com
+Link: https://lkml.kernel.org/r/151727416438.33451.17309465232057176966.stgit@dwillia2-desk3.amr.corp.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+
+---
+ arch/x86/lib/usercopy_32.c |    8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+--- a/arch/x86/lib/usercopy_32.c
++++ b/arch/x86/lib/usercopy_32.c
+@@ -331,12 +331,12 @@ do {                                                                     \
+ 
+ unsigned long __copy_user_ll(void *to, const void *from, unsigned long n)
+ {
+-      stac();
++      __uaccess_begin();
+       if (movsl_is_ok(to, from, n))
+               __copy_user(to, from, n);
+       else
+               n = __copy_user_intel(to, from, n);
+-      clac();
++      __uaccess_end();
+       return n;
+ }
+ EXPORT_SYMBOL(__copy_user_ll);
+@@ -344,7 +344,7 @@ EXPORT_SYMBOL(__copy_user_ll);
+ unsigned long __copy_from_user_ll_nocache_nozero(void *to, const void __user *from,
+                                       unsigned long n)
+ {
+-      stac();
++      __uaccess_begin();
+ #ifdef CONFIG_X86_INTEL_USERCOPY
+       if (n > 64 && static_cpu_has(X86_FEATURE_XMM2))
+               n = __copy_user_intel_nocache(to, from, n);
+@@ -353,7 +353,7 @@ unsigned long __copy_from_user_ll_nocach
+ #else
+       __copy_user(to, from, n);
+ #endif
+-      clac();
++      __uaccess_end();
+       return n;
+ }
+ EXPORT_SYMBOL(__copy_from_user_ll_nocache_nozero);