From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon, 8 Feb 2021 12:05:02 +0000 (+0100)
Subject: 5.10-stable patches
X-Git-Tag: v4.4.257~14
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=d59d15c4e3efdad5b81b5be022a8feb80d40422d;p=thirdparty%2Fkernel%2Fstable-queue.git

5.10-stable patches

added patches:
	x86-apic-add-extra-serialization-for-non-serializing-msrs.patch
	x86-build-disable-cet-instrumentation-in-the-kernel.patch
	x86-debug-fix-dr6-handling.patch
	x86-debug-prevent-data-breakpoints-on-__per_cpu_offset.patch
	x86-debug-prevent-data-breakpoints-on-cpu_dr7.patch
---

diff --git a/queue-5.10/series b/queue-5.10/series
index 28912c1a240..5d48f0c8720 100644
--- a/queue-5.10/series
+++ b/queue-5.10/series
@@ -103,3 +103,8 @@ mm-compaction-move-high_pfn-to-the-for-loop-scope.patch
 mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
 mm-thp-fix-madv_remove-deadlock-on-shmem-thp.patch
 mm-filemap-add-missing-mem_cgroup_uncharge-to-__add_to_page_cache_locked.patch
+x86-build-disable-cet-instrumentation-in-the-kernel.patch
+x86-debug-fix-dr6-handling.patch
+x86-debug-prevent-data-breakpoints-on-__per_cpu_offset.patch
+x86-debug-prevent-data-breakpoints-on-cpu_dr7.patch
+x86-apic-add-extra-serialization-for-non-serializing-msrs.patch
diff --git a/queue-5.10/x86-apic-add-extra-serialization-for-non-serializing-msrs.patch b/queue-5.10/x86-apic-add-extra-serialization-for-non-serializing-msrs.patch
new file mode 100644
index 00000000000..2f98c5b0d1b
--- /dev/null
+++ b/queue-5.10/x86-apic-add-extra-serialization-for-non-serializing-msrs.patch
@@ -0,0 +1,202 @@
+From 25a068b8e9a4eb193d755d58efcb3c98928636e0 Mon Sep 17 00:00:00 2001
+From: Dave Hansen <dave.hansen@linux.intel.com>
+Date: Thu, 5 Mar 2020 09:47:08 -0800
+Subject: x86/apic: Add extra serialization for non-serializing MSRs
+
+From: Dave Hansen <dave.hansen@linux.intel.com>
+
+commit 25a068b8e9a4eb193d755d58efcb3c98928636e0 upstream.
+
+Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain
+MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for
+MFENCE; LFENCE.
+
+Short summary: we have special MSRs that have weaker ordering than all
+the rest. Add fencing consistent with current SDM recommendations.
+
+This is not known to cause any issues in practice, only in theory.
+
+Longer story below:
+
+The reason the kernel uses a different semantic is that the SDM changed
+(roughly in late 2017). The SDM changed because folks at Intel were
+auditing all of the recommended fences in the SDM and realized that the
+x2apic fences were insufficient.
+
+Why was the plain MFENCE judged insufficient?
+
+WRMSR itself is normally a serializing instruction. No fences are needed
+because the instruction itself serializes everything.
+
+But, there are explicit exceptions for this serializing behavior written
+into the WRMSR instruction documentation for two classes of MSRs:
+IA32_TSC_DEADLINE and the X2APIC MSRs.
+
+Back to x2apic: WRMSR is *not* serializing in this specific case.
+But why is MFENCE insufficient? MFENCE makes writes visible, but
+only affects load/store instructions. WRMSR is unfortunately not a
+load/store instruction and is unaffected by MFENCE. This means that a
+non-serializing WRMSR could be reordered by the CPU to execute before
+the writes made visible by the MFENCE have even occurred in the first
+place.
+
+This means that an x2apic IPI could theoretically be triggered before
+there is any (visible) data to process.
+
+Does this affect anything in practice? I honestly don't know.
+It seems quite possible that by the time an interrupt gets to consume
+the (not yet) MFENCE'd data, it has become visible, mostly by accident.
+
+To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
+
+This also leaves open the question of the _other_ weakly-ordered WRMSR:
+MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as
+the x2APIC MSRs, it seems substantially less likely to be a problem in
+practice. While writes to the in-memory Local Vector Table (LVT) might
+theoretically be reordered with respect to a weakly-ordered WRMSR like
+TSC_DEADLINE, the SDM has this to say:
+
+  In x2APIC mode, the WRMSR instruction is used to write to the LVT
+  entry. The processor ensures the ordering of this write and any
+  subsequent WRMSR to the deadline; no fencing is required.
+
+But, that might still leave xAPIC exposed. The safest thing to do for
+now is to add the extra, recommended LFENCE.
+
+ [ bp: Massage commit message, fix typos, drop accidentally added
+   newline to tools/arch/x86/include/asm/barrier.h. ]
+
+Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
+Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Acked-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: <stable@vger.kernel.org>
+Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/include/asm/apic.h           |   10 ----------
+ arch/x86/include/asm/barrier.h        |   18 ++++++++++++++++++
+ arch/x86/kernel/apic/apic.c           |    4 ++++
+ arch/x86/kernel/apic/x2apic_cluster.c |    6 ++++--
+ arch/x86/kernel/apic/x2apic_phys.c    |    9 ++++++---
+ 5 files changed, 32 insertions(+), 15 deletions(-)
+
+--- a/arch/x86/include/asm/apic.h
++++ b/arch/x86/include/asm/apic.h
+@@ -197,16 +197,6 @@ static inline bool apic_needs_pit(void)
+ #endif /* !CONFIG_X86_LOCAL_APIC */
+ 
+ #ifdef CONFIG_X86_X2APIC
+-/*
+- * Make previous memory operations globally visible before
+- * sending the IPI through x2apic wrmsr. We need a serializing instruction or
+- * mfence for this.
+- */
+-static inline void x2apic_wrmsr_fence(void)
+-{
+-	asm volatile("mfence" : : : "memory");
+-}
+-
+ static inline void native_apic_msr_write(u32 reg, u32 v)
+ {
+ 	if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||
+--- a/arch/x86/include/asm/barrier.h
++++ b/arch/x86/include/asm/barrier.h
+@@ -84,4 +84,22 @@ do { \
+ 
+ #include <asm-generic/barrier.h>
+ 
++/*
++ * Make previous memory operations globally visible before
++ * a WRMSR.
++ *
++ * MFENCE makes writes visible, but only affects load/store
++ * instructions. WRMSR is unfortunately not a load/store
++ * instruction and is unaffected by MFENCE. The LFENCE ensures
++ * that the WRMSR is not reordered.
++ *
++ * Most WRMSRs are full serializing instructions themselves and
++ * do not require this barrier. This is only required for the
++ * IA32_TSC_DEADLINE and X2APIC MSRs.
++ */
++static inline void weak_wrmsr_fence(void)
++{
++	asm volatile("mfence; lfence" : : : "memory");
++}
++
+ #endif /* _ASM_X86_BARRIER_H */
+--- a/arch/x86/kernel/apic/apic.c
++++ b/arch/x86/kernel/apic/apic.c
+@@ -41,6 +41,7 @@
+ #include <asm/perf_event.h>
+ #include <asm/x86_init.h>
+ #include <linux/atomic.h>
++#include <asm/barrier.h>
+ #include <asm/mpspec.h>
+ #include <asm/i8259.h>
+ #include <asm/proto.h>
+@@ -472,6 +473,9 @@ static int lapic_next_deadline(unsigned
+ {
+ 	u64 tsc;
+ 
++	/* This MSR is special and needs a special fence: */
++	weak_wrmsr_fence();
++
+ 	tsc = rdtsc();
+ 	wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
+ 	return 0;
+--- a/arch/x86/kernel/apic/x2apic_cluster.c
++++ b/arch/x86/kernel/apic/x2apic_cluster.c
+@@ -29,7 +29,8 @@ static void x2apic_send_IPI(int cpu, int
+ {
+ 	u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
+ 
+-	x2apic_wrmsr_fence();
++	/* x2apic MSRs are special and need a special fence: */
++	weak_wrmsr_fence();
+ 	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
+ }
+ 
+@@ -41,7 +42,8 @@ __x2apic_send_IPI_mask(const struct cpum
+ 	unsigned long flags;
+ 	u32 dest;
+ 
+-	x2apic_wrmsr_fence();
++	/* x2apic MSRs are special and need a special fence: */
++	weak_wrmsr_fence();
+ 	local_irq_save(flags);
+ 
+ 	tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
+--- a/arch/x86/kernel/apic/x2apic_phys.c
++++ b/arch/x86/kernel/apic/x2apic_phys.c
+@@ -43,7 +43,8 @@ static void x2apic_send_IPI(int cpu, int
+ {
+ 	u32 dest = per_cpu(x86_cpu_to_apicid, cpu);
+ 
+-	x2apic_wrmsr_fence();
++	/* x2apic MSRs are special and need a special fence: */
++	weak_wrmsr_fence();
+ 	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL);
+ }
+ 
+@@ -54,7 +55,8 @@ __x2apic_send_IPI_mask(const struct cpum
+ 	unsigned long this_cpu;
+ 	unsigned long flags;
+ 
+-	x2apic_wrmsr_fence();
++	/* x2apic MSRs are special and need a special fence: */
++	weak_wrmsr_fence();
+ 
+ 	local_irq_save(flags);
+ 
+@@ -125,7 +127,8 @@ void __x2apic_send_IPI_shorthand(int vec
+ {
+ 	unsigned long cfg = __prepare_ICR(which, vector, 0);
+ 
+-	x2apic_wrmsr_fence();
++	/* x2apic MSRs are special and need a special fence: */
++	weak_wrmsr_fence();
+ 	native_x2apic_icr_write(cfg, 0);
+ }
+ 
diff --git a/queue-5.10/x86-build-disable-cet-instrumentation-in-the-kernel.patch b/queue-5.10/x86-build-disable-cet-instrumentation-in-the-kernel.patch
new file mode 100644
index 00000000000..c40bf972572
--- /dev/null
+++ b/queue-5.10/x86-build-disable-cet-instrumentation-in-the-kernel.patch
@@ -0,0 +1,65 @@
+From 20bf2b378729c4a0366a53e2018a0b70ace94bcd Mon Sep 17 00:00:00 2001
+From: Josh Poimboeuf <jpoimboe@redhat.com>
+Date: Thu, 28 Jan 2021 15:52:19 -0600
+Subject: x86/build: Disable CET instrumentation in the kernel
+
+From: Josh Poimboeuf <jpoimboe@redhat.com>
+
+commit 20bf2b378729c4a0366a53e2018a0b70ace94bcd upstream.
+
+With retpolines disabled, some configurations of GCC, specifically
+GCC versions 9 and 10 in Ubuntu, will add Intel CET instrumentation
+to the kernel by default. That breaks certain tracing scenarios by
+adding a superfluous ENDBR64 instruction before the fentry call, for
+functions which can be called indirectly.
+
+CET instrumentation isn't currently necessary in the kernel, as CET is
+only supported in user space. Disable it unconditionally and move it
+into the x86 Makefile, as CET/CFI enablement should be a per-arch
+decision anyway.
+
+ [ bp: Massage and extend commit message. ]
+
+Fixes: 29be86d7f9cb ("kbuild: add -fcf-protection=none when using retpoline flags")
+Reported-by: Nikolay Borisov <nborisov@suse.com>
+Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Reviewed-by: Nikolay Borisov <nborisov@suse.com>
+Tested-by: Nikolay Borisov <nborisov@suse.com>
+Cc: <stable@vger.kernel.org>
+Cc: Seth Forshee <seth.forshee@canonical.com>
+Cc: Masahiro Yamada <masahiroy@kernel.org>
+Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ Makefile          |    6 ------
+ arch/x86/Makefile |    3 +++
+ 2 files changed, 3 insertions(+), 6 deletions(-)
+
+--- a/Makefile
++++ b/Makefile
+@@ -950,12 +950,6 @@ KBUILD_CFLAGS += $(call cc-option,-Wer
+ # change __FILE__ to the relative path from the srctree
+ KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+ 
+-# ensure -fcf-protection is disabled when using retpoline as it is
+-# incompatible with -mindirect-branch=thunk-extern
+-ifdef CONFIG_RETPOLINE
+-KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+-endif
+-
+ # include additional Makefiles when needed
+ include-y := scripts/Makefile.extrawarn
+ include-$(CONFIG_KASAN) += scripts/Makefile.kasan
+--- a/arch/x86/Makefile
++++ b/arch/x86/Makefile
+@@ -127,6 +127,9 @@ else
+ 
+         KBUILD_CFLAGS += -mno-red-zone
+         KBUILD_CFLAGS += -mcmodel=kernel
++
++        # Intel CET isn't enabled in the kernel
++        KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
+ endif
+ 
+ ifdef CONFIG_X86_X32
diff --git a/queue-5.10/x86-debug-fix-dr6-handling.patch b/queue-5.10/x86-debug-fix-dr6-handling.patch
new file mode 100644
index 00000000000..e286a5d5b91
--- /dev/null
+++ b/queue-5.10/x86-debug-fix-dr6-handling.patch
@@ -0,0 +1,115 @@
+From 9ad22e165994ccb64d85b68499eaef97342c175b Mon Sep 17 00:00:00 2001
+From: Peter Zijlstra <peterz@infradead.org>
+Date: Thu, 28 Jan 2021 22:16:27 +0100
+Subject: x86/debug: Fix DR6 handling
+
+From: Peter Zijlstra <peterz@infradead.org>
+
+commit 9ad22e165994ccb64d85b68499eaef97342c175b upstream.
+
+Tom reported that one of the GDB test-cases failed, and Boris bisected
+it to commit:
+
+  d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
+
+The debugging session led us to commit:
+
+  6c0aca288e72 ("x86: Ignore trap bits on single step exceptions")
+
+It turns out that TF and data breakpoints are both traps and will be
+merged, while instruction breakpoints are faults and will not be merged.
+This means 6c0aca288e72 is wrong; only TF and instruction breakpoints
+need to be excluded, while TF and data breakpoints can be merged.
+
+ [ bp: Massage commit message. ]
+
+Fixes: d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
+Fixes: 6c0aca288e72 ("x86: Ignore trap bits on single step exceptions")
+Reported-by: Tom de Vries <tdevries@suse.de>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Signed-off-by: Borislav Petkov <bp@suse.de>
+Cc: <stable@vger.kernel.org>
+Link: https://lkml.kernel.org/r/YBMAbQGACujjfz%2Bi@hirez.programming.kicks-ass.net
+Link: https://lkml.kernel.org/r/20210128211627.GB4348@worktop.programming.kicks-ass.net
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kernel/hw_breakpoint.c |   39 ++++++++++++++++++---------------------
+ 1 file changed, 18 insertions(+), 21 deletions(-)
+
+--- a/arch/x86/kernel/hw_breakpoint.c
++++ b/arch/x86/kernel/hw_breakpoint.c
+@@ -491,15 +491,12 @@ static int hw_breakpoint_handler(struct
+ 	struct perf_event *bp;
+ 	unsigned long *dr6_p;
+ 	unsigned long dr6;
++	bool bpx;
+ 
+ 	/* The DR6 value is pointed by args->err */
+ 	dr6_p = (unsigned long *)ERR_PTR(args->err);
+ 	dr6 = *dr6_p;
+ 
+-	/* If it's a single step, TRAP bits are random */
+-	if (dr6 & DR_STEP)
+-		return NOTIFY_DONE;
+-
+ 	/* Do an early return if no trap bits are set in DR6 */
+ 	if ((dr6 & DR_TRAP_BITS) == 0)
+ 		return NOTIFY_DONE;
+@@ -509,28 +506,29 @@ static int hw_breakpoint_handler(struct
+ 		if (likely(!(dr6 & (DR_TRAP0 << i))))
+ 			continue;
+ 
++		bp = this_cpu_read(bp_per_reg[i]);
++		if (!bp)
++			continue;
++
++		bpx = bp->hw.info.type == X86_BREAKPOINT_EXECUTE;
++
+ 		/*
+-		 * The counter may be concurrently released but that can only
+-		 * occur from a call_rcu() path. We can then safely fetch
+-		 * the breakpoint, use its callback, touch its counter
+-		 * while we are in an rcu_read_lock() path.
++		 * TF and data breakpoints are traps and can be merged, however
++		 * instruction breakpoints are faults and will be raised
++		 * separately.
++		 *
++		 * However DR6 can indicate both TF and instruction
++		 * breakpoints. In that case take TF as that has precedence and
++		 * delay the instruction breakpoint for the next exception.
+ 		 */
+-		rcu_read_lock();
++		if (bpx && (dr6 & DR_STEP))
++			continue;
+ 
+-		bp = this_cpu_read(bp_per_reg[i]);
+ 		/*
+ 		 * Reset the 'i'th TRAP bit in dr6 to denote completion of
+ 		 * exception handling
+ 		 */
+ 		(*dr6_p) &= ~(DR_TRAP0 << i);
+-		/*
+-		 * bp can be NULL due to lazy debug register switching
+-		 * or due to concurrent perf counter removing.
+-		 */
+-		if (!bp) {
+-			rcu_read_unlock();
+-			break;
+-		}
+ 
+ 		perf_bp_event(bp, args->regs);
+ 
+@@ -538,11 +536,10 @@ static int hw_breakpoint_handler(struct
+ 		 * Set up resume flag to avoid breakpoint recursion when
+ 		 * returning back to origin.
+ 		 */
+-		if (bp->hw.info.type == X86_BREAKPOINT_EXECUTE)
++		if (bpx)
+ 			args->regs->flags |= X86_EFLAGS_RF;
+-
+-		rcu_read_unlock();
+ 	}
++
+ 	/*
+ 	 * Further processing in do_debug() is needed for a) user-space
+ 	 * breakpoints (to generate signals) and b) when the system has
diff --git a/queue-5.10/x86-debug-prevent-data-breakpoints-on-__per_cpu_offset.patch b/queue-5.10/x86-debug-prevent-data-breakpoints-on-__per_cpu_offset.patch
new file mode 100644
index 00000000000..e7fc4659187
--- /dev/null
+++ b/queue-5.10/x86-debug-prevent-data-breakpoints-on-__per_cpu_offset.patch
@@ -0,0 +1,52 @@
+From c4bed4b96918ff1d062ee81fdae4d207da4fa9b0 Mon Sep 17 00:00:00 2001
+From: Lai Jiangshan <laijs@linux.alibaba.com>
+Date: Thu, 4 Feb 2021 23:27:06 +0800
+Subject: x86/debug: Prevent data breakpoints on __per_cpu_offset
+
+From: Lai Jiangshan <laijs@linux.alibaba.com>
+
+commit c4bed4b96918ff1d062ee81fdae4d207da4fa9b0 upstream.
+
+When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU GSBASE value
+via __per_cpu_offset or pcpu_unit_offsets.
+
+When a data breakpoint is set on __per_cpu_offset[cpu] (read-write
+operation), the specific CPU will be stuck in an infinite #DB loop.
+
+RCU will try to send an NMI to the specific CPU, but that does not work
+either, since the NMI path also relies on paranoid_entry(), which makes
+the problem undebuggable.
+
+Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
+Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: stable@vger.kernel.org
+Link: https://lore.kernel.org/r/20210204152708.21308-1-jiangshanlai@gmail.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kernel/hw_breakpoint.c |   14 ++++++++++++++
+ 1 file changed, 14 insertions(+)
+
+--- a/arch/x86/kernel/hw_breakpoint.c
++++ b/arch/x86/kernel/hw_breakpoint.c
+@@ -269,6 +269,20 @@ static inline bool within_cpu_entry(unsi
+ 			CPU_ENTRY_AREA_TOTAL_SIZE))
+ 		return true;
+ 
++	/*
++	 * When FSGSBASE is enabled, paranoid_entry() fetches the per-CPU
++	 * GSBASE value via __per_cpu_offset or pcpu_unit_offsets.
++	 */
++#ifdef CONFIG_SMP
++	if (within_area(addr, end, (unsigned long)__per_cpu_offset,
++			sizeof(unsigned long) * nr_cpu_ids))
++		return true;
++#else
++	if (within_area(addr, end, (unsigned long)&pcpu_unit_offsets,
++			sizeof(pcpu_unit_offsets)))
++		return true;
++#endif
++
+ 	for_each_possible_cpu(cpu) {
+ 		/* The original rw GDT is being used after load_direct_gdt() */
+ 		if (within_area(addr, end, (unsigned long)get_cpu_gdt_rw(cpu),
diff --git a/queue-5.10/x86-debug-prevent-data-breakpoints-on-cpu_dr7.patch b/queue-5.10/x86-debug-prevent-data-breakpoints-on-cpu_dr7.patch
new file mode 100644
index 00000000000..21d7f408671
--- /dev/null
+++ b/queue-5.10/x86-debug-prevent-data-breakpoints-on-cpu_dr7.patch
@@ -0,0 +1,47 @@
+From 3943abf2dbfae9ea4d2da05c1db569a0603f76da Mon Sep 17 00:00:00 2001
+From: Lai Jiangshan <laijs@linux.alibaba.com>
+Date: Thu, 4 Feb 2021 23:27:07 +0800
+Subject: x86/debug: Prevent data breakpoints on cpu_dr7
+
+From: Lai Jiangshan <laijs@linux.alibaba.com>
+
+commit 3943abf2dbfae9ea4d2da05c1db569a0603f76da upstream.
+
+local_db_save() is called at the start of exc_debug_kernel(), reads DR7 and
+disables breakpoints to prevent recursion.
+
+When running in a guest (X86_FEATURE_HYPERVISOR), local_db_save() reads the
+per-cpu variable cpu_dr7 to check whether a breakpoint is active or not
+before it accesses DR7.
+
+A data breakpoint on cpu_dr7 therefore results in infinite #DB recursion.
+
+Disallow data breakpoints on cpu_dr7 to prevent that.
+
+Fixes: 84b6a3491567a ("x86/entry: Optimize local_db_save() for virt")
+Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: stable@vger.kernel.org
+Link: https://lore.kernel.org/r/20210204152708.21308-2-jiangshanlai@gmail.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kernel/hw_breakpoint.c |    8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/arch/x86/kernel/hw_breakpoint.c
++++ b/arch/x86/kernel/hw_breakpoint.c
+@@ -307,6 +307,14 @@ static inline bool within_cpu_entry(unsi
+ 				(unsigned long)&per_cpu(cpu_tlbstate, cpu),
+ 				sizeof(struct tlb_state)))
+ 			return true;
++
++		/*
++		 * When in a guest (X86_FEATURE_HYPERVISOR), local_db_save()
++		 * reads the per-CPU cpu_dr7 before clearing the DR7 register.
++		 */
++		if (within_area(addr, end, (unsigned long)&per_cpu(cpu_dr7, cpu),
++				sizeof(cpu_dr7)))
++			return true;
+ 	}
+ 
+ 	return false;