From: Greg Kroah-Hartman
Date: Fri, 5 Apr 2024 09:35:50 +0000 (+0200)
Subject: 5.15-stable patches
X-Git-Tag: v5.15.154~103
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=c8a76974963b74ad0615d06737f67d0a2b88adc8;p=thirdparty%2Fkernel%2Fstable-queue.git

5.15-stable patches

added patches:
kvm-x86-bail-to-userspace-if-emulation-of-atomic-user-access-faults.patch
kvm-x86-mark-target-gfn-of-emulated-atomic-instruction-as-dirty.patch
mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
thermal-devfreq_cooling-fix-perf-state-when-calculate-dfc-res_util.patch
---

diff --git a/queue-5.15/kvm-x86-bail-to-userspace-if-emulation-of-atomic-user-access-faults.patch b/queue-5.15/kvm-x86-bail-to-userspace-if-emulation-of-atomic-user-access-faults.patch
new file mode 100644
index 00000000000..e6419bbfd95
--- /dev/null
+++ b/queue-5.15/kvm-x86-bail-to-userspace-if-emulation-of-atomic-user-access-faults.patch
@@ -0,0 +1,33 @@
+From 5d6c7de6446e9ab3fb41d6f7d82770e50998f3de Mon Sep 17 00:00:00 2001
+From: Sean Christopherson
+Date: Wed, 2 Feb 2022 00:49:45 +0000
+Subject: KVM: x86: Bail to userspace if emulation of atomic user access faults
+
+From: Sean Christopherson
+
+commit 5d6c7de6446e9ab3fb41d6f7d82770e50998f3de upstream.
+
+Exit to userspace when emulating an atomic guest access if the CMPXCHG on
+the userspace address faults.  Emulating the access as a write and thus
+likely treating it as emulated MMIO is wrong, as KVM has already
+confirmed there is a valid, writable memslot.
+
+Signed-off-by: Sean Christopherson
+Message-Id: <20220202004945.2540433-6-seanjc@google.com>
+Signed-off-by: Paolo Bonzini
+Signed-off-by: Greg Kroah-Hartman
+---
+ arch/x86/kvm/x86.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -7108,7 +7108,7 @@ static int emulator_cmpxchg_emulated(str
+ 	}
+
+ 	if (r < 0)
+-		goto emul_write;
++		return X86EMUL_UNHANDLEABLE;
+ 	if (r)
+ 		return X86EMUL_CMPXCHG_FAILED;
+
diff --git a/queue-5.15/kvm-x86-mark-target-gfn-of-emulated-atomic-instruction-as-dirty.patch b/queue-5.15/kvm-x86-mark-target-gfn-of-emulated-atomic-instruction-as-dirty.patch
new file mode 100644
index 00000000000..1130f1018dc
--- /dev/null
+++ b/queue-5.15/kvm-x86-mark-target-gfn-of-emulated-atomic-instruction-as-dirty.patch
@@ -0,0 +1,62 @@
+From 910c57dfa4d113aae6571c2a8b9ae8c430975902 Mon Sep 17 00:00:00 2001
+From: Sean Christopherson
+Date: Wed, 14 Feb 2024 17:00:03 -0800
+Subject: KVM: x86: Mark target gfn of emulated atomic instruction as dirty
+
+From: Sean Christopherson
+
+commit 910c57dfa4d113aae6571c2a8b9ae8c430975902 upstream.
+
+When emulating an atomic access on behalf of the guest, mark the target
+gfn dirty if the CMPXCHG by KVM is attempted and doesn't fault.  This
+fixes a bug where KVM effectively corrupts guest memory during live
+migration by writing to guest memory without informing userspace that the
+page is dirty.
+
+Marking the page dirty got unintentionally dropped when KVM's emulated
+CMPXCHG was converted to do a user access.  Before that, KVM explicitly
+mapped the guest page into kernel memory, and marked the page dirty during
+the unmap phase.
+
+Mark the page dirty even if the CMPXCHG fails, as the old data is written
+back on failure, i.e. the page is still written.  The value written is
+guaranteed to be the same because the operation is atomic, but KVM's ABI
+is that all writes are dirty logged regardless of the value written.  And
+more importantly, that's what KVM did before the buggy commit.
+
+Huge kudos to the folks on the Cc list (and many others), who did all the
+actual work of triaging and debugging.
+
+Fixes: 1c2361f667f3 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses")
+Cc: stable@vger.kernel.org
+Cc: David Matlack
+Cc: Pasha Tatashin
+Cc: Michael Krebs
+base-commit: 6769ea8da8a93ed4630f1ce64df6aafcaabfce64
+Reviewed-by: Jim Mattson
+Link: https://lore.kernel.org/r/20240215010004.1456078-2-seanjc@google.com
+Signed-off-by: Sean Christopherson
+Signed-off-by: Greg Kroah-Hartman
+---
+ arch/x86/kvm/x86.c |   10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -7109,6 +7109,16 @@ static int emulator_cmpxchg_emulated(str
+
+ 	if (r < 0)
+ 		return X86EMUL_UNHANDLEABLE;
++
++	/*
++	 * Mark the page dirty _before_ checking whether or not the CMPXCHG was
++	 * successful, as the old value is written back on failure.  Note, for
++	 * live migration, this is unnecessarily conservative as CMPXCHG writes
++	 * back the original value and the access is atomic, but KVM's ABI is
++	 * that all writes are dirty logged, regardless of the value written.
++	 */
++	kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
++
+ 	if (r)
+ 		return X86EMUL_CMPXCHG_FAILED;
+
diff --git a/queue-5.15/mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch b/queue-5.15/mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
new file mode 100644
index 00000000000..f83a9eebba8
--- /dev/null
+++ b/queue-5.15/mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
@@ -0,0 +1,188 @@
+From 803de9000f334b771afacb6ff3e78622916668b0 Mon Sep 17 00:00:00 2001
+From: Vlastimil Babka
+Date: Wed, 21 Feb 2024 12:43:58 +0100
+Subject: mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
+
+From: Vlastimil Babka
+
+commit 803de9000f334b771afacb6ff3e78622916668b0 upstream.
+
+Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
+__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
+can happen in a suspend/resume context where a GFP_KERNEL allocation can
+have __GFP_IO masked out via gfp_allowed_mask.
+
+Quoting Sven:
+
+1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
+   with __GFP_RETRY_MAYFAIL set.
+
+2. page alloc's __alloc_pages_slowpath tries to get a page from the
+   freelist.  This fails because there is nothing free of that costly
+   order.
+
+3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
+   which bails out because a zone is ready to be compacted; it pretends
+   to have made a single page of progress.
+
+4. page alloc tries to compact, but this always bails out early because
+   __GFP_IO is not set (it's not passed by the snd allocator, and even
+   if it were, we are suspending so the __GFP_IO flag would be cleared
+   anyway).
+
+5. page alloc believes reclaim progress was made (because of the
+   pretense in item 3) and so it checks whether it should retry
+   compaction.  The compaction retry logic thinks it should try again,
+   because:
+    a) reclaim is needed because of the early bail-out in item 4
+    b) a zonelist is suitable for compaction
+
+6. goto 2.  indefinite stall.
+
+(end quote)
+
+The immediate root cause is confusing the COMPACT_SKIPPED returned from
+__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
+indicating a lack of order-0 pages, and in step 5 evaluating that in
+should_compact_retry() as a reason to retry, before incrementing and
+limiting the number of retries.  There are however other places that
+wrongly assume that compaction can happen while we lack __GFP_IO.
+
+To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
+evaluation and switch the open-coded test in try_to_compact_pages() to use
+it.
+
+Also use the new helper in:
+- compaction_ready(), which will make reclaim not bail out in step 3, so
+  there's at least one attempt to actually reclaim, even if chances are
+  small for a costly order
+- in_reclaim_compaction() which will make should_continue_reclaim()
+  return false and we don't over-reclaim unnecessarily
+- in __alloc_pages_slowpath() to set a local variable can_compact,
+  which is then used to avoid retrying reclaim/compaction for costly
+  allocations (step 5) if we can't compact and also to skip the early
+  compaction attempt that we do in some cases
+
+Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
+Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
+Signed-off-by: Vlastimil Babka
+Reported-by: Sven van Ashbrook
+Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
+Tested-by: Karthikeyan Ramasubramanian
+Cc: Brian Geffon
+Cc: Curtis Malainey
+Cc: Jaroslav Kysela
+Cc: Mel Gorman
+Cc: Michal Hocko
+Cc: Takashi Iwai
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Vlastimil Babka
+Signed-off-by: Greg Kroah-Hartman
+---
+ include/linux/gfp.h |    9 +++++++++
+ mm/compaction.c     |    7 +------
+ mm/page_alloc.c     |   10 ++++++----
+ mm/vmscan.c         |    5 ++++-
+ 4 files changed, 20 insertions(+), 11 deletions(-)
+
+--- a/include/linux/gfp.h
++++ b/include/linux/gfp.h
+@@ -660,6 +660,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_ma
+ extern void pm_restrict_gfp_mask(void);
+ extern void pm_restore_gfp_mask(void);
+
++/*
++ * Check if the gfp flags allow compaction - GFP_NOIO is a really
++ * tricky context because the migration might require IO.
++ */
++static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
++{
++	return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
++}
++
+ extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
+ #ifdef CONFIG_PM_SLEEP
+--- a/mm/compaction.c
++++ b/mm/compaction.c
+@@ -2582,16 +2582,11 @@ enum compact_result try_to_compact_pages
+ 		unsigned int alloc_flags, const struct alloc_context *ac,
+ 		enum compact_priority prio, struct page **capture)
+ {
+-	int may_perform_io = gfp_mask & __GFP_IO;
+ 	struct zoneref *z;
+ 	struct zone *zone;
+ 	enum compact_result rc = COMPACT_SKIPPED;
+
+-	/*
+-	 * Check if the GFP flags allow compaction - GFP_NOIO is really
+-	 * tricky context because the migration might require IO
+-	 */
+-	if (!may_perform_io)
++	if (!gfp_compaction_allowed(gfp_mask))
+ 		return COMPACT_SKIPPED;
+
+ 	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -4903,6 +4903,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, u
+ 						struct alloc_context *ac)
+ {
+ 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
++	bool can_compact = gfp_compaction_allowed(gfp_mask);
+ 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
+ 	struct page *page = NULL;
+ 	unsigned int alloc_flags;
+@@ -4968,7 +4969,7 @@ restart:
+ 	 * Don't try this for allocations that are allowed to ignore
+ 	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
+ 	 */
+-	if (can_direct_reclaim &&
++	if (can_direct_reclaim && can_compact &&
+ 			(costly_order ||
+ 			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
+ 			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
+@@ -5065,9 +5066,10 @@ retry:
+
+ 	/*
+ 	 * Do not retry costly high order allocations unless they are
+-	 * __GFP_RETRY_MAYFAIL
++	 * __GFP_RETRY_MAYFAIL and we can compact
+ 	 */
+-	if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
++	if (costly_order && (!can_compact ||
++			     !(gfp_mask & __GFP_RETRY_MAYFAIL)))
+ 		goto nopage;
+
+ 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
+@@ -5080,7 +5082,7 @@ retry:
+ 	 * implementation of the compaction depends on the sufficient amount
+ 	 * of free memory (see __compaction_suitable)
+ 	 */
+-	if (did_some_progress > 0 &&
++	if (did_some_progress > 0 && can_compact &&
+ 			should_compact_retry(ac, order, alloc_flags,
+ 				compact_result, &compact_priority,
+ 				&compaction_retries))
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -2834,7 +2834,7 @@ static void shrink_lruvec(struct lruvec
+ /* Use reclaim/compaction for costly allocs or under memory pressure */
+ static bool in_reclaim_compaction(struct scan_control *sc)
+ {
+-	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
++	if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
+ 	    (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
+ 	     sc->priority < DEF_PRIORITY - 2))
+ 		return true;
+@@ -3167,6 +3167,9 @@ static inline bool compaction_ready(stru
+ 	unsigned long watermark;
+ 	enum compact_result suitable;
+
++	if (!gfp_compaction_allowed(sc->gfp_mask))
++		return false;
++
+ 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
+ 	if (suitable == COMPACT_SUCCESS)
+ 		/* Allocation should succeed already.  Don't reclaim. */
diff --git a/queue-5.15/series b/queue-5.15/series
index a76e92a471f..89ba4088b1b 100644
--- a/queue-5.15/series
+++ b/queue-5.15/series
@@ -629,3 +629,7 @@ net-rds-fix-possible-cp-null-dereference.patch
 locking-rwsem-disable-preemption-while-trying-for-rwsem-lock.patch
 io_uring-ensure-0-is-returned-on-file-registration-success.patch
 revert-x86-mm-ident_map-use-gbpages-only-where-full-gb-page-should-be-mapped.patch
+mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
+thermal-devfreq_cooling-fix-perf-state-when-calculate-dfc-res_util.patch
+kvm-x86-bail-to-userspace-if-emulation-of-atomic-user-access-faults.patch
+kvm-x86-mark-target-gfn-of-emulated-atomic-instruction-as-dirty.patch
diff --git a/queue-5.15/thermal-devfreq_cooling-fix-perf-state-when-calculate-dfc-res_util.patch b/queue-5.15/thermal-devfreq_cooling-fix-perf-state-when-calculate-dfc-res_util.patch
new file mode 100644
index 00000000000..37fbf1fe306
--- /dev/null
+++ b/queue-5.15/thermal-devfreq_cooling-fix-perf-state-when-calculate-dfc-res_util.patch
@@ -0,0 +1,42 @@
+From a26de34b3c77ae3a969654d94be49e433c947e3b Mon Sep 17 00:00:00 2001
+From: Ye Zhang
+Date: Thu, 21 Mar 2024 18:21:00 +0800
+Subject: thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Ye Zhang
+
+commit a26de34b3c77ae3a969654d94be49e433c947e3b upstream.
+
+The issue occurs when the devfreq cooling device uses the EM power model
+and the get_real_power() callback is provided by the driver.
+
+The EM power table is sorted in ascending order and can't be indexed by
+cooling device state directly, so convert the cooling state to a
+performance state via dfc->max_state - dfc->capped_state.
+
+Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM")
+Cc: 5.11+ # 5.11+
+Signed-off-by: Ye Zhang
+Reviewed-by: Dhruva Gole
+Reviewed-by: Lukasz Luba
+Signed-off-by: Rafael J. Wysocki
+Signed-off-by: Lukasz Luba
+Signed-off-by: Greg Kroah-Hartman
+---
+ drivers/thermal/devfreq_cooling.c |    2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/thermal/devfreq_cooling.c
++++ b/drivers/thermal/devfreq_cooling.c
+@@ -199,7 +199,7 @@ static int devfreq_cooling_get_requested
+
+ 	res = dfc->power_ops->get_real_power(df, power, freq, voltage);
+ 	if (!res) {
+-		state = dfc->capped_state;
++		state = dfc->max_state - dfc->capped_state;
+ 		dfc->res_util = dfc->em_pd->table[state].power;
+ 		dfc->res_util *= SCALE_ERROR_MITIGATION;
+