From f8a39fcf03a7f5d32f1f95fe0bac78f8fba2c7d3 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman
Date: Fri, 18 Oct 2024 11:12:25 +0200
Subject: [PATCH] 6.6-stable patches

added patches:
	maple_tree-correct-tree-corruption-on-spanning-store.patch
---
 ...ct-tree-corruption-on-spanning-store.patch | 245 ++++++++++++++++++
 queue-6.6/series                              |   1 +
 2 files changed, 246 insertions(+)
 create mode 100644 queue-6.6/maple_tree-correct-tree-corruption-on-spanning-store.patch

diff --git a/queue-6.6/maple_tree-correct-tree-corruption-on-spanning-store.patch b/queue-6.6/maple_tree-correct-tree-corruption-on-spanning-store.patch
new file mode 100644
index 00000000000..76b624778b7
--- /dev/null
+++ b/queue-6.6/maple_tree-correct-tree-corruption-on-spanning-store.patch
@@ -0,0 +1,245 @@
+From bea07fd63192b61209d48cbb81ef474cc3ee4c62 Mon Sep 17 00:00:00 2001
+From: Lorenzo Stoakes
+Date: Mon, 7 Oct 2024 16:28:32 +0100
+Subject: maple_tree: correct tree corruption on spanning store
+
+From: Lorenzo Stoakes
+
+commit bea07fd63192b61209d48cbb81ef474cc3ee4c62 upstream.
+
+Patch series "maple_tree: correct tree corruption on spanning store", v3.
+
+There has been a nasty yet subtle maple tree corruption bug that appears
+to have been in existence since the inception of the algorithm.
+
+This bug seems far more likely to happen since commit f8d112a4e657
+("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
+at which reports started to be submitted concerning this bug.
+
+We were made definitively aware of the bug thanks to the kind efforts of
+Bert Karwatzki who helped enormously in my being able to track this down
+and identify the cause of it.
+
+The bug arises when an attempt is made to perform a spanning store across
+two leaf nodes, where the right leaf node is the rightmost child of the
+shared parent, AND the store completely consumes the right-most node.
+
+This results in mas_wr_spanning_store() mistakenly duplicating the new
+and existing entries at the maximum pivot within the range, and thus
+maple tree corruption.
+
+The fix patch corrects this by detecting this scenario and disallowing
+the mistaken duplicate copy.
+
+The fix patch commit message goes into great detail as to how this
+occurs.
+
+This series also includes a test which reliably reproduces the issue, and
+asserts that the fix works correctly.
+
+Bert has kindly tested the fix and confirmed it resolved his issues.
+Also Mikhail Gavrilov kindly reported what appears to be precisely the
+same bug, which this fix should also resolve.
+
+
+This patch (of 2):
+
+There has been a subtle bug present in the maple tree implementation from
+its inception.
+
+This arises from how stores are performed - when a store occurs, it will
+overwrite overlapping ranges and adjust the tree as necessary to
+accommodate this.
+
+A range may ultimately span two leaf nodes.  In this instance we walk
+the two leaf nodes, determine which elements are not overwritten to the
+left and to the right of the start and end of the ranges respectively
+and then rebalance the tree to contain these entries and the newly
+inserted one.
+
+This kind of store is dubbed a 'spanning store' and is implemented by
+mas_wr_spanning_store().
+
+In order to reach this stage, mas_store_gfp() invokes
+mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
+walk the tree and update the object (mas) to traverse to the location
+where the write should be performed, determining its store type.
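+
+To make the trigger concrete, a spanning store can be driven from the
+public maple tree interface roughly as follows (a minimal sketch only -
+the range values mirror the worked example below, and whether a given
+store actually spans two leaf nodes depends on how the tree happens to
+have split):
+
+	#include <linux/maple_tree.h>
+	#include <linux/xarray.h>
+
+	static void example_spanning_store(void)
+	{
+		DEFINE_MTREE(mt);
+		unsigned long i;
+
+		/* Populate enough ranges to force the tree to split into leaves. */
+		for (i = 0; i <= 0xffff; i += 0x1000)
+			mtree_store_range(&mt, i, i + 0xfff, xa_mk_value(i),
+					  GFP_KERNEL);
+
+		/*
+		 * A store whose range crosses a leaf boundary and ends on the
+		 * last pivot of the right-most leaf takes the
+		 * mas_wr_spanning_store() path.
+		 */
+		mtree_store_range(&mt, 0x4000, 0xffff, xa_mk_value(0x4000),
+				  GFP_KERNEL);
+
+		mtree_destroy(&mt);
+	}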
+
+When a spanning store is required, this function returns false, stopping
+at the parent node which contains the target range, and
+mas_wr_store_type() marks the mas->store_type as wr_spanning_store to
+denote this fact.
+
+When we go to perform the store in mas_wr_spanning_store(), we first
+determine the elements AFTER the END of the range we wish to store (that
+is, to the right of the entry to be inserted) - we do this by walking to
+the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
+have just determined contains the range over which we intend to write.
+
+We then turn our attention to the entries to the left of the entry we are
+inserting, whose state is represented by l_mas, and copy these into a
+'big node', which is a special node which contains enough slots to
+contain two leaf nodes' worth of data.
+
+We then copy the entry we wish to store immediately after this - the copy
+and the insertion of the new entry is performed by mas_store_b_node().
+
+After this we copy the elements to the right of the end of the range
+which we are inserting, if we have not exceeded the length of the node
+(i.e. r_mas.offset <= r_mas.end).
+
+Herein lies the bug - under very specific circumstances, this logic can
+break and corrupt the maple tree.
+
+Consider the following tree:
+
+Height
+  0                             Root Node
+                                 /      \
+                 pivot = 0xffff /        \ pivot = ULONG_MAX
+                               /          \
+  1                       A [-----]       ...
+                            /     \
+            pivot = 0x4fff /       \ pivot = 0xffff
+                          /         \
+  2 (LEAVES)          B [-----]  [-----] C
+                                      ^--- Last pivot 0xffff.
+
+Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
+that all ranges expressed in maple tree code are inclusive):
+
+1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
+   determines that this is a spanning store across nodes B and C.  The
+   mas state is set such that the current node from which we traverse
+   further is node A.
+
+2. In mas_wr_spanning_store() we try to find elements to the right of
+   pivot 0xffff by searching for an index of 0x10000:
+
+    - mas_wr_walk_index() invokes mas_wr_walk_descend() and
+      mas_wr_node_walk() in turn.
+
+        - mas_wr_node_walk() loops over entries in node A until EITHER
+          it finds an entry whose pivot equals or exceeds 0x10000 OR it
+          reaches the final entry.
+
+        - Since no entry has a pivot equal to or exceeding 0x10000,
+          pivot 0xffff is selected, leading to node C.
+
+    - mas_wr_walk_traverse() resets the mas state to traverse node C.
+      We loop around and invoke mas_wr_walk_descend() and
+      mas_wr_node_walk() in turn once again.
+
+        - Again, we reach the last entry in node C, which has a pivot
+          of 0xffff.
+
+3. We then copy the elements to the left of 0x4000 in node B to the big
+   node via mas_store_b_node(), and insert the new [0x4000, 0xffff]
+   entry too.
+
+4. We determine whether we have any entries to copy from the right of
+   the end of the range via the check r_mas.offset <= r_mas.end - and
+   with r_mas set up at the entry at pivot 0xffff this check passes, so
+   we DUPLICATE the entry at pivot 0xffff.
+
+5. BUG! The maple tree is corrupted with a duplicate entry.
+
+This requires a very specific set of circumstances - we must be spanning
+the last element in a leaf node, where that element is also the last
+element in the parent node: a spanning store across two leaf nodes with
+a range that ends at that shared pivot.
+
+A potential solution to this problem would simply be to reset the walk
+each time we traverse r_mas, however given the rarity of this situation
+it seems that would be rather inefficient.
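+
+The fix described below replaces this offset-based check with a
+value-based one.  In terms of the worked example, the two conditions
+evaluate as follows (a sketch with illustrative values only - the real
+fields live in struct ma_state and struct ma_wr_state):
+
+	unsigned long r_last = 0xffff;	/* end of the range being stored */
+	unsigned long r_max = 0xffff;	/* maximum pivot of node C */
+	unsigned char r_offset = 15;	/* offset reached in node C (illustrative) */
+	unsigned char node_end = 15;	/* last populated offset in C (illustrative) */
+
+	/* Old check: true here, so the entry at pivot 0xffff gets duplicated. */
+	bool copy_right_old = (r_offset <= node_end);
+
+	/* New check: false here - node C holds nothing beyond 0xffff. */
+	bool copy_right_new = (r_max > r_last);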
+
+Instead, this patch detects if the right hand node is populated, i.e.
+has anything we need to copy.
+
+We do so by only copying elements from the right of the entry being
+inserted when the maximum value present exceeds the last, rather than
+basing this on offset position.
+
+The patch also updates some comments and eliminates the unused bool
+return value in mas_wr_walk_index().
+
+The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma
+tree in mmap_region()") seems to have made the probability of this event
+much more likely, which is the point at which reports started to be
+submitted concerning this bug.
+
+The motivation for this change arose from Bert Karwatzki's report of
+encountering mm instability after the release of kernel v6.12-rc1 which,
+after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
+options, was identified as maple tree corruption.
+
+After Bert very generously provided his time and ability to reproduce
+this event consistently, I was able to finally identify that the issue
+discussed in this commit message was occurring for him.
+
+Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
+Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
+Fixes: 54a611b60590 ("Maple Tree: add new data structure")
+Signed-off-by: Lorenzo Stoakes
+Reported-by: Bert Karwatzki
+Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
+Tested-by: Bert Karwatzki
+Reported-by: Mikhail Gavrilov
+Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/
+Acked-by: Vlastimil Babka
+Reviewed-by: Liam R. Howlett
+Tested-by: Mikhail Gavrilov
+Reviewed-by: Wei Yang
+Cc: Matthew Wilcox
+Cc: Sidhartha Kumar
+Cc:
+Signed-off-by: Andrew Morton
+Signed-off-by: Greg Kroah-Hartman
+---
+ lib/maple_tree.c |   12 ++++++------
+ 1 file changed, 6 insertions(+), 6 deletions(-)
+
+--- a/lib/maple_tree.c
++++ b/lib/maple_tree.c
+@@ -2228,6 +2228,8 @@ static inline struct maple_enode *mte_no
+ 
+ /*
+  * mas_wr_node_walk() - Find the correct offset for the index in the @mas.
++ *                      If @mas->index cannot be found within the containing
++ *                      node, we traverse to the last entry in the node.
+  * @wr_mas: The maple write state
+  *
+  * Uses mas_slot_locked() and does not need to worry about dead nodes.
+@@ -3643,7 +3645,7 @@ static bool mas_wr_walk(struct ma_wr_sta
+ 	return true;
+ }
+ 
+-static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
++static void mas_wr_walk_index(struct ma_wr_state *wr_mas)
+ {
+ 	struct ma_state *mas = wr_mas->mas;
+ 
+@@ -3652,11 +3654,9 @@ static bool mas_wr_walk_index(struct ma_
+ 		wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
+ 						  mas->offset);
+ 		if (ma_is_leaf(wr_mas->type))
+-			return true;
++			return;
+ 		mas_wr_walk_traverse(wr_mas);
+-
+ 	}
+-	return true;
+ }
+ /*
+  * mas_extend_spanning_null() - Extend a store of a %NULL to include surrounding %NULLs.
+@@ -3892,8 +3892,8 @@ static inline int mas_wr_spanning_store(
+ 	memset(&b_node, 0, sizeof(struct maple_big_node));
+ 	/* Copy l_mas and store the value in b_node. */
+ 	mas_store_b_node(&l_wr_mas, &b_node, l_wr_mas.node_end);
+-	/* Copy r_mas into b_node. */
+-	if (r_mas.offset <= r_wr_mas.node_end)
++	/* Copy r_mas into b_node if there is anything to copy. */
++	if (r_mas.max > r_mas.last)
+ 		mas_mab_cp(&r_mas, r_mas.offset, r_wr_mas.node_end,
+ 			   &b_node, b_node.b_end + 1);
+ 	else
diff --git a/queue-6.6/series b/queue-6.6/series
index 171f49fe4dc..67f426a646c 100644
--- a/queue-6.6/series
+++ b/queue-6.6/series
@@ -42,3 +42,4 @@ xfs-fix-unlink-vs-cluster-buffer-instantiation-race.patch
 xfs-fix-freeing-speculative-preallocations-for-preallocated-files.patch
 xfs-allow-unlinked-symlinks-and-dirs-with-zero-size.patch
 xfs-restrict-when-we-try-to-align-cow-fork-delalloc-to-cowextsz-hints.patch
+maple_tree-correct-tree-corruption-on-spanning-store.patch
-- 
2.47.3