From 01c918ea3e4f7778cef3a485cedc774e4cfb77fb Mon Sep 17 00:00:00 2001
From: Sasha Levin <sashal@kernel.org>
Date: Wed, 6 Mar 2024 14:03:56 +0000
Subject: mm: swap: fix race between free_swap_and_cache() and swapoff()

From: Ryan Roberts <ryan.roberts@arm.com>

[ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]
There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we complete swap_page_trans_huge_swapped()?

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 reclaimed the swap cache
but before process 1 finished its call to swap_page_trans_huge_swapped()?

--8<-----

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/swapfile.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 86ade667a7af6..4ca1d04d8732f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1271,6 +1271,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
 }
 
 /*
+ * Note that when only holding the PTL, swapoff might succeed immediately
+ * after freeing a swap entry. Therefore, immediately after
+ * __swap_entry_free(), the swap info might become stale and should not
+ * be touched without a prior get_swap_device().
+ *
  * Check whether swap entry is valid in the swap device. If so,
  * return pointer to swap_info_struct, and keep the swap entry valid
  * via preventing the swap device from being swapoff, until
@@ -1797,13 +1802,19 @@ int free_swap_and_cache(swp_entry_t entry)
 	if (non_swap_entry(entry))
 		return 1;
 
-	p = _swap_info_get(entry);
+	p = get_swap_device(entry);
 	if (p) {
+		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+			put_swap_device(p);
+			return 0;
+		}
+
 		count = __swap_entry_free(p, entry);
 		if (count == SWAP_HAS_CACHE &&
 		    !swap_page_trans_huge_swapped(p, entry))
 			__try_to_reclaim_swap(p, swp_offset(entry),
 					      TTRS_UNMAPPED | TTRS_FULL);
+		put_swap_device(p);
 	}
 	return p != NULL;
 }
-- 
2.43.0
