-From f4096b3c428fb472ade1b6c65952ea7a7866f9d7 Mon Sep 17 00:00:00 2001
+From d13ba9b5b8a29bbeeff08a73e2d50f696acee303 Mon Sep 17 00:00:00 2001
From: Sasha Levin <sashal@kernel.org>
-Date: Thu, 24 Oct 2019 23:03:20 +0800
-Subject: mm/memfd: should be lock the radix_tree when iterating its slot
+Date: Fri, 25 Oct 2019 09:58:34 -0700
+Subject: memfd: Fix locking when tagging pins
-From: zhong jiang <zhongjiang@huawei.com>
+From: Matthew Wilcox (Oracle) <willy@infradead.org>

-Recently, We test an linux 4.19 stable and find the following issue.
+The RCU lock is insufficient to protect the radix tree iteration as
+a deletion from the tree can occur before we take the spinlock to
+tag the entry. In 4.19, this has manifested as a bug with the following
+trace:

kernel BUG at lib/radix-tree.c:1429!
invalid opcode: 0000 [#1] SMP KASAN PTI
__x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:448
do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:293

-By reviewing the code, I find that there is an race between iterate
-the radix_tree and radix_tree_insert/delete. Because the former just
-access its slot in rcu protected period. but it fails to prevent the
-radix_tree from being changed.
-
-On Thu, Oct 24, 2019 at 10:41:15AM -0700, Matthew Wilcox wrote:
->The locking here now matches the locking in memfd_tag_pins() that
->was changed in ef3038a573aa8bf2f3797b110f7244b55a0e519c (part of 4.20-rc1).
->I didn't notice that I was fixing a bug when I changed the locking.
->This bug has been present since 05f65b5c70909ef686f865f0a85406d74d75f70f
->(part of 3.17) so backports will need to go further back. This code has
->moved around a bit (mm/shmem.c) and the APIs have changed, so it will
->take some effort
+The problem does not occur in mainline due to the XArray rewrite, which
+changed the locking to exclude modification of the tree during iteration.
+At the time, nobody realised this was a bugfix. Backport the locking
+changes to stable.

Cc: stable@vger.kernel.org
-Signed-off-by: zhong jiang <zhongjiang@huawei.com>
-Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
+Reported-by: zhong jiang <zhongjiang@huawei.com>
+Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
- mm/memfd.c | 8 +++-----
- 1 file changed, 3 insertions(+), 5 deletions(-)
+ mm/memfd.c | 18 ++++++++++--------
+ 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/memfd.c b/mm/memfd.c
-index 2bb5e257080e9..0b3fedc779236 100644
+index 2bb5e257080e9..5859705dafe19 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
-@@ -37,8 +37,8 @@ static void memfd_tag_pins(struct address_space *mapping)
+@@ -34,11 +34,12 @@ static void memfd_tag_pins(struct address_space *mapping)
+ void __rcu **slot;
+ pgoff_t start;
+ struct page *page;
++ unsigned int tagged = 0;
lru_add_drain();
start = 0;
radix_tree_for_each_slot(slot, &mapping->i_pages, &iter, start) {
page = radix_tree_deref_slot(slot);
if (!page || radix_tree_exception(page)) {
-@@ -47,18 +47,16 @@ static void memfd_tag_pins(struct address_space *mapping)
+@@ -47,18 +48,19 @@ static void memfd_tag_pins(struct address_space *mapping)
continue;
}
} else if (page_count(page) - page_mapcount(page) > 1) {
- xa_unlock_irq(&mapping->i_pages);
}
- if (need_resched()) {
- slot = radix_tree_iter_resume(slot, &iter);
+- if (need_resched()) {
+- slot = radix_tree_iter_resume(slot, &iter);
- cond_resched_rcu();
-+ cond_resched_lock(&mapping->i_pages.xa_lock);
- }
+- }
++ if (++tagged % 1024)
++ continue;
++
++ slot = radix_tree_iter_resume(slot, &iter);
++ xa_unlock_irq(&mapping->i_pages);
++ cond_resched();
++ xa_lock_irq(&mapping->i_pages);
}
- rcu_read_unlock();
+ xa_unlock_irq(&mapping->i_pages);