From: Greg Kroah-Hartman
Date: Mon, 13 Aug 2018 09:54:01 +0000 (+0200)
Subject: 4.14-stable patches
X-Git-Tag: v4.18.1~31
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=da46cd97d7775a6aa7ce830449d69f3d4cedef59;p=thirdparty%2Fkernel%2Fstable-queue.git

4.14-stable patches

added patches:
	fix-__legitimize_mnt-mntput-race.patch
	fix-mntput-mntput-race.patch
	make-sure-that-__dentry_kill-always-invalidates-d_seq-unhashed-or-not.patch
	root-dentries-need-rcu-delayed-freeing.patch
---

diff --git a/queue-4.14/fix-__legitimize_mnt-mntput-race.patch b/queue-4.14/fix-__legitimize_mnt-mntput-race.patch
new file mode 100644
index 00000000000..5030641860d
--- /dev/null
+++ b/queue-4.14/fix-__legitimize_mnt-mntput-race.patch
@@ -0,0 +1,82 @@
+From 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 Mon Sep 17 00:00:00 2001
+From: Al Viro
+Date: Thu, 9 Aug 2018 17:51:32 -0400
+Subject: fix __legitimize_mnt()/mntput() race
+
+From: Al Viro
+
+commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.
+
+__legitimize_mnt() has two problems - one is that in case of success
+the check of mount_lock is not ordered wrt preceding increment of
+refcount, making it possible to have successful __legitimize_mnt()
+on one CPU just before the otherwise final mntput() on another,
+with __legitimize_mnt() not seeing mntput() taking the lock and
+mntput() not seeing the increment done by __legitimize_mnt().
+Solved by a pair of barriers.
+
+Another is that failure of __legitimize_mnt() on the second
+read_seqretry() leaves us with reference that'll need to be
+dropped by caller; however, if that races with final mntput()
+we can end up with caller dropping rcu_read_lock() and doing
+mntput() to release that reference - with the first mntput()
+having freed the damn thing just as rcu_read_lock() had been
+dropped.
+Solution: in "do mntput() yourself" failure case
+grab mount_lock, check if MNT_DOOMED has been set by racing
+final mntput() that has missed our increment and if it has -
+undo the increment and treat that as "failure, caller doesn't
+need to drop anything" case.
+
+It's not easy to hit - the final mntput() has to come right
+after the first read_seqretry() in __legitimize_mnt() *and*
+manage to miss the increment done by __legitimize_mnt() before
+the second read_seqretry() in there. The things that are almost
+impossible to hit on bare hardware are not impossible on SMP
+KVM, though...
+
+Reported-by: Oleg Nesterov
+Fixes: 48a066e72d97 ("RCU'd vsfmounts")
+Cc: stable@vger.kernel.org
+Signed-off-by: Al Viro
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/namespace.c | 14 ++++++++++++++
+ 1 file changed, 14 insertions(+)
+
+--- a/fs/namespace.c
++++ b/fs/namespace.c
+@@ -659,12 +659,21 @@ int __legitimize_mnt(struct vfsmount *ba
+ 		return 0;
+ 	mnt = real_mount(bastard);
+ 	mnt_add_count(mnt, 1);
++	smp_mb(); // see mntput_no_expire()
+ 	if (likely(!read_seqretry(&mount_lock, seq)))
+ 		return 0;
+ 	if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
+ 		mnt_add_count(mnt, -1);
+ 		return 1;
+ 	}
++	lock_mount_hash();
++	if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
++		mnt_add_count(mnt, -1);
++		unlock_mount_hash();
++		return 1;
++	}
++	unlock_mount_hash();
++	/* caller will mntput() */
+ 	return -1;
+ }
+
+@@ -1210,6 +1219,11 @@ static void mntput_no_expire(struct moun
+ 		return;
+ 	}
+ 	lock_mount_hash();
++	/*
++	 * make sure that if __legitimize_mnt() has not seen us grab
++	 * mount_lock, we'll see their refcount increment here.
++	 */
++	smp_mb();
+ 	mnt_add_count(mnt, -1);
+ 	if (mnt_get_count(mnt)) {
+ 		rcu_read_unlock();
diff --git a/queue-4.14/fix-mntput-mntput-race.patch b/queue-4.14/fix-mntput-mntput-race.patch
new file mode 100644
index 00000000000..e201337f94d
--- /dev/null
+++ b/queue-4.14/fix-mntput-mntput-race.patch
@@ -0,0 +1,77 @@
+From 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 Mon Sep 17 00:00:00 2001
+From: Al Viro
+Date: Thu, 9 Aug 2018 17:21:17 -0400
+Subject: fix mntput/mntput race
+
+From: Al Viro
+
+commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.
+
+mntput_no_expire() does the calculation of total refcount under mount_lock;
+unfortunately, the decrement (as well as all increments) are done outside
+of it, leading to false positives in the "are we dropping the last reference"
+test. Consider the following situation:
+	* mnt is a lazy-umounted mount, kept alive by two opened files. One
+of those files gets closed. Total refcount of mnt is 2. On CPU 42
+mntput(mnt) (called from __fput()) drops one reference, decrementing component
+	* After it has looked at component #0, the process on CPU 0 does
+mntget(), incrementing component #0, gets preempted and gets to run again -
+on CPU 69. There it does mntput(), which drops the reference (component #69)
+and proceeds to spin on mount_lock.
+	* On CPU 42 our first mntput() finishes counting. It observes the
+decrement of component #69, but not the increment of component #0. As the
+result, the total it gets is not 1 as it should've been - it's 0. At which
+point we decide that vfsmount needs to be killed and proceed to free it and
+shut the filesystem down. However, there's still another opened file
+on that filesystem, with reference to (now freed) vfsmount, etc. and we are
+screwed.
+
+It's not a wide race, but it can be reproduced with artificial slowdown of
+the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
+
+Fix consists of moving the refcount decrement under mount_lock; the tricky
+part is that we want (and can) keep the fast case (i.e. mount that still
+has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero
+mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
+before that mntput(). IOW, if mntput() observes (under rcu_read_lock())
+a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
+be dropped.
+
+Reported-by: Jann Horn
+Tested-by: Jann Horn
+Fixes: 48a066e72d97 ("RCU'd vsfmounts")
+Cc: stable@vger.kernel.org
+Signed-off-by: Al Viro
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/namespace.c | 14 ++++++++++++--
+ 1 file changed, 12 insertions(+), 2 deletions(-)
+
+--- a/fs/namespace.c
++++ b/fs/namespace.c
+@@ -1195,12 +1195,22 @@ static DECLARE_DELAYED_WORK(delayed_mntp
+ static void mntput_no_expire(struct mount *mnt)
+ {
+ 	rcu_read_lock();
+-	mnt_add_count(mnt, -1);
+-	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
++	if (likely(READ_ONCE(mnt->mnt_ns))) {
++		/*
++		 * Since we don't do lock_mount_hash() here,
++		 * ->mnt_ns can change under us. However, if it's
++		 * non-NULL, then there's a reference that won't
++		 * be dropped until after an RCU delay done after
++		 * turning ->mnt_ns NULL. So if we observe it
++		 * non-NULL under rcu_read_lock(), the reference
++		 * we are dropping is not the final one.
++		 */
++		mnt_add_count(mnt, -1);
+ 		rcu_read_unlock();
+ 		return;
+ 	}
+ 	lock_mount_hash();
++	mnt_add_count(mnt, -1);
+ 	if (mnt_get_count(mnt)) {
+ 		rcu_read_unlock();
+ 		unlock_mount_hash();
diff --git a/queue-4.14/make-sure-that-__dentry_kill-always-invalidates-d_seq-unhashed-or-not.patch b/queue-4.14/make-sure-that-__dentry_kill-always-invalidates-d_seq-unhashed-or-not.patch
new file mode 100644
index 00000000000..3fbf1a4fa8a
--- /dev/null
+++ b/queue-4.14/make-sure-that-__dentry_kill-always-invalidates-d_seq-unhashed-or-not.patch
@@ -0,0 +1,50 @@
+From 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab Mon Sep 17 00:00:00 2001
+From: Al Viro
+Date: Thu, 9 Aug 2018 10:15:54 -0400
+Subject: make sure that __dentry_kill() always invalidates d_seq, unhashed or not
+
+From: Al Viro
+
+commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream.
+
+RCU pathwalk relies upon the assumption that anything that changes
+->d_inode of a dentry will invalidate its ->d_seq. That's almost
+true - the one exception is that the final dput() of already unhashed
+dentry does *not* touch ->d_seq at all. Unhashing does, though,
+so for anything we'd found by RCU dcache lookup we are fine.
+Unfortunately, we can *start* with an unhashed dentry or jump into
+it.
+
+We could try and be careful in the (few) places where that could
+happen. Or we could just make the final dput() invalidate the damn
+thing, unhashed or not. The latter is much simpler and easier to
+backport, so let's do it that way.
+
+Reported-by: "Dae R. Jeong"
+Cc: stable@vger.kernel.org
+Signed-off-by: Al Viro
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/dcache.c | 7 ++-----
+ 1 file changed, 2 insertions(+), 5 deletions(-)
+
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -357,14 +357,11 @@ static void dentry_unlink_inode(struct d
+ 	__releases(dentry->d_inode->i_lock)
+ {
+ 	struct inode *inode = dentry->d_inode;
+-	bool hashed = !d_unhashed(dentry);
+
+-	if (hashed)
+-		raw_write_seqcount_begin(&dentry->d_seq);
++	raw_write_seqcount_begin(&dentry->d_seq);
+ 	__d_clear_type_and_inode(dentry);
+ 	hlist_del_init(&dentry->d_u.d_alias);
+-	if (hashed)
+-		raw_write_seqcount_end(&dentry->d_seq);
++	raw_write_seqcount_end(&dentry->d_seq);
+ 	spin_unlock(&dentry->d_lock);
+ 	spin_unlock(&inode->i_lock);
+ 	if (!inode->i_nlink)
diff --git a/queue-4.14/root-dentries-need-rcu-delayed-freeing.patch b/queue-4.14/root-dentries-need-rcu-delayed-freeing.patch
new file mode 100644
index 00000000000..cd9f08ebc66
--- /dev/null
+++ b/queue-4.14/root-dentries-need-rcu-delayed-freeing.patch
@@ -0,0 +1,47 @@
+From 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 Mon Sep 17 00:00:00 2001
+From: Al Viro
+Date: Mon, 6 Aug 2018 09:03:58 -0400
+Subject: root dentries need RCU-delayed freeing
+
+From: Al Viro
+
+commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.
+
+Since mountpoint crossing can happen without leaving lazy mode,
+root dentries do need the same protection against having their
+memory freed without RCU delay as everything else in the tree.
+
+It's partially hidden by RCU delay between detaching from the
+mount tree and dropping the vfsmount reference, but the starting
+point of pathwalk can be on an already detached mount, in which
+case umount-caused RCU delay has already passed by the time the
+lazy pathwalk grabs rcu_read_lock(). If the starting point
+happens to be at the root of that vfsmount *and* that vfsmount
+covers the entire filesystem, we get trouble.
+
+Fixes: 48a066e72d97 ("RCU'd vsfmounts")
+Cc: stable@vger.kernel.org
+Signed-off-by: Al Viro
+Signed-off-by: Greg Kroah-Hartman
+
+---
+ fs/dcache.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -1922,10 +1922,12 @@ struct dentry *d_make_root(struct inode
+
+ 	if (root_inode) {
+ 		res = __d_alloc(root_inode->i_sb, NULL);
+-		if (res)
++		if (res) {
++			res->d_flags |= DCACHE_RCUACCESS;
+ 			d_instantiate(res, root_inode);
+-		else
++		} else {
+ 			iput(root_inode);
++		}
+ 	}
+ 	return res;
+ }
diff --git a/queue-4.14/series b/queue-4.14/series
index f8c07c0d710..d7002b456c5 100644
--- a/queue-4.14/series
+++ b/queue-4.14/series
@@ -10,3 +10,7 @@ xen-netfront-don-t-cache-skb_shinfo.patch
 scsi-sr-avoid-that-opening-a-cd-rom-hangs-with-runtime-power-management-enabled.patch
 scsi-qla2xxx-fix-memory-leak-for-allocating-abort-iocb.patch
 init-rename-and-re-order-boot_cpu_state_init.patch
+root-dentries-need-rcu-delayed-freeing.patch
+make-sure-that-__dentry_kill-always-invalidates-d_seq-unhashed-or-not.patch
+fix-mntput-mntput-race.patch
+fix-__legitimize_mnt-mntput-race.patch
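
The per-CPU miscount that fix-mntput-mntput-race.patch closes can be replayed deterministically in userspace. The sketch below is illustrative only: `comp[]`, `miscount()`, and the CPU numbers are hypothetical stand-ins for the per-CPU components of `mnt->mnt_count` that `mnt_get_count()` sums, not kernel code or API.

```c
/* Userspace sketch of the race in the "fix mntput/mntput race" commit
 * message: the summing CPU reads component #0 before a racing mntget()
 * bumps it, then reads component #69 after a racing mntput() drops it,
 * so it misses the +1 but sees the -1 and concludes "last reference". */

#define NCPU 128
static int comp[NCPU];	/* stand-in for the per-CPU refcount components */

/* Replays the interleaving deterministically. Returns the total the
 * summing CPU observes; *actual receives the real total once every
 * increment/decrement has landed. */
static int miscount(int *actual)
{
	int observed, i;

	for (i = 0; i < NCPU; i++)
		comp[i] = 0;
	comp[0] = 1;		/* one opened file, counted on CPU 0 */
	comp[42] = 1;		/* another one, counted on CPU 42 */

	comp[42] -= 1;		/* CPU 42: mntput() drops its reference... */
	observed = comp[0];	/* ...and starts summing; reads #0 first */

	comp[0] += 1;		/* CPU 0: mntget() - the summer missed this */
	comp[69] -= 1;		/* same task, now on CPU 69: mntput() */

	for (i = 1; i < NCPU; i++)	/* summer resumes, sees the -1 on #69 */
		observed += comp[i];

	*actual = 0;
	for (i = 0; i < NCPU; i++)
		*actual += comp[i];
	return observed;	/* 0, even though one reference is live */
}
```

With one reference still live (`*actual` ends up 1), the observed total is 0 - exactly the false "last reference" that the patch removes by doing the decrement under mount_lock, where the summation also runs.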