From adef9255e51300b477380c41bf0377bbe81bc47e Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 26 Jan 2024 16:19:34 -0800 Subject: [PATCH] 5.10-stable patches added patches: rename-fix-the-locking-of-subdirectories.patch --- ...me-fix-the-locking-of-subdirectories.patch | 276 ++++++++++++++++++ queue-5.10/series | 1 + 2 files changed, 277 insertions(+) create mode 100644 queue-5.10/rename-fix-the-locking-of-subdirectories.patch diff --git a/queue-5.10/rename-fix-the-locking-of-subdirectories.patch b/queue-5.10/rename-fix-the-locking-of-subdirectories.patch new file mode 100644 index 00000000000..cfee0238feb --- /dev/null +++ b/queue-5.10/rename-fix-the-locking-of-subdirectories.patch @@ -0,0 +1,276 @@ +From 22e111ed6c83dcde3037fc81176012721bc34c0b Mon Sep 17 00:00:00 2001 +From: Al Viro +Date: Sun, 19 Nov 2023 20:25:58 -0500 +Subject: rename(): fix the locking of subdirectories + +From: Al Viro + +commit 22e111ed6c83dcde3037fc81176012721bc34c0b upstream. + + We should never lock two subdirectories without having taken +->s_vfs_rename_mutex; inode pointer order or not, the "order" proposed +in 28eceeda130f "fs: Lock moved directories" is not transitive, with +the usual consequences. + + The rationale for locking renamed subdirectory in all cases was +the possibility of race between rename modifying .. in a subdirectory to +reflect the new parent and another thread modifying the same subdirectory. +For a lot of filesystems that's not a problem, but for some it can lead +to trouble (e.g. the case when short directory contents is kept in the +inode, but creating a file in it might push it across the size limit +and copy its contents into separate data block(s)). + + However, we need that only in case when the parent does change - +otherwise ->rename() doesn't need to do anything with .. entry in the +first place. Some instances are lazy and do a tautological update anyway, +but it's really not hard to avoid. + +Amended locking rules for rename(): + find the parent(s) of source and target + if source and target have the same parent + lock the common parent + else + lock ->s_vfs_rename_mutex + lock both parents, in ancestor-first order; if neither + is an ancestor of another, lock the parent of source + first. + find the source and target. + if source and target have the same parent + if operation is an overwriting rename of a subdirectory + lock the target subdirectory + else + if source is a subdirectory + lock the source + if target is a subdirectory + lock the target + lock non-directories involved, in inode pointer order if both + source and target are such. + +That way we are guaranteed that parents are locked (for obvious reasons), +that any renamed non-directory is locked (nfsd relies upon that), +that any victim is locked (emptiness check needs that, among other things) +and subdirectory that changes parent is locked (needed to protect the update +of .. entries). We are also guaranteed that any operation locking more +than one directory either takes ->s_vfs_rename_mutex or locks a parent +followed by its child. + +Cc: stable@vger.kernel.org +Fixes: 28eceeda130f "fs: Lock moved directories" +Reviewed-by: Jan Kara +Signed-off-by: Al Viro +Signed-off-by: Greg Kroah-Hartman +--- + Documentation/filesystems/directory-locking.rst | 29 ++++++----- + Documentation/filesystems/locking.rst | 5 +- + Documentation/filesystems/porting.rst | 18 +++++++ + fs/namei.c | 60 ++++++++++++++---------- + 4 files changed, 74 insertions(+), 38 deletions(-) + +--- a/Documentation/filesystems/directory-locking.rst ++++ b/Documentation/filesystems/directory-locking.rst +@@ -22,13 +22,16 @@ exclusive. + 3) object removal. Locking rules: caller locks parent, finds victim, + locks victim and calls the method. Locks are exclusive. + +-4) rename() that is _not_ cross-directory. Locking rules: caller locks the +-parent and finds source and target. We lock both (provided they exist). If we +-need to lock two inodes of different type (dir vs non-dir), we lock directory +-first. If we need to lock two inodes of the same type, lock them in inode +-pointer order. Then call the method. All locks are exclusive. +-NB: we might get away with locking the source (and target in exchange +-case) shared. ++4) rename() that is _not_ cross-directory. Locking rules: caller locks ++the parent and finds source and target. Then we decide which of the ++source and target need to be locked. Source needs to be locked if it's a ++non-directory; target - if it's a non-directory or about to be removed. ++Take the locks that need to be taken, in inode pointer order if need ++to take both (that can happen only when both source and target are ++non-directories - the source because it wouldn't be locked otherwise ++and the target because mixing directory and non-directory is allowed ++only with RENAME_EXCHANGE, and that won't be removing the target). ++After the locks had been taken, call the method. All locks are exclusive. + + 5) link creation. Locking rules: + +@@ -44,20 +47,17 @@ rules: + + * lock the filesystem + * lock parents in "ancestors first" order. If one is not ancestor of +- the other, lock them in inode pointer order. ++ the other, lock the parent of source first. + * find source and target. + * if old parent is equal to or is a descendent of target + fail with -ENOTEMPTY + * if new parent is equal to or is a descendent of source + fail with -ELOOP +- * Lock both the source and the target provided they exist. If we +- need to lock two inodes of different type (dir vs non-dir), we lock +- the directory first. If we need to lock two inodes of the same type, +- lock them in inode pointer order. ++ * Lock subdirectories involved (source before target). ++ * Lock non-directories involved, in inode pointer order. + * call the method. + +-All ->i_rwsem are taken exclusive. Again, we might get away with locking +-the source (and target in exchange case) shared. ++All ->i_rwsem are taken exclusive. + + The rules above obviously guarantee that all directories that are going to be + read, modified or removed by method will be locked by caller. +@@ -67,6 +67,7 @@ If no directory is its own ancestor, the + + Proof: + ++[XXX: will be updated once we are done massaging the lock_rename()] + First of all, at any moment we have a linear ordering of the + objects - A < B iff (A is an ancestor of B) or (B is not an ancestor + of A and ptr(A) < ptr(B)). +--- a/Documentation/filesystems/locking.rst ++++ b/Documentation/filesystems/locking.rst +@@ -95,7 +95,7 @@ symlink: exclusive + mkdir: exclusive + unlink: exclusive (both) + rmdir: exclusive (both)(see below) +-rename: exclusive (all) (see below) ++rename: exclusive (both parents, some children) (see below) + readlink: no + get_link: no + setattr: exclusive +@@ -113,6 +113,9 @@ tmpfile: no + Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem + exclusive on victim. + cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem. ++ ->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories ++ involved. ++ ->rename() has ->i_rwsem exclusive on any subdirectory that changes parent. + + See Documentation/filesystems/directory-locking.rst for more detailed discussion + of the locking scheme for directory operations. +--- a/Documentation/filesystems/porting.rst ++++ b/Documentation/filesystems/porting.rst +@@ -865,3 +865,21 @@ no matter what. Everything is handled b + + clone_private_mount() returns a longterm mount now, so the proper destructor of + its result is kern_unmount() or kern_unmount_array(). ++ ++--- ++ ++**mandatory** ++ ++If ->rename() update of .. on cross-directory move needs an exclusion with ++directory modifications, do *not* lock the subdirectory in question in your ++->rename() - it's done by the caller now [that item should've been added in ++28eceeda130f "fs: Lock moved directories"]. ++ ++--- ++ ++**mandatory** ++ ++On same-directory ->rename() the (tautological) update of .. is not protected ++by any locks; just don't do it if the old parent is the same as the new one. ++We really can't lock two subdirectories in same-directory rename - not without ++deadlocks. +--- a/fs/namei.c ++++ b/fs/namei.c +@@ -2771,20 +2771,14 @@ struct dentry *lock_rename(struct dentry + p = d_ancestor(p2, p1); + if (p) { + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); +- inode_lock_nested(p1->d_inode, I_MUTEX_CHILD); ++ inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2); + return p; + } + + p = d_ancestor(p1, p2); +- if (p) { +- inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); +- inode_lock_nested(p2->d_inode, I_MUTEX_CHILD); +- return p; +- } +- +- lock_two_inodes(p1->d_inode, p2->d_inode, +- I_MUTEX_PARENT, I_MUTEX_PARENT2); +- return NULL; ++ inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); ++ inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); ++ return p; + } + EXPORT_SYMBOL(lock_rename); + +@@ -4260,11 +4254,12 @@ SYSCALL_DEFINE2(link, const char __user + * + * a) we can get into loop creation. + * b) race potential - two innocent renames can create a loop together. +- * That's where 4.4 screws up. Current fix: serialization on ++ * That's where 4.4BSD screws up. Current fix: serialization on + * sb->s_vfs_rename_mutex. We might be more accurate, but that's another + * story. +- * c) we have to lock _four_ objects - parents and victim (if it exists), +- * and source. ++ * c) we may have to lock up to _four_ objects - parents and victim (if it exists), ++ * and source (if it's a non-directory or a subdirectory that moves to ++ * different parent). + * And that - after we got ->i_mutex on parents (until then we don't know + * whether the target exists). Solution: try to be smart with locking + * order for inodes. We rely on the fact that tree topology may change +@@ -4293,6 +4288,7 @@ int vfs_rename(struct inode *old_dir, st + bool new_is_dir = false; + unsigned max_links = new_dir->i_sb->s_max_links; + struct name_snapshot old_name; ++ bool lock_old_subdir, lock_new_subdir; + + if (source == target) + return 0; +@@ -4342,15 +4338,32 @@ int vfs_rename(struct inode *old_dir, st + take_dentry_name_snapshot(&old_name, old_dentry); + dget(new_dentry); + /* +- * Lock all moved children. Moved directories may need to change parent +- * pointer so they need the lock to prevent against concurrent +- * directory changes moving parent pointer. For regular files we've +- * historically always done this. The lockdep locking subclasses are +- * somewhat arbitrary but RENAME_EXCHANGE in particular can swap +- * regular files and directories so it's difficult to tell which +- * subclasses to use. ++ * Lock children. ++ * The source subdirectory needs to be locked on cross-directory ++ * rename or cross-directory exchange since its parent changes. ++ * The target subdirectory needs to be locked on cross-directory ++ * exchange due to parent change and on any rename due to becoming ++ * a victim. ++ * Non-directories need locking in all cases (for NFS reasons); ++ * they get locked after any subdirectories (in inode address order). ++ * ++ * NOTE: WE ONLY LOCK UNRELATED DIRECTORIES IN CROSS-DIRECTORY CASE. ++ * NEVER, EVER DO THAT WITHOUT ->s_vfs_rename_mutex. + */ +- lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2); ++ lock_old_subdir = new_dir != old_dir; ++ lock_new_subdir = new_dir != old_dir || !(flags & RENAME_EXCHANGE); ++ if (is_dir) { ++ if (lock_old_subdir) ++ inode_lock_nested(source, I_MUTEX_CHILD); ++ if (target && (!new_is_dir || lock_new_subdir)) ++ inode_lock(target); ++ } else if (new_is_dir) { ++ if (lock_new_subdir) ++ inode_lock_nested(target, I_MUTEX_CHILD); ++ inode_lock(source); ++ } else { ++ lock_two_nondirectories(source, target); ++ } + + error = -EBUSY; + if (is_local_mountpoint(old_dentry) || is_local_mountpoint(new_dentry)) +@@ -4394,8 +4407,9 @@ int vfs_rename(struct inode *old_dir, st + d_exchange(old_dentry, new_dentry); + } + out: +- inode_unlock(source); +- if (target) ++ if (!is_dir || lock_old_subdir) ++ inode_unlock(source); ++ if (target && (!new_is_dir || lock_new_subdir)) + inode_unlock(target); + dput(new_dentry); + if (!error) { diff --git a/queue-5.10/series b/queue-5.10/series index 7dde75a01d5..ec12295ee3d 100644 --- a/queue-5.10/series +++ b/queue-5.10/series @@ -34,3 +34,4 @@ mmc-mmc_spi-remove-custom-dma-mapped-buffers.patch rtc-adjust-failure-return-code-for-cmos_set_alarm.patch nouveau-vmm-don-t-set-addr-on-the-fail-path-to-avoid-warning.patch ubifs-ubifs_symlink-fix-memleak-of-inode-i_link-in-error-path.patch +rename-fix-the-locking-of-subdirectories.patch -- 2.47.3