--- /dev/null
+From ac05ca913e9f3871126d61da275bfe8516ff01ca Mon Sep 17 00:00:00 2001
+From: Filipe Manana <fdmanana@suse.com>
+Date: Fri, 31 Jan 2020 14:06:07 +0000
+Subject: Btrfs: fix race between using extent maps and merging them
+
+From: Filipe Manana <fdmanana@suse.com>
+
+commit ac05ca913e9f3871126d61da275bfe8516ff01ca upstream.
+
+We have a few cases where we allow an extent map that is in an extent map
+tree to be merged with other extents in the tree. Such cases include the
+unpinning of an extent after the respective ordered extent completed or
+after logging an extent during a fast fsync. This can lead to subtle and
+dangerous problems because when doing the merge some other task might be
+using the same extent map and as consequence see an inconsistent state of
+the extent map - for example sees the new length but has seen the old start
+offset.
+
+With luck this triggers a BUG_ON(), and not some silent bug, such as the
+following one in __do_readpage():
+
+ $ cat -n fs/btrfs/extent_io.c
+ 3061 static int __do_readpage(struct extent_io_tree *tree,
+ 3062 struct page *page,
+ (...)
+ 3127 em = __get_extent_map(inode, page, pg_offset, cur,
+ 3128 end - cur + 1, get_extent, em_cached);
+ 3129 if (IS_ERR_OR_NULL(em)) {
+ 3130 SetPageError(page);
+ 3131 unlock_extent(tree, cur, end);
+ 3132 break;
+ 3133 }
+ 3134 extent_offset = cur - em->start;
+ 3135 BUG_ON(extent_map_end(em) <= cur);
+ (...)
+
+Consider the following example scenario, where we end up hitting the
+BUG_ON() in __do_readpage().
+
+We have an inode with a size of 8KiB and 2 extent maps:
+
+ extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
+ a previous transaction
+
+ extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
+ persisted but writeback started for it already. The extent map
+ is pinned since there's writeback and an ordered extent in
+ progress, so it can not be merged with extent map A yet
+
+The following sequence of steps leads to the BUG_ON():
+
+1) The ordered extent for extent B completes, the respective page gets its
+ writeback bit cleared and the extent map is unpinned, at that point it
+ is not yet merged with extent map A because it's in the list of modified
+ extents;
+
+2) Due to memory pressure, or some other reason, the MM subsystem releases
+ the page corresponding to extent B - btrfs_releasepage() is called and
+ returns 1, meaning the page can be released as it's not dirty, not under
+ writeback anymore and the extent range is not locked in the inode's
+ iotree. However the extent map is not released, either because we are
+ not in a context that allows memory allocations to block or because the
+ inode's size is smaller than 16MiB - in this case our inode has a size
+ of 8KiB;
+
+3) Task B needs to read extent B and ends up __do_readpage() through the
+ btrfs_readpage() callback. At __do_readpage() it gets a reference to
+ extent map B;
+
+4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
+ while holding the write lock on the inode's extent map tree - this
+ results in try_merge_map() being called and since it's possible to merge
+ extent map B with extent map A now (the extent map B was removed from
+ the list of modified extents), the merging begins - it sets extent map
+ B's start offset to 0 (was 4KiB), but before it increments the map's
+ length to 8KiB (4kb + 4KiB), task A is at:
+
+ BUG_ON(extent_map_end(em) <= cur);
+
+ The call to extent_map_end() sees the extent map has a start of 0
+ and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
+ the BUG_ON() is triggered.
+
+So it's dangerous to modify an extent map that is in the tree, because some
+other task might have got a reference to it before and still using it, and
+needs to see a consistent map while using it. Generally this is very rare
+since most paths that lookup and use extent maps also have the file range
+locked in the inode's iotree. The fsync path is pretty much the only
+exception where we don't do it to avoid serialization with concurrent
+reads.
+
+Fix this by not allowing an extent map do be merged if if it's being used
+by tasks other then the one attempting to merge the extent map (when the
+reference count of the extent map is greater than 2).
+
+Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
+Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
+Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
+CC: stable@vger.kernel.org # 4.4+
+Reviewed-by: Josef Bacik <josef@toxicpanda.com>
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/btrfs/extent_map.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+--- a/fs/btrfs/extent_map.c
++++ b/fs/btrfs/extent_map.c
+@@ -227,6 +227,17 @@ static void try_merge_map(struct extent_
+ struct extent_map *merge = NULL;
+ struct rb_node *rb;
+
++ /*
++ * We can't modify an extent map that is in the tree and that is being
++ * used by another task, as it can cause that other task to see it in
++ * inconsistent state during the merging. We always have 1 reference for
++ * the tree and 1 for this task (which is unpinning the extent map or
++ * clearing the logging flag), so anything > 2 means it's being used by
++ * other tasks too.
++ */
++ if (refcount_read(&em->refs) > 2)
++ return;
++
+ if (em->start != 0) {
+ rb = rb_prev(&em->rb_node);
+ if (rb)
--- /dev/null
+From 10a3a3edc5b89a8cd095bc63495fb1e0f42047d9 Mon Sep 17 00:00:00 2001
+From: David Sterba <dsterba@suse.com>
+Date: Wed, 5 Feb 2020 17:12:28 +0100
+Subject: btrfs: log message when rw remount is attempted with unclean tree-log
+
+From: David Sterba <dsterba@suse.com>
+
+commit 10a3a3edc5b89a8cd095bc63495fb1e0f42047d9 upstream.
+
+A remount to a read-write filesystem is not safe when there's tree-log
+to be replayed. Files that could be opened until now might be affected
+by the changes in the tree-log.
+
+A regular mount is needed to replay the log so the filesystem presents
+the consistent view with the pending changes included.
+
+CC: stable@vger.kernel.org # 4.4+
+Reviewed-by: Anand Jain <anand.jain@oracle.com>
+Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/btrfs/super.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/fs/btrfs/super.c
++++ b/fs/btrfs/super.c
+@@ -1702,6 +1702,8 @@ static int btrfs_remount(struct super_bl
+ }
+
+ if (btrfs_super_log_root(fs_info->super_copy) != 0) {
++ btrfs_warn(fs_info,
++ "mount required to replay tree-log, cannot remount read-write");
+ ret = -EINVAL;
+ goto restore;
+ }
--- /dev/null
+From 48a34311953d921235f4d7bbd2111690d2e469cf Mon Sep 17 00:00:00 2001
+From: Jan Kara <jack@suse.cz>
+Date: Mon, 10 Feb 2020 15:43:16 +0100
+Subject: ext4: fix checksum errors with indexed dirs
+
+From: Jan Kara <jack@suse.cz>
+
+commit 48a34311953d921235f4d7bbd2111690d2e469cf upstream.
+
+DIR_INDEX has been introduced as a compat ext4 feature. That means that
+even kernels / tools that don't understand the feature may modify the
+filesystem. This works because for kernels not understanding indexed dir
+format, internal htree nodes appear just as empty directory entries.
+Index dir aware kernels then check the htree structure is still
+consistent before using the data. This all worked reasonably well until
+metadata checksums were introduced. The problem is that these
+effectively made DIR_INDEX only ro-compatible because internal htree
+nodes store checksums in a different place than normal directory blocks.
+Thus any modification ignorant to DIR_INDEX (or just clearing
+EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
+and trigger kernel errors. So we have to be more careful when dealing
+with indexed directories on filesystems with checksumming enabled.
+
+1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
+DIR_INDEX is not enabled. This is harsh but it should be very rare (it
+means someone disabled DIR_INDEX on existing filesystem and didn't run
+e2fsck), e2fsck can fix the problem, and we don't want to answer the
+difficult question: "Should we rather corrupt the directory more or
+should we ignore that DIR_INDEX feature is not set?"
+
+2) When we find out htree structure is corrupted (but the filesystem and
+the directory should in support htrees), we continue just ignoring htree
+information for reading but we refuse to add new entries to the
+directory to avoid corrupting it more.
+
+Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
+Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
+Reviewed-by: Andreas Dilger <adilger@dilger.ca>
+Signed-off-by: Jan Kara <jack@suse.cz>
+Signed-off-by: Theodore Ts'o <tytso@mit.edu>
+Cc: stable@kernel.org
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ fs/ext4/dir.c | 14 ++++++++------
+ fs/ext4/ext4.h | 5 ++++-
+ fs/ext4/inode.c | 12 ++++++++++++
+ fs/ext4/namei.c | 7 +++++++
+ 4 files changed, 31 insertions(+), 7 deletions(-)
+
+--- a/fs/ext4/dir.c
++++ b/fs/ext4/dir.c
+@@ -125,12 +125,14 @@ static int ext4_readdir(struct file *fil
+ if (err != ERR_BAD_DX_DIR) {
+ return err;
+ }
+- /*
+- * We don't set the inode dirty flag since it's not
+- * critical that it get flushed back to the disk.
+- */
+- ext4_clear_inode_flag(file_inode(file),
+- EXT4_INODE_INDEX);
++ /* Can we just clear INDEX flag to ignore htree information? */
++ if (!ext4_has_metadata_csum(sb)) {
++ /*
++ * We don't set the inode dirty flag since it's not
++ * critical that it gets flushed back to the disk.
++ */
++ ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
++ }
+ }
+
+ if (ext4_has_inline_data(inode)) {
+--- a/fs/ext4/ext4.h
++++ b/fs/ext4/ext4.h
+@@ -2381,8 +2381,11 @@ int ext4_insert_dentry(struct inode *dir
+ struct ext4_filename *fname);
+ static inline void ext4_update_dx_flag(struct inode *inode)
+ {
+- if (!ext4_has_feature_dir_index(inode->i_sb))
++ if (!ext4_has_feature_dir_index(inode->i_sb)) {
++ /* ext4_iget() should have caught this... */
++ WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb));
+ ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
++ }
+ }
+ static unsigned char ext4_filetype_table[] = {
+ DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+--- a/fs/ext4/inode.c
++++ b/fs/ext4/inode.c
+@@ -4325,6 +4325,18 @@ struct inode *ext4_iget(struct super_blo
+ ret = -EFSCORRUPTED;
+ goto bad_inode;
+ }
++ /*
++ * If dir_index is not enabled but there's dir with INDEX flag set,
++ * we'd normally treat htree data as empty space. But with metadata
++ * checksumming that corrupts checksums so forbid that.
++ */
++ if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) &&
++ ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) {
++ ext4_error_inode(inode, function, line, 0,
++ "iget: Dir with htree data on filesystem without dir_index feature.");
++ ret = -EFSCORRUPTED;
++ goto bad_inode;
++ }
+ ei->i_disksize = inode->i_size;
+ #ifdef CONFIG_QUOTA
+ ei->i_reserved_quota = 0;
+--- a/fs/ext4/namei.c
++++ b/fs/ext4/namei.c
+@@ -2121,6 +2121,13 @@ static int ext4_add_entry(handle_t *hand
+ retval = ext4_dx_add_entry(handle, &fname, dentry, inode);
+ if (!retval || (retval != ERR_BAD_DX_DIR))
+ goto out;
++ /* Can we just ignore htree data? */
++ if (ext4_has_metadata_csum(sb)) {
++ EXT4_ERROR_INODE(dir,
++ "Directory has corrupted htree index.");
++ retval = -EFSCORRUPTED;
++ goto out;
++ }
+ ext4_clear_inode_flag(dir, EXT4_INODE_INDEX);
+ dx_fallback++;
+ ext4_mark_inode_dirty(handle, dir);
alsa-usb-audio-apply-sample-rate-quirk-for-audioengine-d1.patch
ubifs-fix-deadlock-in-concurrent-bulk-read-and-writepage.patch
+ext4-fix-checksum-errors-with-indexed-dirs.patch
+btrfs-fix-race-between-using-extent-maps-and-merging-them.patch
+btrfs-log-message-when-rw-remount-is-attempted-with-unclean-tree-log.patch