From: Greg Kroah-Hartman
Date: Thu, 18 Jun 2020 14:55:22 +0000 (+0200)
Subject: 5.7-stable patches
X-Git-Tag: v4.4.228~53
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=d80b84061a964f0d259e6287d02b197cd3a20e8d;p=thirdparty%2Fkernel%2Fstable-queue.git

5.7-stable patches

added patches:
      btrfs-fix-a-race-between-scrub-and-block-group-removal-allocation.patch
      btrfs-fix-corrupt-log-due-to-concurrent-fsync-of-inodes-with-shared-extents.patch
      btrfs-fix-error-handling-when-submitting-direct-i-o-bio.patch
      btrfs-fix-space_info-bytes_may_use-underflow-after-nocow-buffered-write.patch
      btrfs-fix-space_info-bytes_may_use-underflow-during-space-cache-writeout.patch
      btrfs-fix-wrong-file-range-cleanup-after-an-error-filling-dealloc-range.patch
      btrfs-force-chunk-allocation-if-our-global-rsv-is-larger-than-metadata.patch
      btrfs-free-alien-device-after-device-add.patch
      btrfs-include-non-missing-as-a-qualifier-for-the-latest_bdev.patch
      btrfs-reloc-fix-reloc-root-leak-and-null-pointer-dereference.patch
      btrfs-send-emit-file-capabilities-after-chown.patch
---

diff --git a/queue-5.7/btrfs-fix-a-race-between-scrub-and-block-group-removal-allocation.patch b/queue-5.7/btrfs-fix-a-race-between-scrub-and-block-group-removal-allocation.patch
new file mode 100644
index 00000000000..3b3dced7fad
--- /dev/null
+++ b/queue-5.7/btrfs-fix-a-race-between-scrub-and-block-group-removal-allocation.patch
@@ -0,0 +1,270 @@
+From 2473d24f2b77da0ffabcbb916793e58e7f57440b Mon Sep 17 00:00:00 2001
+From: Filipe Manana
+Date: Fri, 8 May 2020 11:01:10 +0100
+Subject: btrfs: fix a race between scrub and block group removal/allocation
+
+From: Filipe Manana
+
+commit 2473d24f2b77da0ffabcbb916793e58e7f57440b upstream.
+
+When scrub is verifying the extents of a block group for a device, it is
+possible that the corresponding block group gets removed and its logical
+address and device extents get used for a new block group allocation.
+When this happens scrub incorrectly reports that errors were detected
+and, if the new block group has a different profile than the old one (the
+deleted block group), we can crash due to a null pointer dereference.
+Possibly other unexpected and weird consequences can happen as well.
+
+Consider the following sequence of actions that leads to the null pointer
+dereference crash when scrub is running in parallel with balance:
+
+1) Balance sets block group X to read-only mode and starts relocating it.
+   Block group X is a metadata block group, has a raid1 profile (two
+   device extents, each one in a different device) and a logical address
+   of 19424870400;
+
+2) Scrub is running and finds device extent E, which belongs to block
+   group X. It enters scrub_stripe() to find all extents allocated to
+   block group X, the search is done using the extent tree;
+
+3) Balance finishes relocating block group X and removes block group X;
+
+4) Balance starts relocating another block group and when trying to
+   commit the current transaction as part of the preparation step
+   (prepare_to_relocate()), it blocks because scrub is running;
+
+5) The scrub task finds the metadata extent at the logical address
+   19425001472 and marks the pages of the extent to be read by a bio
+   (struct scrub_bio). The extent item's flags, which have the bit
+   BTRFS_EXTENT_FLAG_TREE_BLOCK set, are added to each page (struct
+   scrub_page). It is these flags in the scrub pages that tell the
+   bio's end io function (scrub_bio_end_io_worker) which type of extent
+   it is dealing with.
+   At this point we end up with 4 pages in a bio which is ready for
+   submission (the metadata extent has a size of 16Kb, so that gives 4
+   pages on x86);
+
+6) At the next iteration of scrub_stripe(), scrub checks that there is a
+   pause request from the relocation task trying to commit a transaction,
+   therefore it submits the pending bio and pauses, waiting for the
+   transaction commit to complete before resuming;
+
+7) The relocation task commits the transaction. The device extent E, that
+   was used by our block group X, is now available for allocation, since
+   the commit root for the device tree was swapped by the transaction
+   commit;
+
+8) Another task doing a direct IO write allocates a new data block group Y
+   which ends up using device extent E. This new block group Y also ends
+   up getting the same logical address that block group X had: 19424870400.
+   This happens because block group X was the block group with the highest
+   logical address and, when allocating Y, find_next_chunk() returns the
+   end offset of the current last block group to be used as the logical
+   address for the new block group, which is
+
+     18351128576 + 1073741824 = 19424870400
+
+   So our new block group Y has the same logical address and device extent
+   that block group X had. However Y is a data block group, while X was
+   a metadata one, and Y has a raid0 profile, while X had a raid1 profile;
+
+9) After allocating block group Y, the direct IO submits a bio to write
+   to device extent E;
+
+10) The read bio submitted by scrub reads the 4 pages (16Kb) from device
+    extent E, which now correspond to the data written by the task that
+    did a direct IO write. Then at the end io function associated with
+    the bio, scrub_bio_end_io_worker(), we call scrub_block_complete()
+    which calls scrub_checksum(). This latter function checks the flags
+    of the first page, and sees that the bit BTRFS_EXTENT_FLAG_TREE_BLOCK
+    is set in the flags, so it assumes it has a metadata extent and
+    then calls scrub_checksum_tree_block(). That function returns an
+    error, since interpreting data as a metadata extent causes the
+    checksum verification to fail.
+
+    So this makes scrub_checksum() call scrub_handle_errored_block(),
+    which determines 'failed_mirror_index' to be 1, since the device
+    extent E was allocated as the second mirror of block group X.
+
+    It allocates BTRFS_MAX_MIRRORS scrub_block structures as an array at
+    'sblocks_for_recheck', and all the memory is initialized to zeroes by
+    kcalloc().
+
+    After that it calls scrub_setup_recheck_block(), which is responsible
+    for filling each of those structures. However, when that function
+    calls btrfs_map_sblock() against the logical address of the metadata
+    extent, 19425001472, it gets a struct btrfs_bio ('bbio') that matches
+    the current block group Y. However block group Y has a raid0 profile
+    and not a raid1 profile like X had, so the following call returns 1:
+
+      scrub_nr_raid_mirrors(bbio)
+
+    And as a result scrub_setup_recheck_block() only initializes the
+    first (index 0) scrub_block structure in 'sblocks_for_recheck'.
+
+    Then scrub_recheck_block() is called by scrub_handle_errored_block()
+    with the second (index 1) scrub_block structure as the argument,
+    because 'failed_mirror_index' was previously set to 1.
+    This scrub_block was not initialized by scrub_setup_recheck_block(),
+    so it has zero pages, its 'page_count' member is 0 and its 'pagev'
+    page array has all members pointing to NULL.
+ + Finally when scrub_recheck_block() calls scrub_recheck_block_checksum() + we have a NULL pointer dereference when accessing the flags of the first + page, as pavev[0] is NULL: + + static void scrub_recheck_block_checksum(struct scrub_block *sblock) + { + (...) + if (sblock->pagev[0]->flags & BTRFS_EXTENT_FLAG_DATA) + scrub_checksum_data(sblock); + (...) + } + + Producing a stack trace like the following: + + [542998.008985] BUG: kernel NULL pointer dereference, address: 0000000000000028 + [542998.010238] #PF: supervisor read access in kernel mode + [542998.010878] #PF: error_code(0x0000) - not-present page + [542998.011516] PGD 0 P4D 0 + [542998.011929] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI + [542998.012786] CPU: 3 PID: 4846 Comm: kworker/u8:1 Tainted: G B W 5.6.0-rc7-btrfs-next-58 #1 + [542998.014524] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 + [542998.016065] Workqueue: btrfs-scrub btrfs_work_helper [btrfs] + [542998.017255] RIP: 0010:scrub_recheck_block_checksum+0xf/0x20 [btrfs] + [542998.018474] Code: 4c 89 e6 ... + [542998.021419] RSP: 0018:ffffa7af0375fbd8 EFLAGS: 00010202 + [542998.022120] RAX: 0000000000000000 RBX: ffff9792e674d120 RCX: 0000000000000000 + [542998.023178] RDX: 0000000000000001 RSI: ffff9792e674d120 RDI: ffff9792e674d120 + [542998.024465] RBP: 0000000000000000 R08: 0000000000000067 R09: 0000000000000001 + [542998.025462] R10: ffffa7af0375fa50 R11: 0000000000000000 R12: ffff9791f61fe800 + [542998.026357] R13: ffff9792e674d120 R14: 0000000000000001 R15: ffffffffc0e3dfc0 + [542998.027237] FS: 0000000000000000(0000) GS:ffff9792fb200000(0000) knlGS:0000000000000000 + [542998.028327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + [542998.029261] CR2: 0000000000000028 CR3: 00000000b3b18003 CR4: 00000000003606e0 + [542998.030301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + [542998.031316] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + [542998.032380] Call Trace: + [542998.032752] scrub_recheck_block+0x162/0x400 [btrfs] + [542998.033500] ? __alloc_pages_nodemask+0x31e/0x460 + [542998.034228] scrub_handle_errored_block+0x6f8/0x1920 [btrfs] + [542998.035170] scrub_bio_end_io_worker+0x100/0x520 [btrfs] + [542998.035991] btrfs_work_helper+0xaa/0x720 [btrfs] + [542998.036735] process_one_work+0x26d/0x6a0 + [542998.037275] worker_thread+0x4f/0x3e0 + [542998.037740] ? process_one_work+0x6a0/0x6a0 + [542998.038378] kthread+0x103/0x140 + [542998.038789] ? kthread_create_worker_on_cpu+0x70/0x70 + [542998.039419] ret_from_fork+0x3a/0x50 + [542998.039875] Modules linked in: dm_snapshot dm_thin_pool ... + [542998.047288] CR2: 0000000000000028 + [542998.047724] ---[ end trace bde186e176c7f96a ]--- + +This issue has been around for a long time, possibly since scrub exists. +The last time I ran into it was over 2 years ago. After recently fixing +fstests to pass the "--full-balance" command line option to btrfs-progs +when doing balance, several tests started to more heavily exercise balance +with fsstress, scrub and other operations in parallel, and therefore +started to hit this issue again (with btrfs/061 for example). + +Fix this by having scrub increment the 'trimming' counter of the block +group, which pins the block group in such a way that it guarantees neither +its logical address nor device extents can be reused by future block group +allocations until we decrement the 'trimming' counter. 
Also make sure that +on each iteration of scrub_stripe() we stop scrubbing the block group if +it was removed already. + +A later patch in the series will rename the block group's 'trimming' +counter and its helpers to a more generic name, since now it is not used +exclusively for pinning while trimming anymore. + +CC: stable@vger.kernel.org # 4.4+ +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/scrub.c | 38 ++++++++++++++++++++++++++++++++++++-- + 1 file changed, 36 insertions(+), 2 deletions(-) + +--- a/fs/btrfs/scrub.c ++++ b/fs/btrfs/scrub.c +@@ -3046,7 +3046,8 @@ out: + static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, + struct map_lookup *map, + struct btrfs_device *scrub_dev, +- int num, u64 base, u64 length) ++ int num, u64 base, u64 length, ++ struct btrfs_block_group *cache) + { + struct btrfs_path *path, *ppath; + struct btrfs_fs_info *fs_info = sctx->fs_info; +@@ -3284,6 +3285,20 @@ static noinline_for_stack int scrub_stri + break; + } + ++ /* ++ * If our block group was removed in the meanwhile, just ++ * stop scrubbing since there is no point in continuing. ++ * Continuing would prevent reusing its device extents ++ * for new block groups for a long time. ++ */ ++ spin_lock(&cache->lock); ++ if (cache->removed) { ++ spin_unlock(&cache->lock); ++ ret = 0; ++ goto out; ++ } ++ spin_unlock(&cache->lock); ++ + extent = btrfs_item_ptr(l, slot, + struct btrfs_extent_item); + flags = btrfs_extent_flags(l, extent); +@@ -3457,7 +3472,7 @@ static noinline_for_stack int scrub_chun + if (map->stripes[i].dev->bdev == scrub_dev->bdev && + map->stripes[i].physical == dev_offset) { + ret = scrub_stripe(sctx, map, scrub_dev, i, +- chunk_offset, length); ++ chunk_offset, length, cache); + if (ret) + goto out; + } +@@ -3555,6 +3570,23 @@ int scrub_enumerate_chunks(struct scrub_ + goto skip; + + /* ++ * Make sure that while we are scrubbing the corresponding block ++ * group doesn't get its logical address and its device extents ++ * reused for another block group, which can possibly be of a ++ * different type and different profile. We do this to prevent ++ * false error detections and crashes due to bogus attempts to ++ * repair extents. 
++ */ ++ spin_lock(&cache->lock); ++ if (cache->removed) { ++ spin_unlock(&cache->lock); ++ btrfs_put_block_group(cache); ++ goto skip; ++ } ++ btrfs_get_block_group_trimming(cache); ++ spin_unlock(&cache->lock); ++ ++ /* + * we need call btrfs_inc_block_group_ro() with scrubs_paused, + * to avoid deadlock caused by: + * btrfs_inc_block_group_ro() +@@ -3609,6 +3641,7 @@ int scrub_enumerate_chunks(struct scrub_ + } else { + btrfs_warn(fs_info, + "failed setting block group ro: %d", ret); ++ btrfs_put_block_group_trimming(cache); + btrfs_put_block_group(cache); + scrub_pause_off(fs_info); + break; +@@ -3695,6 +3728,7 @@ int scrub_enumerate_chunks(struct scrub_ + spin_unlock(&cache->lock); + } + ++ btrfs_put_block_group_trimming(cache); + btrfs_put_block_group(cache); + if (ret) + break; diff --git a/queue-5.7/btrfs-fix-corrupt-log-due-to-concurrent-fsync-of-inodes-with-shared-extents.patch b/queue-5.7/btrfs-fix-corrupt-log-due-to-concurrent-fsync-of-inodes-with-shared-extents.patch new file mode 100644 index 00000000000..d9d3eae6641 --- /dev/null +++ b/queue-5.7/btrfs-fix-corrupt-log-due-to-concurrent-fsync-of-inodes-with-shared-extents.patch @@ -0,0 +1,316 @@ +From e289f03ea79bbc6574b78ac25682555423a91cbb Mon Sep 17 00:00:00 2001 +From: Filipe Manana +Date: Mon, 18 May 2020 12:14:50 +0100 +Subject: btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents + +From: Filipe Manana + +commit e289f03ea79bbc6574b78ac25682555423a91cbb upstream. + +When we have extents shared amongst different inodes in the same subvolume, +if we fsync them in parallel we can end up with checksum items in the log +tree that represent ranges which overlap. + +For example, consider we have inodes A and B, both sharing an extent that +covers the logical range from X to X + 64KiB: + +1) Task A starts an fsync on inode A; + +2) Task B starts an fsync on inode B; + +3) Task A calls btrfs_csum_file_blocks(), and the first search in the + log tree, through btrfs_lookup_csum(), returns -EFBIG because it + finds an existing checksum item that covers the range from X - 64KiB + to X; + +4) Task A checks that the checksum item has not reached the maximum + possible size (MAX_CSUM_ITEMS) and then releases the search path + before it does another path search for insertion (through a direct + call to btrfs_search_slot()); + +5) As soon as task A releases the path and before it does the search + for insertion, task B calls btrfs_csum_file_blocks() and gets -EFBIG + too, because there is an existing checksum item that has an end + offset that matches the start offset (X) of the checksum range we want + to log; + +6) Task B releases the path; + +7) Task A does the path search for insertion (through btrfs_search_slot()) + and then verifies that the checksum item that ends at offset X still + exists and extends its size to insert the checksums for the range from + X to X + 64KiB; + +8) Task A releases the path and returns from btrfs_csum_file_blocks(), + having inserted the checksums into an existing checksum item that got + its size extended. At this point we have one checksum item in the log + tree that covers the logical range from X - 64KiB to X + 64KiB; + +9) Task B now does a search for insertion using btrfs_search_slot() too, + but it finds that the previous checksum item no longer ends at the + offset X, it now ends at an of offset X + 64KiB, so it leaves that item + untouched. 
+ + Then it releases the path and calls btrfs_insert_empty_item() + that inserts a checksum item with a key offset corresponding to X and + a size for inserting a single checksum (4 bytes in case of crc32c). + Subsequent iterations end up extending this new checksum item so that + it contains the checksums for the range from X to X + 64KiB. + + So after task B returns from btrfs_csum_file_blocks() we end up with + two checksum items in the log tree that have overlapping ranges, one + for the range from X - 64KiB to X + 64KiB, and another for the range + from X to X + 64KiB. + +Having checksum items that represent ranges which overlap, regardless of +being in the log tree or in the chekcsums tree, can lead to problems where +checksums for a file range end up not being found. This type of problem +has happened a few times in the past and the following commits fixed them +and explain in detail why having checksum items with overlapping ranges is +problematic: + + 27b9a8122ff71a "Btrfs: fix csum tree corruption, duplicate and outdated checksums" + b84b8390d6009c "Btrfs: fix file read corruption after extent cloning and fsync" + 40e046acbd2f36 "Btrfs: fix missing data checksums after replaying a log tree" + +Since this specific instance of the problem can only happen when logging +inodes, because it is the only case where concurrent attempts to insert +checksums for the same range can happen, fix the issue by using an extent +io tree as a range lock to serialize checksum insertion during inode +logging. + +This issue could often be reproduced by the test case generic/457 from +fstests. When it happens it produces the following trace: + + BTRFS critical (device dm-0): corrupt leaf: root=18446744073709551610 block=30625792 slot=42, csum end range (15020032) goes beyond the start range (15015936) of the next csum item + BTRFS info (device dm-0): leaf 30625792 gen 7 total ptrs 49 free space 2402 owner 18446744073709551610 + BTRFS info (device dm-0): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 15884 + item 0 key (18446744073709551606 128 13979648) itemoff 3991 itemsize 4 + item 1 key (18446744073709551606 128 13983744) itemoff 3987 itemsize 4 + item 2 key (18446744073709551606 128 13987840) itemoff 3983 itemsize 4 + item 3 key (18446744073709551606 128 13991936) itemoff 3979 itemsize 4 + item 4 key (18446744073709551606 128 13996032) itemoff 3975 itemsize 4 + item 5 key (18446744073709551606 128 14000128) itemoff 3971 itemsize 4 + (...) + BTRFS error (device dm-0): block=30625792 write time tree block corruption detected + ------------[ cut here ]------------ + WARNING: CPU: 1 PID: 15884 at fs/btrfs/disk-io.c:539 btree_csum_one_bio+0x268/0x2d0 [btrfs] + Modules linked in: btrfs dm_thin_pool ... + CPU: 1 PID: 15884 Comm: fsx Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 + RIP: 0010:btree_csum_one_bio+0x268/0x2d0 [btrfs] + Code: c7 c7 ... 
+ RSP: 0018:ffffbb0109e6f8e0 EFLAGS: 00010296 + RAX: 0000000000000000 RBX: ffffe1c0847b6080 RCX: 0000000000000000 + RDX: 0000000000000000 RSI: ffffffffaa963988 RDI: 0000000000000001 + RBP: ffff956a4f4d2000 R08: 0000000000000000 R09: 0000000000000001 + R10: 0000000000000526 R11: 0000000000000000 R12: ffff956a5cd28bb0 + R13: 0000000000000000 R14: ffff956a649c9388 R15: 000000011ed82000 + FS: 00007fb419959e80(0000) GS:ffff956a7aa00000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000fe6d54 CR3: 0000000138696005 CR4: 00000000003606e0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + btree_submit_bio_hook+0x67/0xc0 [btrfs] + submit_one_bio+0x31/0x50 [btrfs] + btree_write_cache_pages+0x2db/0x4b0 [btrfs] + ? __filemap_fdatawrite_range+0xb1/0x110 + do_writepages+0x23/0x80 + __filemap_fdatawrite_range+0xd2/0x110 + btrfs_write_marked_extents+0x15e/0x180 [btrfs] + btrfs_sync_log+0x206/0x10a0 [btrfs] + ? kmem_cache_free+0x315/0x3b0 + ? btrfs_log_inode+0x1e8/0xf90 [btrfs] + ? __mutex_unlock_slowpath+0x45/0x2a0 + ? lockref_put_or_lock+0x9/0x30 + ? dput+0x2d/0x580 + ? dput+0xb5/0x580 + ? btrfs_sync_file+0x464/0x4d0 [btrfs] + btrfs_sync_file+0x464/0x4d0 [btrfs] + do_fsync+0x38/0x60 + __x64_sys_fsync+0x10/0x20 + do_syscall_64+0x5c/0x280 + entry_SYSCALL_64_after_hwframe+0x49/0xbe + RIP: 0033:0x7fb41953a6d0 + Code: 48 3d ... + RSP: 002b:00007ffcc86bd218 EFLAGS: 00000246 ORIG_RAX: 000000000000004a + RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fb41953a6d0 + RDX: 0000000000000009 RSI: 0000000000040000 RDI: 0000000000000003 + RBP: 0000000000040000 R08: 0000000000000001 R09: 0000000000000009 + R10: 0000000000000064 R11: 0000000000000246 R12: 0000556cf4b2c060 + R13: 0000000000000100 R14: 0000000000000000 R15: 0000556cf322b420 + irq event stamp: 0 + hardirqs last enabled at (0): [<0000000000000000>] 0x0 + hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 + softirqs last enabled at (0): [] copy_process+0x74f/0x2020 + softirqs last disabled at (0): [<0000000000000000>] 0x0 + ---[ end trace d543fc76f5ad7fd8 ]--- + +In that trace the tree checker detected the overlapping checksum items at +the time when we triggered writeback for the log tree when syncing the +log. + +Another trace that can happen is due to BUG_ON() when deleting checksum +items while logging an inode: + + BTRFS critical (device dm-0): slot 81 key (18446744073709551606 128 13635584) new key (18446744073709551606 128 13635584) + BTRFS info (device dm-0): leaf 30949376 gen 7 total ptrs 98 free space 8527 owner 18446744073709551610 + BTRFS info (device dm-0): refs 4 lock (w:1 r:0 bw:0 br:0 sw:1 sr:0) lock_owner 13473 current 13473 + item 0 key (257 1 0) itemoff 16123 itemsize 160 + inode generation 7 size 262144 mode 100600 + item 1 key (257 12 256) itemoff 16103 itemsize 20 + item 2 key (257 108 0) itemoff 16050 itemsize 53 + extent data disk bytenr 13631488 nr 4096 + extent data offset 0 nr 131072 ram 131072 + (...) + ------------[ cut here ]------------ + kernel BUG at fs/btrfs/ctree.c:3153! + invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI + CPU: 1 PID: 13473 Comm: fsx Not tainted 5.6.0-rc7-btrfs-next-58 #1 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 + RIP: 0010:btrfs_set_item_key_safe+0x1ea/0x270 [btrfs] + Code: 0f b6 ... 
+ RSP: 0018:ffff95e3889179d0 EFLAGS: 00010282 + RAX: 0000000000000000 RBX: 0000000000000051 RCX: 0000000000000000 + RDX: 0000000000000000 RSI: ffffffffb7763988 RDI: 0000000000000001 + RBP: fffffffffffffff6 R08: 0000000000000000 R09: 0000000000000001 + R10: 00000000000009ef R11: 0000000000000000 R12: ffff8912a8ba5a08 + R13: ffff95e388917a06 R14: ffff89138dcf68c8 R15: ffff95e388917ace + FS: 00007fe587084e80(0000) GS:ffff8913baa00000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 00007fe587091000 CR3: 0000000126dac005 CR4: 00000000003606e0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + btrfs_del_csums+0x2f4/0x540 [btrfs] + copy_items+0x4b5/0x560 [btrfs] + btrfs_log_inode+0x910/0xf90 [btrfs] + btrfs_log_inode_parent+0x2a0/0xe40 [btrfs] + ? dget_parent+0x5/0x370 + btrfs_log_dentry_safe+0x4a/0x70 [btrfs] + btrfs_sync_file+0x42b/0x4d0 [btrfs] + __x64_sys_msync+0x199/0x200 + do_syscall_64+0x5c/0x280 + entry_SYSCALL_64_after_hwframe+0x49/0xbe + RIP: 0033:0x7fe586c65760 + Code: 00 f7 ... + RSP: 002b:00007ffe250f98b8 EFLAGS: 00000246 ORIG_RAX: 000000000000001a + RAX: ffffffffffffffda RBX: 00000000000040e1 RCX: 00007fe586c65760 + RDX: 0000000000000004 RSI: 0000000000006b51 RDI: 00007fe58708b000 + RBP: 0000000000006a70 R08: 0000000000000003 R09: 00007fe58700cb61 + R10: 0000000000000100 R11: 0000000000000246 R12: 00000000000000e1 + R13: 00007fe58708b000 R14: 0000000000006b51 R15: 0000558de021a420 + Modules linked in: dm_log_writes ... + ---[ end trace c92a7f447a8515f5 ]--- + +CC: stable@vger.kernel.org # 4.4+ +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/ctree.h | 3 +++ + fs/btrfs/disk-io.c | 5 ++++- + fs/btrfs/extent-io-tree.h | 1 + + fs/btrfs/tree-log.c | 22 +++++++++++++++++++--- + include/trace/events/btrfs.h | 1 + + 5 files changed, 28 insertions(+), 4 deletions(-) + +--- a/fs/btrfs/ctree.h ++++ b/fs/btrfs/ctree.h +@@ -1146,6 +1146,9 @@ struct btrfs_root { + /* Record pairs of swapped blocks for qgroup */ + struct btrfs_qgroup_swapped_blocks swapped_blocks; + ++ /* Used only by log trees, when logging csum items */ ++ struct extent_io_tree log_csum_range; ++ + #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS + u64 alloc_bytenr; + #endif +--- a/fs/btrfs/disk-io.c ++++ b/fs/btrfs/disk-io.c +@@ -1137,9 +1137,12 @@ static void __setup_root(struct btrfs_ro + root->log_transid = 0; + root->log_transid_committed = -1; + root->last_log_commit = 0; +- if (!dummy) ++ if (!dummy) { + extent_io_tree_init(fs_info, &root->dirty_log_pages, + IO_TREE_ROOT_DIRTY_LOG_PAGES, NULL); ++ extent_io_tree_init(fs_info, &root->log_csum_range, ++ IO_TREE_LOG_CSUM_RANGE, NULL); ++ } + + memset(&root->root_key, 0, sizeof(root->root_key)); + memset(&root->root_item, 0, sizeof(root->root_item)); +--- a/fs/btrfs/extent-io-tree.h ++++ b/fs/btrfs/extent-io-tree.h +@@ -44,6 +44,7 @@ enum { + IO_TREE_TRANS_DIRTY_PAGES, + IO_TREE_ROOT_DIRTY_LOG_PAGES, + IO_TREE_INODE_FILE_EXTENT, ++ IO_TREE_LOG_CSUM_RANGE, + IO_TREE_SELFTEST, + }; + +--- a/fs/btrfs/tree-log.c ++++ b/fs/btrfs/tree-log.c +@@ -3299,6 +3299,7 @@ static void free_log_tree(struct btrfs_t + + clear_extent_bits(&log->dirty_log_pages, 0, (u64)-1, + EXTENT_DIRTY | EXTENT_NEW | EXTENT_NEED_WAIT); ++ extent_io_tree_release(&log->log_csum_range); + btrfs_put_root(log); + } + +@@ -3916,9 +3917,21 @@ static int log_csums(struct btrfs_trans_ + struct btrfs_root *log_root, + struct 
btrfs_ordered_sum *sums) + { ++ const u64 lock_end = sums->bytenr + sums->len - 1; ++ struct extent_state *cached_state = NULL; + int ret; + + /* ++ * Serialize logging for checksums. This is to avoid racing with the ++ * same checksum being logged by another task that is logging another ++ * file which happens to refer to the same extent as well. Such races ++ * can leave checksum items in the log with overlapping ranges. ++ */ ++ ret = lock_extent_bits(&log_root->log_csum_range, sums->bytenr, ++ lock_end, &cached_state); ++ if (ret) ++ return ret; ++ /* + * Due to extent cloning, we might have logged a csum item that covers a + * subrange of a cloned extent, and later we can end up logging a csum + * item for a larger subrange of the same extent or the entire range. +@@ -3928,10 +3941,13 @@ static int log_csums(struct btrfs_trans_ + * trim and adjust) any existing csum items in the log for this range. + */ + ret = btrfs_del_csums(trans, log_root, sums->bytenr, sums->len); +- if (ret) +- return ret; ++ if (!ret) ++ ret = btrfs_csum_file_blocks(trans, log_root, sums); + +- return btrfs_csum_file_blocks(trans, log_root, sums); ++ unlock_extent_cached(&log_root->log_csum_range, sums->bytenr, lock_end, ++ &cached_state); ++ ++ return ret; + } + + static noinline int copy_items(struct btrfs_trans_handle *trans, +--- a/include/trace/events/btrfs.h ++++ b/include/trace/events/btrfs.h +@@ -89,6 +89,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS); + { IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES" }, \ + { IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES" }, \ + { IO_TREE_INODE_FILE_EXTENT, "INODE_FILE_EXTENT" }, \ ++ { IO_TREE_LOG_CSUM_RANGE, "LOG_CSUM_RANGE" }, \ + { IO_TREE_SELFTEST, "SELFTEST" }) + + #define BTRFS_GROUP_FLAGS \ diff --git a/queue-5.7/btrfs-fix-error-handling-when-submitting-direct-i-o-bio.patch b/queue-5.7/btrfs-fix-error-handling-when-submitting-direct-i-o-bio.patch new file mode 100644 index 00000000000..dd8cefb2c7b --- /dev/null +++ b/queue-5.7/btrfs-fix-error-handling-when-submitting-direct-i-o-bio.patch @@ -0,0 +1,63 @@ +From 6d3113a193e3385c72240096fe397618ecab6e43 Mon Sep 17 00:00:00 2001 +From: Omar Sandoval +Date: Thu, 16 Apr 2020 14:46:12 -0700 +Subject: btrfs: fix error handling when submitting direct I/O bio + +From: Omar Sandoval + +commit 6d3113a193e3385c72240096fe397618ecab6e43 upstream. + +In btrfs_submit_direct_hook(), if a direct I/O write doesn't span a RAID +stripe or chunk, we submit orig_bio without cloning it. In this case, we +don't increment pending_bios. Then, if btrfs_submit_dio_bio() fails, we +decrement pending_bios to -1, and we never complete orig_bio. Fix it by +initializing pending_bios to 1 instead of incrementing later. + +Fixing this exposes another bug: we put orig_bio prematurely and then +put it again from end_io. Fix it by not putting orig_bio. + +After this change, pending_bios is really more of a reference count, but +I'll leave that cleanup separate to keep the fix small. 
+ +Fixes: e65e15355429 ("btrfs: fix panic caused by direct IO") +CC: stable@vger.kernel.org # 4.4+ +Reviewed-by: Nikolay Borisov +Reviewed-by: Josef Bacik +Reviewed-by: Johannes Thumshirn +Signed-off-by: Omar Sandoval +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/inode.c | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -7939,7 +7939,6 @@ static int btrfs_submit_direct_hook(stru + + /* bio split */ + ASSERT(geom.len <= INT_MAX); +- atomic_inc(&dip->pending_bios); + do { + clone_len = min_t(int, submit_len, geom.len); + +@@ -7989,7 +7988,8 @@ submit: + if (!status) + return 0; + +- bio_put(bio); ++ if (bio != orig_bio) ++ bio_put(bio); + out_err: + dip->errors = 1; + /* +@@ -8030,7 +8030,7 @@ static void btrfs_submit_direct(struct b + bio->bi_private = dip; + dip->orig_bio = bio; + dip->dio_bio = dio_bio; +- atomic_set(&dip->pending_bios, 0); ++ atomic_set(&dip->pending_bios, 1); + io_bio = btrfs_io_bio(bio); + io_bio->logical = file_offset; + diff --git a/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-after-nocow-buffered-write.patch b/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-after-nocow-buffered-write.patch new file mode 100644 index 00000000000..e1841e16f07 --- /dev/null +++ b/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-after-nocow-buffered-write.patch @@ -0,0 +1,197 @@ +From 467dc47ea99c56e966e99d09dae54869850abeeb Mon Sep 17 00:00:00 2001 +From: Filipe Manana +Date: Wed, 27 May 2020 11:16:07 +0100 +Subject: btrfs: fix space_info bytes_may_use underflow after nocow buffered write + +From: Filipe Manana + +commit 467dc47ea99c56e966e99d09dae54869850abeeb upstream. + +When doing a buffered write we always try to reserve data space for it, +even when the file has the NOCOW bit set or the write falls into a file +range covered by a prealloc extent. This is done both because it is +expensive to check if we can do a nocow write (checking if an extent is +shared through reflinks or if there's a hole in the range for example), +and because when writeback starts we might actually need to fallback to +COW mode (for example the block group containing the target extents was +turned into RO mode due to a scrub or balance). + +When we are unable to reserve data space we check if we can do a nocow +write, and if we can, we proceed with dirtying the pages and setting up +the range for delalloc. In this case the bytes_may_use counter of the +data space_info object is not incremented, unlike in the case where we +are able to reserve data space (done through btrfs_check_data_free_space() +which calls btrfs_alloc_data_chunk_ondemand()). + +Later when running delalloc we attempt to start writeback in nocow mode +but we might revert back to cow mode, for example because in the meanwhile +a block group was turned into RO mode by a scrub or relocation. The cow +path after successfully allocating an extent ends up calling +btrfs_add_reserved_bytes(), which expects the bytes_may_use counter of +the data space_info object to have been incremented before - but we did +not do it when the buffered write started, since there was not enough +available data space. 
So btrfs_add_reserved_bytes() ends up decrementing +the bytes_may_use counter anyway, and when the counter's current value +is smaller then the size of the allocated extent we get a stack trace +like the following: + + ------------[ cut here ]------------ + WARNING: CPU: 0 PID: 20138 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] + Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) + CPU: 0 PID: 20138 Comm: kworker/u8:15 Not tainted 5.6.0-rc7-btrfs-next-58 #5 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 + Workqueue: writeback wb_workfn (flush-btrfs-1754) + RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] + Code: ff ff 48 (...) + RSP: 0018:ffffbda18a4b3568 EFLAGS: 00010287 + RAX: 0000000000000000 RBX: ffff9ca076f5d800 RCX: 0000000000000000 + RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9ca068470410 + RBP: fffffffffffff000 R08: 0000000000000001 R09: 0000000000000000 + R10: ffff9ca079d58040 R11: 0000000000000000 R12: ffff9ca068470400 + R13: ffff9ca0408b2000 R14: 0000000000001000 R15: ffff9ca076f5d800 + FS: 0000000000000000(0000) GS:ffff9ca07a600000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 00005605dbfe7048 CR3: 0000000138570006 CR4: 00000000003606f0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + find_free_extent+0x4a0/0x16c0 [btrfs] + btrfs_reserve_extent+0x91/0x180 [btrfs] + cow_file_range+0x12d/0x490 [btrfs] + run_delalloc_nocow+0x341/0xa40 [btrfs] + btrfs_run_delalloc_range+0x1ea/0x6d0 [btrfs] + ? find_lock_delalloc_range+0x221/0x250 [btrfs] + writepage_delalloc+0xe8/0x150 [btrfs] + __extent_writepage+0xe8/0x4c0 [btrfs] + extent_write_cache_pages+0x237/0x530 [btrfs] + ? btrfs_wq_submit_bio+0x9f/0xc0 [btrfs] + extent_writepages+0x44/0xa0 [btrfs] + do_writepages+0x23/0x80 + __writeback_single_inode+0x59/0x700 + writeback_sb_inodes+0x267/0x5f0 + __writeback_inodes_wb+0x87/0xe0 + wb_writeback+0x382/0x590 + ? wb_workfn+0x4a2/0x6c0 + wb_workfn+0x4a2/0x6c0 + process_one_work+0x26d/0x6a0 + worker_thread+0x4f/0x3e0 + ? process_one_work+0x6a0/0x6a0 + kthread+0x103/0x140 + ? kthread_create_worker_on_cpu+0x70/0x70 + ret_from_fork+0x3a/0x50 + irq event stamp: 0 + hardirqs last enabled at (0): [<0000000000000000>] 0x0 + hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 + softirqs last enabled at (0): [] copy_process+0x74f/0x2020 + softirqs last disabled at (0): [<0000000000000000>] 0x0 + ---[ end trace f9f6ef8ec4cd8ec9 ]--- + +So to fix this, when falling back into cow mode check if space was not +reserved, by testing for the bit EXTENT_NORESERVE in the respective file +range, and if not, increment the bytes_may_use counter for the data +space_info object. Also clear the EXTENT_NORESERVE bit from the range, so +that if the cow path fails it decrements the bytes_may_use counter when +clearing the delalloc range (through the btrfs_clear_delalloc_extent() +callback). 
+ +Fixes: 7ee9e4405f264e ("Btrfs: check if we can nocow if we don't have data space") +CC: stable@vger.kernel.org # 4.4+ +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/inode.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++----- + 1 file changed, 56 insertions(+), 5 deletions(-) + +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -49,6 +49,7 @@ + #include "qgroup.h" + #include "delalloc-space.h" + #include "block-group.h" ++#include "space-info.h" + + struct btrfs_iget_args { + struct btrfs_key *location; +@@ -1355,6 +1356,56 @@ static noinline int csum_exist_in_range( + return 1; + } + ++static int fallback_to_cow(struct inode *inode, struct page *locked_page, ++ const u64 start, const u64 end, ++ int *page_started, unsigned long *nr_written) ++{ ++ struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; ++ u64 range_start = start; ++ u64 count; ++ ++ /* ++ * If EXTENT_NORESERVE is set it means that when the buffered write was ++ * made we had not enough available data space and therefore we did not ++ * reserve data space for it, since we though we could do NOCOW for the ++ * respective file range (either there is prealloc extent or the inode ++ * has the NOCOW bit set). ++ * ++ * However when we need to fallback to COW mode (because for example the ++ * block group for the corresponding extent was turned to RO mode by a ++ * scrub or relocation) we need to do the following: ++ * ++ * 1) We increment the bytes_may_use counter of the data space info. ++ * If COW succeeds, it allocates a new data extent and after doing ++ * that it decrements the space info's bytes_may_use counter and ++ * increments its bytes_reserved counter by the same amount (we do ++ * this at btrfs_add_reserved_bytes()). So we need to increment the ++ * bytes_may_use counter to compensate (when space is reserved at ++ * buffered write time, the bytes_may_use counter is incremented); ++ * ++ * 2) We clear the EXTENT_NORESERVE bit from the range. We do this so ++ * that if the COW path fails for any reason, it decrements (through ++ * extent_clear_unlock_delalloc()) the bytes_may_use counter of the ++ * data space info, which we incremented in the step above. ++ */ ++ count = count_range_bits(io_tree, &range_start, end, end + 1 - start, ++ EXTENT_NORESERVE, 0); ++ if (count > 0) { ++ struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; ++ struct btrfs_space_info *sinfo = fs_info->data_sinfo; ++ ++ spin_lock(&sinfo->lock); ++ btrfs_space_info_update_bytes_may_use(fs_info, sinfo, count); ++ spin_unlock(&sinfo->lock); ++ ++ clear_extent_bit(io_tree, start, end, EXTENT_NORESERVE, 0, 0, ++ NULL); ++ } ++ ++ return cow_file_range(inode, locked_page, start, end, page_started, ++ nr_written, 1); ++} ++ + /* + * when nowcow writeback call back. This checks for snapshots or COW copies + * of the extents that exist in the file, and COWs the file as required. 
+@@ -1602,9 +1653,9 @@ out_check: + * NOCOW, following one which needs to be COW'ed + */ + if (cow_start != (u64)-1) { +- ret = cow_file_range(inode, locked_page, +- cow_start, found_key.offset - 1, +- page_started, nr_written, 1); ++ ret = fallback_to_cow(inode, locked_page, cow_start, ++ found_key.offset - 1, ++ page_started, nr_written); + if (ret) { + if (nocow) + btrfs_dec_nocow_writers(fs_info, +@@ -1693,8 +1744,8 @@ out_check: + + if (cow_start != (u64)-1) { + cur_offset = end; +- ret = cow_file_range(inode, locked_page, cow_start, end, +- page_started, nr_written, 1); ++ ret = fallback_to_cow(inode, locked_page, cow_start, end, ++ page_started, nr_written); + if (ret) + goto error; + } diff --git a/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-during-space-cache-writeout.patch b/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-during-space-cache-writeout.patch new file mode 100644 index 00000000000..4339934d574 --- /dev/null +++ b/queue-5.7/btrfs-fix-space_info-bytes_may_use-underflow-during-space-cache-writeout.patch @@ -0,0 +1,146 @@ +From 2166e5edce9ac1edf3b113d6091ef72fcac2d6c4 Mon Sep 17 00:00:00 2001 +From: Filipe Manana +Date: Wed, 27 May 2020 11:16:19 +0100 +Subject: btrfs: fix space_info bytes_may_use underflow during space cache writeout + +From: Filipe Manana + +commit 2166e5edce9ac1edf3b113d6091ef72fcac2d6c4 upstream. + +We always preallocate a data extent for writing a free space cache, which +causes writeback to always try the nocow path first, since the free space +inode has the prealloc bit set in its flags. + +However if the block group that contains the data extent for the space +cache has been turned to RO mode due to a running scrub or balance for +example, we have to fallback to the cow path. In that case once a new data +extent is allocated we end up calling btrfs_add_reserved_bytes(), which +decrements the counter named bytes_may_use from the data space_info object +with the expection that this counter was previously incremented with the +same amount (the size of the data extent). + +However when we started writeout of the space cache at cache_save_setup(), +we incremented the value of the bytes_may_use counter through a call to +btrfs_check_data_free_space() and then decremented it through a call to +btrfs_prealloc_file_range_trans() immediately after. So when starting the +writeback if we fallback to cow mode we have to increment the counter +bytes_may_use of the data space_info again to compensate for the extent +allocation done by the cow path. + +When this issue happens we are incorrectly decrementing the bytes_may_use +counter and when its current value is smaller then the amount we try to +subtract we end up with the following warning: + + ------------[ cut here ]------------ + WARNING: CPU: 3 PID: 657 at fs/btrfs/space-info.h:115 btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] + Modules linked in: btrfs blake2b_generic xor raid6_pq libcrc32c (...) + CPU: 3 PID: 657 Comm: kworker/u8:7 Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 + Workqueue: writeback wb_workfn (flush-btrfs-1591) + RIP: 0010:btrfs_add_reserved_bytes+0x3d6/0x4e0 [btrfs] + Code: ff ff 48 (...) 
+ RSP: 0000:ffffa41608f13660 EFLAGS: 00010287 + RAX: 0000000000001000 RBX: ffff9615b93ae400 RCX: 0000000000000000 + RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff9615b96ab410 + RBP: fffffffffffee000 R08: 0000000000000001 R09: 0000000000000000 + R10: ffff961585e62a40 R11: 0000000000000000 R12: ffff9615b96ab400 + R13: ffff9615a1a2a000 R14: 0000000000012000 R15: ffff9615b93ae400 + FS: 0000000000000000(0000) GS:ffff9615bb200000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 000055cbbc2ae178 CR3: 0000000115794006 CR4: 00000000003606e0 + DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 + DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 + Call Trace: + find_free_extent+0x4a0/0x16c0 [btrfs] + btrfs_reserve_extent+0x91/0x180 [btrfs] + cow_file_range+0x12d/0x490 [btrfs] + btrfs_run_delalloc_range+0x9f/0x6d0 [btrfs] + ? find_lock_delalloc_range+0x221/0x250 [btrfs] + writepage_delalloc+0xe8/0x150 [btrfs] + __extent_writepage+0xe8/0x4c0 [btrfs] + extent_write_cache_pages+0x237/0x530 [btrfs] + extent_writepages+0x44/0xa0 [btrfs] + do_writepages+0x23/0x80 + __writeback_single_inode+0x59/0x700 + writeback_sb_inodes+0x267/0x5f0 + __writeback_inodes_wb+0x87/0xe0 + wb_writeback+0x382/0x590 + ? wb_workfn+0x4a2/0x6c0 + wb_workfn+0x4a2/0x6c0 + process_one_work+0x26d/0x6a0 + worker_thread+0x4f/0x3e0 + ? process_one_work+0x6a0/0x6a0 + kthread+0x103/0x140 + ? kthread_create_worker_on_cpu+0x70/0x70 + ret_from_fork+0x3a/0x50 + irq event stamp: 0 + hardirqs last enabled at (0): [<0000000000000000>] 0x0 + hardirqs last disabled at (0): [] copy_process+0x74f/0x2020 + softirqs last enabled at (0): [] copy_process+0x74f/0x2020 + softirqs last disabled at (0): [<0000000000000000>] 0x0 + ---[ end trace bd7c03622e0b0a52 ]--- + ------------[ cut here ]------------ + +So fix this by incrementing the bytes_may_use counter of the data +space_info when we fallback to the cow path. If the cow path is successful +the counter is decremented after extent allocation (by +btrfs_add_reserved_bytes()), if it fails it ends up being decremented as +well when clearing the delalloc range (extent_clear_unlock_delalloc()). + +This could be triggered sporadically by the test case btrfs/061 from +fstests. + +Fixes: 82d5902d9c681b ("Btrfs: Support reading/writing on disk free ino cache") +CC: stable@vger.kernel.org # 4.4+ +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/inode.c | 20 +++++++++++++++----- + 1 file changed, 15 insertions(+), 5 deletions(-) + +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -1360,6 +1360,8 @@ static int fallback_to_cow(struct inode + const u64 start, const u64 end, + int *page_started, unsigned long *nr_written) + { ++ const bool is_space_ino = btrfs_is_free_space_inode(BTRFS_I(inode)); ++ const u64 range_bytes = end + 1 - start; + struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; + u64 range_start = start; + u64 count; +@@ -1387,19 +1389,27 @@ static int fallback_to_cow(struct inode + * that if the COW path fails for any reason, it decrements (through + * extent_clear_unlock_delalloc()) the bytes_may_use counter of the + * data space info, which we incremented in the step above. ++ * ++ * If we need to fallback to cow and the inode corresponds to a free ++ * space cache inode, we must also increment bytes_may_use of the data ++ * space_info for the same reason. 
Space caches always get a prealloc ++ * extent for them, however scrub or balance may have set the block ++ * group that contains that extent to RO mode. + */ +- count = count_range_bits(io_tree, &range_start, end, end + 1 - start, ++ count = count_range_bits(io_tree, &range_start, end, range_bytes, + EXTENT_NORESERVE, 0); +- if (count > 0) { ++ if (count > 0 || is_space_ino) { ++ const u64 bytes = is_space_ino ? range_bytes : count; + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; + struct btrfs_space_info *sinfo = fs_info->data_sinfo; + + spin_lock(&sinfo->lock); +- btrfs_space_info_update_bytes_may_use(fs_info, sinfo, count); ++ btrfs_space_info_update_bytes_may_use(fs_info, sinfo, bytes); + spin_unlock(&sinfo->lock); + +- clear_extent_bit(io_tree, start, end, EXTENT_NORESERVE, 0, 0, +- NULL); ++ if (count > 0) ++ clear_extent_bit(io_tree, start, end, EXTENT_NORESERVE, ++ 0, 0, NULL); + } + + return cow_file_range(inode, locked_page, start, end, page_started, diff --git a/queue-5.7/btrfs-fix-wrong-file-range-cleanup-after-an-error-filling-dealloc-range.patch b/queue-5.7/btrfs-fix-wrong-file-range-cleanup-after-an-error-filling-dealloc-range.patch new file mode 100644 index 00000000000..b75bc5a5967 --- /dev/null +++ b/queue-5.7/btrfs-fix-wrong-file-range-cleanup-after-an-error-filling-dealloc-range.patch @@ -0,0 +1,39 @@ +From e2c8e92d1140754073ad3799eb6620c76bab2078 Mon Sep 17 00:00:00 2001 +From: Filipe Manana +Date: Wed, 27 May 2020 11:15:53 +0100 +Subject: btrfs: fix wrong file range cleanup after an error filling dealloc range + +From: Filipe Manana + +commit e2c8e92d1140754073ad3799eb6620c76bab2078 upstream. + +If an error happens while running dellaloc in COW mode for a range, we can +end up calling extent_clear_unlock_delalloc() for a range that goes beyond +our range's end offset by 1 byte, which affects 1 extra page. This results +in clearing bits and doing page operations (such as a page unlock) outside +our target range. + +Fix that by calling extent_clear_unlock_delalloc() with an inclusive end +offset, instead of an exclusive end offset, at cow_file_range(). + +Fixes: a315e68f6e8b30 ("Btrfs: fix invalid attempt to free reserved space on failure to cow range") +CC: stable@vger.kernel.org # 4.14+ +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/inode.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -1142,7 +1142,7 @@ out_unlock: + */ + if (extent_reserved) { + extent_clear_unlock_delalloc(inode, start, +- start + cur_alloc_size, ++ start + cur_alloc_size - 1, + locked_page, + clear_bits, + page_ops); diff --git a/queue-5.7/btrfs-force-chunk-allocation-if-our-global-rsv-is-larger-than-metadata.patch b/queue-5.7/btrfs-force-chunk-allocation-if-our-global-rsv-is-larger-than-metadata.patch new file mode 100644 index 00000000000..0a877ff412a --- /dev/null +++ b/queue-5.7/btrfs-force-chunk-allocation-if-our-global-rsv-is-larger-than-metadata.patch @@ -0,0 +1,119 @@ +From 9c343784c4328781129bcf9e671645f69fe4b38a Mon Sep 17 00:00:00 2001 +From: Josef Bacik +Date: Fri, 13 Mar 2020 15:28:48 -0400 +Subject: btrfs: force chunk allocation if our global rsv is larger than metadata + +From: Josef Bacik + +commit 9c343784c4328781129bcf9e671645f69fe4b38a upstream. + +Nikolay noticed a bunch of test failures with my global rsv steal +patches. At first he thought they were introduced by them, but they've +been failing for a while with 64k nodes. 
+ +The problem is with 64k nodes we have a global reserve that calculates +out to 13MiB on a freshly made file system, which only has 8MiB of +metadata space. Because of changes I previously made we no longer +account for the global reserve in the overcommit logic, which means we +correctly allow overcommit to happen even though we are already +overcommitted. + +However in some corner cases, for example btrfs/170, we will allocate +the entire file system up with data chunks before we have enough space +pressure to allocate a metadata chunk. Then once the fs is full we +ENOSPC out because we cannot overcommit and the global reserve is taking +up all of the available space. + +The most ideal way to deal with this is to change our space reservation +stuff to take into account the height of the tree's that we're +modifying, so that our global reserve calculation does not end up so +obscenely large. + +However that is a huge undertaking. Instead fix this by forcing a chunk +allocation if the global reserve is larger than the total metadata +space. This gives us essentially the same behavior that happened +before, we get a chunk allocated and these tests can pass. + +This is meant to be a stop-gap measure until we can tackle the "tree +height only" project. + +Fixes: 0096420adb03 ("btrfs: do not account global reserve in can_overcommit") +CC: stable@vger.kernel.org # 5.4+ +Reviewed-by: Nikolay Borisov +Tested-by: Nikolay Borisov +Signed-off-by: Josef Bacik +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/block-rsv.c | 3 +++ + fs/btrfs/transaction.c | 18 ++++++++++++++++++ + 2 files changed, 21 insertions(+) + +--- a/fs/btrfs/block-rsv.c ++++ b/fs/btrfs/block-rsv.c +@@ -5,6 +5,7 @@ + #include "block-rsv.h" + #include "space-info.h" + #include "transaction.h" ++#include "block-group.h" + + /* + * HOW DO BLOCK RESERVES WORK +@@ -405,6 +406,8 @@ void btrfs_update_global_block_rsv(struc + else + block_rsv->full = 0; + ++ if (block_rsv->size >= sinfo->total_bytes) ++ sinfo->force_alloc = CHUNK_ALLOC_FORCE; + spin_unlock(&block_rsv->lock); + spin_unlock(&sinfo->lock); + } +--- a/fs/btrfs/transaction.c ++++ b/fs/btrfs/transaction.c +@@ -21,6 +21,7 @@ + #include "dev-replace.h" + #include "qgroup.h" + #include "block-group.h" ++#include "space-info.h" + + #define BTRFS_ROOT_TRANS_TAG 0 + +@@ -523,6 +524,7 @@ start_transaction(struct btrfs_root *roo + u64 num_bytes = 0; + u64 qgroup_reserved = 0; + bool reloc_reserved = false; ++ bool do_chunk_alloc = false; + int ret; + + /* Send isn't supposed to start transactions. */ +@@ -585,6 +587,9 @@ start_transaction(struct btrfs_root *roo + delayed_refs_bytes); + num_bytes -= delayed_refs_bytes; + } ++ ++ if (rsv->space_info->force_alloc) ++ do_chunk_alloc = true; + } else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL && + !delayed_refs_rsv->full) { + /* +@@ -667,6 +672,19 @@ got_it: + current->journal_info = h; + + /* ++ * If the space_info is marked ALLOC_FORCE then we'll get upgraded to ++ * ALLOC_FORCE the first run through, and then we won't allocate for ++ * anybody else who races in later. We don't care about the return ++ * value here. ++ */ ++ if (do_chunk_alloc && num_bytes) { ++ u64 flags = h->block_rsv->space_info->flags; ++ ++ btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags), ++ CHUNK_ALLOC_NO_FORCE); ++ } ++ ++ /* + * btrfs_record_root_in_trans() needs to alloc new extents, and may + * call btrfs_join_transaction() while we're also starting a + * transaction. 
diff --git a/queue-5.7/btrfs-free-alien-device-after-device-add.patch b/queue-5.7/btrfs-free-alien-device-after-device-add.patch new file mode 100644 index 00000000000..078fccdee9f --- /dev/null +++ b/queue-5.7/btrfs-free-alien-device-after-device-add.patch @@ -0,0 +1,63 @@ +From 7f551d969037cc128eca60688d9c5a300d84e665 Mon Sep 17 00:00:00 2001 +From: Anand Jain +Date: Tue, 5 May 2020 02:58:26 +0800 +Subject: btrfs: free alien device after device add + +From: Anand Jain + +commit 7f551d969037cc128eca60688d9c5a300d84e665 upstream. + +When an old device has new fsid through 'btrfs device add -f ' our +fs_devices list has an alien device in one of the fs_devices lists. + +By having an alien device in fs_devices, we have two issues so far + +1. missing device does not not show as missing in the userland + +2. degraded mount will fail + +Both issues are caused by the fact that there's an alien device in the +fs_devices list. (Alien means that it does not belong to the filesystem, +identified by fsid, or does not contain btrfs filesystem at all, eg. due +to overwrite). + +A device can be scanned/added through the control device ioctls +SCAN_DEV, DEVICES_READY or by ADD_DEV. + +And device coming through the control device is checked against the all +other devices in the lists, but this was not the case for ADD_DEV. + +This patch fixes both issues above by removing the alien device. + +CC: stable@vger.kernel.org # 5.4+ +Signed-off-by: Anand Jain +Reviewed-by: David Sterba +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/volumes.c | 12 +++++++++++- + 1 file changed, 11 insertions(+), 1 deletion(-) + +--- a/fs/btrfs/volumes.c ++++ b/fs/btrfs/volumes.c +@@ -2663,8 +2663,18 @@ int btrfs_init_new_device(struct btrfs_f + ret = btrfs_commit_transaction(trans); + } + +- /* Update ctime/mtime for libblkid */ ++ /* ++ * Now that we have written a new super block to this device, check all ++ * other fs_devices list if device_path alienates any other scanned ++ * device. ++ * We can ignore the return value as it typically returns -EINVAL and ++ * only succeeds if the device was an alien. ++ */ ++ btrfs_forget_devices(device_path); ++ ++ /* Update ctime/mtime for blkid or udev */ + update_dev_time(device_path); ++ + return ret; + + error_sysfs: diff --git a/queue-5.7/btrfs-include-non-missing-as-a-qualifier-for-the-latest_bdev.patch b/queue-5.7/btrfs-include-non-missing-as-a-qualifier-for-the-latest_bdev.patch new file mode 100644 index 00000000000..a24a4abff13 --- /dev/null +++ b/queue-5.7/btrfs-include-non-missing-as-a-qualifier-for-the-latest_bdev.patch @@ -0,0 +1,77 @@ +From 998a0671961f66e9fad4990ed75f80ba3088c2f1 Mon Sep 17 00:00:00 2001 +From: Anand Jain +Date: Tue, 5 May 2020 02:58:25 +0800 +Subject: btrfs: include non-missing as a qualifier for the latest_bdev + +From: Anand Jain + +commit 998a0671961f66e9fad4990ed75f80ba3088c2f1 upstream. + +btrfs_free_extra_devids() updates fs_devices::latest_bdev to point to +the bdev with greatest device::generation number. For a typical-missing +device the generation number is zero so fs_devices::latest_bdev will +never point to it. + +But if the missing device is due to alienation [1], then +device::generation is not zero and if it is greater or equal to the rest +of device generations in the list, then fs_devices::latest_bdev ends up +pointing to the missing device and reports the error like [2]. 
+ +[1] We maintain devices of a fsid (as in fs_device::fsid) in the +fs_devices::devices list, a device is considered as an alien device +if its fsid does not match with the fs_device::fsid + +Consider a working filesystem with raid1: + + $ mkfs.btrfs -f -d raid1 -m raid1 /dev/sda /dev/sdb + $ mount /dev/sda /mnt-raid1 + $ umount /mnt-raid1 + +While mnt-raid1 was unmounted the user force-adds one of its devices to +another btrfs filesystem: + + $ mkfs.btrfs -f /dev/sdc + $ mount /dev/sdc /mnt-single + $ btrfs dev add -f /dev/sda /mnt-single + +Now the original mnt-raid1 fails to mount in degraded mode, because +fs_devices::latest_bdev is pointing to the alien device. + + $ mount -o degraded /dev/sdb /mnt-raid1 + +[2] +mount: wrong fs type, bad option, bad superblock on /dev/sdb, + missing codepage or helper program, or other error + + In some cases useful info is found in syslog - try + dmesg | tail or so. + + kernel: BTRFS warning (device sdb): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing + kernel: BTRFS error (device sdb): failed to read devices + kernel: BTRFS error (device sdb): open_ctree failed + +Fix the root cause by checking if the device is not missing before it +can be considered for the fs_devices::latest_bdev. + +CC: stable@vger.kernel.org # 4.19+ +Reviewed-by: Josef Bacik +Signed-off-by: Anand Jain +Reviewed-by: David Sterba +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/volumes.c | 2 ++ + 1 file changed, 2 insertions(+) + +--- a/fs/btrfs/volumes.c ++++ b/fs/btrfs/volumes.c +@@ -1042,6 +1042,8 @@ again: + &device->dev_state)) { + if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, + &device->dev_state) && ++ !test_bit(BTRFS_DEV_STATE_MISSING, ++ &device->dev_state) && + (!latest_dev || + device->generation > latest_dev->generation)) { + latest_dev = device; diff --git a/queue-5.7/btrfs-reloc-fix-reloc-root-leak-and-null-pointer-dereference.patch b/queue-5.7/btrfs-reloc-fix-reloc-root-leak-and-null-pointer-dereference.patch new file mode 100644 index 00000000000..b663bf7c1a9 --- /dev/null +++ b/queue-5.7/btrfs-reloc-fix-reloc-root-leak-and-null-pointer-dereference.patch @@ -0,0 +1,134 @@ +From 51415b6c1b117e223bc083e30af675cb5c5498f3 Mon Sep 17 00:00:00 2001 +From: Qu Wenruo +Date: Tue, 19 May 2020 10:13:20 +0800 +Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference + +From: Qu Wenruo + +commit 51415b6c1b117e223bc083e30af675cb5c5498f3 upstream. + +[BUG] +When balance is canceled, there is a pretty high chance that unmounting +the fs can lead to lead the NULL pointer dereference: + + BTRFS warning (device dm-3): page private not zero on page 223158272 + ... 
+ BTRFS warning (device dm-3): page private not zero on page 223162368 + BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1 + BUG: kernel NULL pointer dereference, address: 0000000000000168 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 0 P4D 0 + Oops: 0000 [#1] PREEMPT SMP NOPTI + CPU: 2 PID: 5793 Comm: umount Tainted: G O 5.7.0-rc5-custom+ #53 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 + RIP: 0010:__lock_acquire+0x5dc/0x24c0 + Call Trace: + lock_acquire+0xab/0x390 + _raw_spin_lock+0x39/0x80 + btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs] + release_extent_buffer+0xb2/0x170 [btrfs] + free_extent_buffer+0x66/0xb0 [btrfs] + btrfs_put_root+0x8e/0x130 [btrfs] + btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs] + btrfs_free_fs_info+0xe5/0x120 [btrfs] + btrfs_kill_super+0x1f/0x30 [btrfs] + deactivate_locked_super+0x3b/0x80 + deactivate_super+0x3e/0x50 + cleanup_mnt+0x109/0x160 + __cleanup_mnt+0x12/0x20 + task_work_run+0x67/0xa0 + exit_to_usermode_loop+0xc5/0xd0 + syscall_return_slowpath+0x205/0x360 + do_syscall_64+0x6e/0xb0 + entry_SYSCALL_64_after_hwframe+0x49/0xb3 + RIP: 0033:0x7fd028ef740b + +[CAUSE] +When balance is canceled, all reloc roots are marked as orphan, and +orphan reloc roots are going to be cleaned up. + +However for orphan reloc roots and merged reloc roots, their lifespan +are quite different: + + Merged reloc roots | Orphan reloc roots by cancel +-------------------------------------------------------------------- +create_reloc_root() | create_reloc_root() +|- refs == 1 | |- refs == 1 + | +btrfs_grab_root(reloc_root); | btrfs_grab_root(reloc_root); +|- refs == 2 | |- refs == 2 + | +root->reloc_root = reloc_root; | root->reloc_root = reloc_root; + >>> No difference so far <<< + | +prepare_to_merge() | prepare_to_merge() +|- btrfs_set_root_refs(item, 1);| |- if (!err) (err == -EINTR) + | +merge_reloc_roots() | merge_reloc_roots() +|- merge_reloc_root() | |- Doing nothing to put reloc root + |- insert_dirty_subvol() | |- refs == 2 + |- __del_reloc_root() | + |- btrfs_put_root() | + |- refs == 1 | + >>> Now orphan reloc roots still have refs 2 <<< + | +clean_dirty_subvols() | clean_dirty_subvols() +|- btrfs_drop_snapshot() | |- btrfS_drop_snapshot() + |- reloc_root get freed | |- reloc_root still has refs 2 + | related ebs get freed, but + | reloc_root still recorded in + | allocated_roots +btrfs_check_leaked_roots() | btrfs_check_leaked_roots() +|- No leaked roots | |- Leaked reloc_roots detected + | |- btrfs_put_root() + | |- free_extent_buffer(root->node); + | |- eb already freed, caused NULL + | pointer dereference + +[FIX] +The fix is to clear fs_root->reloc_root and put it at +merge_reloc_roots() time, so that we won't leak reloc roots. 
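+
+For the orphan branch of merge_reloc_roots() this amounts to roughly the
+following (a simplified preview of the hunk further down, not a
+replacement for it):
+
+	root = read_fs_root(fs_info, reloc_root->root_key.offset);
+	if (btrfs_root_refs(&reloc_root->root_item) > 0) {
+		/* Merged reloc root, handled by merge_reloc_root() as before */
+	} else {
+		if (!IS_ERR(root)) {
+			if (root->reloc_root == reloc_root) {
+				/* Drop the reference held via root->reloc_root */
+				root->reloc_root = NULL;
+				btrfs_put_root(reloc_root);
+			}
+			/* Drop the reference taken by read_fs_root() above */
+			btrfs_put_root(root);
+		}
+		/* The orphan reloc root is then queued for cleanup as before */
+	}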
+ +Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") +CC: stable@vger.kernel.org # 5.1+ +Tested-by: Johannes Thumshirn +Signed-off-by: Qu Wenruo +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/relocation.c | 12 +++++++++--- + 1 file changed, 9 insertions(+), 3 deletions(-) + +--- a/fs/btrfs/relocation.c ++++ b/fs/btrfs/relocation.c +@@ -2624,12 +2624,10 @@ again: + reloc_root = list_entry(reloc_roots.next, + struct btrfs_root, root_list); + ++ root = read_fs_root(fs_info, reloc_root->root_key.offset); + if (btrfs_root_refs(&reloc_root->root_item) > 0) { +- root = read_fs_root(fs_info, +- reloc_root->root_key.offset); + BUG_ON(IS_ERR(root)); + BUG_ON(root->reloc_root != reloc_root); +- + ret = merge_reloc_root(rc, root); + btrfs_put_root(root); + if (ret) { +@@ -2639,6 +2637,14 @@ again: + goto out; + } + } else { ++ if (!IS_ERR(root)) { ++ if (root->reloc_root == reloc_root) { ++ root->reloc_root = NULL; ++ btrfs_put_root(reloc_root); ++ } ++ btrfs_put_root(root); ++ } ++ + list_del_init(&reloc_root->root_list); + /* Don't forget to queue this reloc root for cleanup */ + list_add_tail(&reloc_root->reloc_dirty_list, diff --git a/queue-5.7/btrfs-send-emit-file-capabilities-after-chown.patch b/queue-5.7/btrfs-send-emit-file-capabilities-after-chown.patch new file mode 100644 index 00000000000..df51c7022cd --- /dev/null +++ b/queue-5.7/btrfs-send-emit-file-capabilities-after-chown.patch @@ -0,0 +1,154 @@ +From 89efda52e6b6930f80f5adda9c3c9edfb1397191 Mon Sep 17 00:00:00 2001 +From: Marcos Paulo de Souza +Date: Sun, 10 May 2020 23:15:07 -0300 +Subject: btrfs: send: emit file capabilities after chown + +From: Marcos Paulo de Souza + +commit 89efda52e6b6930f80f5adda9c3c9edfb1397191 upstream. + +Whenever a chown is executed, all capabilities of the file being touched +are lost. When doing incremental send with a file with capabilities, +there is a situation where the capability can be lost on the receiving +side. The sequence of actions bellow shows the problem: + + $ mount /dev/sda fs1 + $ mount /dev/sdb fs2 + + $ touch fs1/foo.bar + $ setcap cap_sys_nice+ep fs1/foo.bar + $ btrfs subvolume snapshot -r fs1 fs1/snap_init + $ btrfs send fs1/snap_init | btrfs receive fs2 + + $ chgrp adm fs1/foo.bar + $ setcap cap_sys_nice+ep fs1/foo.bar + + $ btrfs subvolume snapshot -r fs1 fs1/snap_complete + $ btrfs subvolume snapshot -r fs1 fs1/snap_incremental + + $ btrfs send fs1/snap_complete | btrfs receive fs2 + $ btrfs send -p fs1/snap_init fs1/snap_incremental | btrfs receive fs2 + +At this point, only a chown was emitted by "btrfs send" since only the +group was changed. This makes the cap_sys_nice capability to be dropped +from fs2/snap_incremental/foo.bar + +To fix that, only emit capabilities after chown is emitted. The current +code first checks for xattrs that are new/changed, emits them, and later +emit the chown. Now, __process_new_xattr skips capabilities, letting +only finish_inode_if_needed to emit them, if they exist, for the inode +being processed. + +This behavior was being worked around in "btrfs receive" side by caching +the capability and only applying it after chown. Now, xattrs are only +emmited _after_ chown, making that workaround not needed anymore. 
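+
+File capabilities live in the security.capability extended attribute
+(XATTR_NAME_CAPS in the kernel), which is why the chown emitted in the
+stream clears them on the receiving side and why they are now re-emitted
+afterwards. The following standalone userspace check (not part of this
+patch, only an illustration) can be used on the receive side to see
+whether a file still carries that xattr:
+
+	/* caps-check.c: report whether a file carries security.capability */
+	#include <errno.h>
+	#include <stdio.h>
+	#include <string.h>
+	#include <sys/xattr.h>
+
+	int main(int argc, char **argv)
+	{
+		char buf[1024];
+		ssize_t len;
+
+		if (argc != 2) {
+			fprintf(stderr, "usage: %s <file>\n", argv[0]);
+			return 1;
+		}
+
+		/* The same xattr the new send_capabilities() helper looks up */
+		len = getxattr(argv[1], "security.capability", buf, sizeof(buf));
+		if (len < 0) {
+			printf("%s: no security.capability xattr (%s)\n",
+			       argv[1], strerror(errno));
+			return 1;
+		}
+		printf("%s: security.capability present, %zd bytes\n",
+		       argv[1], len);
+		return 0;
+	}
+
+Running it (or getcap from libcap) against fs2/snap_incremental/foo.bar
+after the incremental receive, with a btrfs receive that does not carry
+the old userspace workaround, shows the xattr missing without this patch
+and present with it.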
+ +Link: https://github.com/kdave/btrfs-progs/issues/202 +CC: stable@vger.kernel.org # 4.4+ +Suggested-by: Filipe Manana +Reviewed-by: Filipe Manana +Signed-off-by: Marcos Paulo de Souza +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/send.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + 1 file changed, 67 insertions(+) + +--- a/fs/btrfs/send.c ++++ b/fs/btrfs/send.c +@@ -23,6 +23,7 @@ + #include "btrfs_inode.h" + #include "transaction.h" + #include "compression.h" ++#include "xattr.h" + + /* + * Maximum number of references an extent can have in order for us to attempt to +@@ -4545,6 +4546,10 @@ static int __process_new_xattr(int num, + struct fs_path *p; + struct posix_acl_xattr_header dummy_acl; + ++ /* Capabilities are emitted by finish_inode_if_needed */ ++ if (!strncmp(name, XATTR_NAME_CAPS, name_len)) ++ return 0; ++ + p = fs_path_alloc(); + if (!p) + return -ENOMEM; +@@ -5107,6 +5112,64 @@ static int send_extent_data(struct send_ + return 0; + } + ++/* ++ * Search for a capability xattr related to sctx->cur_ino. If the capability is ++ * found, call send_set_xattr function to emit it. ++ * ++ * Return 0 if there isn't a capability, or when the capability was emitted ++ * successfully, or < 0 if an error occurred. ++ */ ++static int send_capabilities(struct send_ctx *sctx) ++{ ++ struct fs_path *fspath = NULL; ++ struct btrfs_path *path; ++ struct btrfs_dir_item *di; ++ struct extent_buffer *leaf; ++ unsigned long data_ptr; ++ char *buf = NULL; ++ int buf_len; ++ int ret = 0; ++ ++ path = alloc_path_for_send(); ++ if (!path) ++ return -ENOMEM; ++ ++ di = btrfs_lookup_xattr(NULL, sctx->send_root, path, sctx->cur_ino, ++ XATTR_NAME_CAPS, strlen(XATTR_NAME_CAPS), 0); ++ if (!di) { ++ /* There is no xattr for this inode */ ++ goto out; ++ } else if (IS_ERR(di)) { ++ ret = PTR_ERR(di); ++ goto out; ++ } ++ ++ leaf = path->nodes[0]; ++ buf_len = btrfs_dir_data_len(leaf, di); ++ ++ fspath = fs_path_alloc(); ++ buf = kmalloc(buf_len, GFP_KERNEL); ++ if (!fspath || !buf) { ++ ret = -ENOMEM; ++ goto out; ++ } ++ ++ ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, fspath); ++ if (ret < 0) ++ goto out; ++ ++ data_ptr = (unsigned long)(di + 1) + btrfs_dir_name_len(leaf, di); ++ read_extent_buffer(leaf, buf, data_ptr, buf_len); ++ ++ ret = send_set_xattr(sctx, fspath, XATTR_NAME_CAPS, ++ strlen(XATTR_NAME_CAPS), buf, buf_len); ++out: ++ kfree(buf); ++ fs_path_free(fspath); ++ btrfs_free_path(path); ++ return ret; ++} ++ + static int clone_range(struct send_ctx *sctx, + struct clone_root *clone_root, + const u64 disk_byte, +@@ -5972,6 +6035,10 @@ static int finish_inode_if_needed(struct + goto out; + } + ++ ret = send_capabilities(sctx); ++ if (ret < 0) ++ goto out; ++ + /* + * If other directory inodes depended on our current directory + * inode's move/rename, now do their move/rename operations. 
diff --git a/queue-5.7/series b/queue-5.7/series index 78e56fe6d65..b04a7e18d75 100644 --- a/queue-5.7/series +++ b/queue-5.7/series @@ -253,3 +253,14 @@ bpf-fix-up-bpf_skb_adjust_room-helper-s-skb-csum-set.patch s390-bpf-maintain-8-byte-stack-alignment.patch kasan-stop-tests-being-eliminated-as-dead-code-with-.patch string.h-fix-incompatibility-between-fortify_source-.patch +btrfs-free-alien-device-after-device-add.patch +btrfs-include-non-missing-as-a-qualifier-for-the-latest_bdev.patch +btrfs-fix-a-race-between-scrub-and-block-group-removal-allocation.patch +btrfs-send-emit-file-capabilities-after-chown.patch +btrfs-force-chunk-allocation-if-our-global-rsv-is-larger-than-metadata.patch +btrfs-reloc-fix-reloc-root-leak-and-null-pointer-dereference.patch +btrfs-fix-error-handling-when-submitting-direct-i-o-bio.patch +btrfs-fix-corrupt-log-due-to-concurrent-fsync-of-inodes-with-shared-extents.patch +btrfs-fix-wrong-file-range-cleanup-after-an-error-filling-dealloc-range.patch +btrfs-fix-space_info-bytes_may_use-underflow-after-nocow-buffered-write.patch +btrfs-fix-space_info-bytes_may_use-underflow-during-space-cache-writeout.patch