[WARNING]
With extra warning on dirty extent buffers at umount (aka, the next
patch in the series), test case generic/388 can trigger the following
warning about dirty extent buffers at unmount time:
BTRFS critical (device dm-2 state E): emergency shutdown
BTRFS error (device dm-2 state E): error while writing out transaction: -30
BTRFS warning (device dm-2 state E): Skipping commit of aborted transaction.
BTRFS error (device dm-2 state EA): Transaction 9 aborted (error -30)
BTRFS: error (device dm-2 state EA) in cleanup_transaction:2068: errno=-30 Readonly filesystem
BTRFS info (device dm-2 state EA): forced readonly
BTRFS info (device dm-2 state EA): last unmount of filesystem
4fbf2e15-f941-49a0-bc7c-
716315d2777c
------------[ cut here ]------------
WARNING: disk-io.c:3311 at invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs], CPU#8: umount/914368
CPU: 8 UID: 0 PID: 914368 Comm: umount Tainted: G OE 7.1.0-rc1-custom+ #372 PREEMPT(full)
2de38db8d1deae71fde295430a0ff3ab98ccf596
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:invalidate_and_check_btree_folios+0xfd/0x1ca [btrfs]
Call Trace:
<TASK>
close_ctree+0x52e/0x574 [btrfs
d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
generic_shutdown_super+0x89/0x1a0
kill_anon_super+0x16/0x40
btrfs_kill_super+0x16/0x20 [btrfs
d2f0b1cd330d1287e7a9919d112eadfc0e914efd]
deactivate_locked_super+0x2d/0xb0
cleanup_mnt+0xdc/0x140
task_work_run+0x5a/0xa0
exit_to_user_mode_loop+0x123/0x4b0
do_syscall_64+0x243/0x7c0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace
0000000000000000 ]---
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30539776 owner 9 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30621696 owner 257 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30638080 owner 258 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30654464 owner 7 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30703616 owner 2 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30720000 owner 10 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30736384 owner 4 gen 9 refs 2 flags 0x7
BTRFS warning (device dm-2 state EA): unable to release extent buffer
30752768 owner 11 gen 9 refs 2 flags 0x7
I'm using a stripped down version, which seems to trigger the warning
more reliably:
_fsstress_pid=""
workload()
{
dmesg -C
mkfs.btrfs -f -K $dev > /dev/null
echo 1 > /sys/kernel/debug/clear_warn_once
mount $dev $mnt
$fsstress -w -n 1024 -p 4 -d $mnt &
_fsstress_pid=$!
sleep 0
$godown $mnt
pkill --echo -PIPE fsstress > /dev/null
wait $_fsstress_pid
unset _fsstress_pid
umount $mnt
if dmesg | grep -q "WARNING"; then
fail
fi
}
for (( i = 0; i < $runtime; i++ )); do
echo "=== $i/$runtime ==="
workload
done
[CAUSE]
Inside btrfs_write_and_wait_transaction(), we first try to write all
dirty ebs, then wait for them to finish.
After that we call btrfs_extent_io_tree_release() to free all
extent states from dirty_pages io tree.
However if we hit an error from btrfs_write_marked_extent(), then we
still call btrfs_extent_io_tree_release() to clear that dirty_pages io
tree, which may contain dirty records that we haven't yet submitted.
Furthermore, the later transaction cleanup path will utilize that
dirty_pages io tree to properly cleanup those dirty ebs, but since it's
already empty, no dirty ebs are properly cleaned up, thus will later
trigger the warnings inside invalidate_btree_folios().
[FIX]
Normally such dirty ebs won't cause problems, as when the iput() is
called on the btree inode, the dirty ebs will be forcibly written back,
and since the fs is already in an error status, such writeback will not
reach disk and finish immediately.
But it's still better to get rid of such dirty ebs, if we ended up with
dirty ebs but the fs is not in an error status, then such writeback at
iput() time will be too late, as all workers are already stopped but
writeback will utilize workers, which will lead to NULL pointer
dereferences.
Instead of unconditionally calling btrfs_extent_io_tree_release(), only
call it if btrfs_write_and_wait_transaction() finished successfully, so
that @dirty_pages extent io tree is kept untouched for transaction
cleanup.
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>