btrfs: add mount time auto fix for orphan fst entries

[BUG]
Before the btrfs-progs v6.16.1 release, mkfs.btrfs can leave free space
tree entries for deleted chunks:
  # mkfs.btrfs -f -O fst $dev
  # btrfs ins dump-tree -t chunk $dev
  btrfs-progs v6.16
  chunk tree
  leaf 22036480 items 4 free space 15781 generation 8 owner CHUNK_TREE
  leaf 22036480 flags 0x1(WRITTEN) backref revision 1
        item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
        item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 13631488) itemoff 16105 itemsize 80
        ^^^ The first chunk is at 13631488
        item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15993 itemsize 112
        item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15881 itemsize 112

  # btrfs ins dump-tree -t free-space-tree $dev
  btrfs-progs v6.16
  free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
  leaf 30556160 items 13 free space 15918 generation 8 owner FREE_SPACE_TREE
  leaf 30556160 flags 0x1(WRITTEN) backref revision 1
        item 0 key (1048576 FREE_SPACE_INFO 4194304) itemoff 16275 itemsize 8
                free space info extent count 1 flags 0
        item 1 key (1048576 FREE_SPACE_EXTENT 4194304) itemoff 16275 itemsize 0
                free space extent
        item 2 key (5242880 FREE_SPACE_INFO 8388608) itemoff 16267 itemsize 8
                free space info extent count 1 flags 0
        item 3 key (5242880 FREE_SPACE_EXTENT 8388608) itemoff 16267 itemsize 0
                free space extent
        ^^^ Above 4 items are all before the first chunk.
        item 4 key (13631488 FREE_SPACE_INFO 8388608) itemoff 16259 itemsize 8
                free space info extent count 1 flags 0
        item 5 key (13631488 FREE_SPACE_EXTENT 8388608) itemoff 16259 itemsize 0
                free space extent
        ...
This can trigger btrfs check errors.

[CAUSE]
It's a bug in the free space tree implementation of btrfs-progs, which
doesn't delete the fst entries belonging to the to-be-deleted chunk/block
group.

[ENHANCEMENT]
The most common fix is to clear the space cache and rebuild it, but
that requires a ro->rw remount which may not be possible for rootfs,
and also relies on users to use the "clear_cache" mount option manually.

Introduce a kernel fix for it, which automatically deletes any entries
that are before the first block group at the first RW mount.

For filesystems without this problem, the overhead is just a single tree
search and no modification to the free space tree, thus the overhead
should be minimal.
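
The rule described above (drop every free space tree entry that sits
before the first block group) can be illustrated with a small userspace
model. This is only a sketch, not the kernel implementation: struct
fst_key, the FST_* enum and both helpers are hypothetical names, and a
sorted array stands in for the btree leaf. The real fix works on the
on-disk tree at mount time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative item types; these are NOT the on-disk key type values. */
enum fst_type { FST_INFO, FST_EXTENT };

/* Simplified stand-in for a free space tree key: (objectid type offset). */
struct fst_key {
	uint64_t objectid;	/* start of the range the item describes */
	enum fst_type type;
	uint64_t offset;	/* length of the range */
};

/*
 * Hypothetical helper: count the leading keys whose objectid is below the
 * start of the first block group. Keys must be sorted by objectid, as they
 * are inside a btree leaf, so the orphans always form a prefix.
 */
static size_t count_orphan_entries(const struct fst_key *keys, size_t nr,
				   uint64_t first_bg_start)
{
	size_t i = 0;

	assert(keys != NULL || nr == 0);
	while (i < nr && keys[i].objectid < first_bg_start)
		i++;
	return i;
}

/*
 * Hypothetical helper: remove the orphan prefix in place and return the
 * new item count. When there are no orphans the array is not modified,
 * mirroring the "single search, no modification" case.
 */
static size_t delete_orphan_entries(struct fst_key *keys, size_t nr,
				    uint64_t first_bg_start)
{
	size_t orphans = count_orphan_entries(keys, nr, first_bg_start);

	if (orphans == 0)
		return nr;
	memmove(keys, keys + orphans, (nr - orphans) * sizeof(*keys));
	return nr - orphans;
}
```

Fed the six items shown in the dump above with the first chunk at
13631488, the model removes items 0-3 and keeps the two entries at
13631488. Running it again on the result changes nothing.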
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>