When writing into a preallocated extent, ordered extent completion calls
btrfs_mark_extent_written() to convert the file extent item from the
BTRFS_FILE_EXTENT_PREALLOC type to the BTRFS_FILE_EXTENT_REG type.
If the preallocated extent was created beyond i_size with fallocate in
keep-size mode (FALLOC_FL_KEEP_SIZE), and the inode is evicted and loaded
again before the write, the inode's file_extent_tree is initialized only
up to i_size. The prealloc extent beyond i_size is therefore not tracked
there.
After a write into that extent extends i_size, btrfs_mark_extent_written()
updates the file extent item, but the corresponding range is not marked
dirty in the inode's file_extent_tree.
This can leave disk_i_size stale when the filesystem does not use the
no-holes feature, so after a remount the file size can revert to its old
value.

The following reproducer triggers the problem:

   $ cat test.sh
   #!/bin/bash
   DEV=/dev/sdi
   MNT=/mnt/sdi
   mkfs.btrfs -f -O ^no-holes $DEV
   mount $DEV $MNT
   touch $MNT/file
   fallocate -n -l 2M $MNT/file
   umount $MNT
   mount $DEV $MNT
   dd if=/dev/zero of=$MNT/file bs=1M count=1 conv=notrunc
   ls -lh $MNT/file
   umount $MNT
   mount $DEV $MNT
   ls -lh $MNT/file
   umount $MNT

Running the reproducer gives the following result:

   $ ./test.sh
   (...)
   1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.000596024 s, 1.8 GB/s
   -rw-rw-r-- 1 root root 1.0M May 8 16:34 /mnt/sdi/file
   -rw-rw-r-- 1 root root 0 May 8 16:34 /mnt/sdi/file

Fix this by marking the written range dirty in the inode's
file_extent_tree after successfully converting the prealloc extent to a
regular extent.
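
The effect of the missing marking can be illustrated with a small userspace
model. This is only a sketch under simplifying assumptions, not the kernel
code. The names toy_inode, toy_mark_range_dirty(), toy_update_disk_i_size()
and toy_write_prealloc() are hypothetical. The file_extent_tree is reduced to
a single tracked prefix starting at offset 0, and without no-holes the
persisted size is assumed to be clamped to that prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not kernel code. The tracked ranges are reduced to one
 * prefix [0, tracked_end) that is known to be backed by file extent items.
 */
struct toy_inode {
	uint64_t i_size;      /* in-memory file size */
	uint64_t disk_i_size; /* size that would be persisted in the inode item */
	uint64_t tracked_end; /* end of the tracked prefix starting at offset 0 */
	bool no_holes;        /* true if the no-holes feature is enabled */
};

/*
 * Extend the tracked prefix if [start, end) touches or overlaps it.
 * Ranges that start past the prefix are ignored in this toy model.
 */
static void toy_mark_range_dirty(struct toy_inode *inode,
				 uint64_t start, uint64_t end)
{
	assert(start < end);
	if (start <= inode->tracked_end && end > inode->tracked_end)
		inode->tracked_end = end;
}

/* Assumption: without no-holes, disk_i_size may not exceed the prefix. */
static void toy_update_disk_i_size(struct toy_inode *inode)
{
	uint64_t safe = inode->i_size;

	if (!inode->no_holes && inode->tracked_end < safe)
		safe = inode->tracked_end;
	inode->disk_i_size = safe;
}

/*
 * Model a write into a prealloc extent followed by ordered extent
 * completion. mark_dirty_after_convert selects the behavior with the
 * fix (true) or without it (false). Returns the resulting disk_i_size.
 */
static uint64_t toy_write_prealloc(struct toy_inode *inode,
				   uint64_t start, uint64_t end,
				   bool mark_dirty_after_convert)
{
	if (end > inode->i_size)
		inode->i_size = end;

	/* The prealloc to regular conversion of the item happens here. */
	if (mark_dirty_after_convert)
		toy_mark_range_dirty(inode, start, end);

	toy_update_disk_i_size(inode);
	return inode->disk_i_size;
}
```

With an empty inode and a 1M write at offset 0, as in the reproducer, the
model keeps disk_i_size at 0 when the range is not marked and moves it to
1048576 when it is. With no_holes set, the marking makes no difference in
this model.
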
Fixes: 9ddc959e802b ("btrfs: use the file extent tree infrastructure")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Robbie Ko <robbieko@synology.com>
[ Minor change log updates ]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>