homework: Ensure we don't stack block devices
Ensure we don't create a loop device on top of a physical block device.
Doing so severely degrades the performance of discard operations if the
physical device does not support discard_zeroes_data.
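A minimal sketch of such a guard, assuming the check runs before the loop
device is set up and that the backing path is already known. The helper
name ensure_not_block_device is hypothetical and not taken from this change:

```python
import os
import stat


def ensure_not_block_device(backing_path: str) -> None:
    """Raise if backing_path is a block device (hypothetical helper).

    A loop device backed by a regular file gets cheap discards through the
    filesystem, while one stacked on a block device may have to emulate
    them with write_zeroes operations.
    """
    # os.stat follows symlinks, so links such as /dev/disk/by-id/...
    # resolve to the underlying device node before the mode is checked.
    mode = os.stat(backing_path).st_mode
    if stat.S_ISBLK(mode):
        raise ValueError(
            f"refusing to create a loop device on top of block device {backing_path}"
        )
```

For a regular file the function returns without error; for a path such as
/dev/nvme0n1 it raises ValueError.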
- Historically, loop device semantics dictate that once a range of the
device is discarded, reading it back must return zeroes. This is easy to
implement on top of a filesystem, since fallocate zero-range returns
immediately and the filesystem handles the resulting holes, returning
zeroes on read.
- For a raw block device, this guarantee (discard_zeroes_data) depends on
the capabilities of the physical device, as exposed to the block layer
by its driver. To guarantee that a loop device stacked on a block device
returns zeroes for discarded data, the loop device therefore has to
convert each discarded range into write_zeroes operations on the block
device:
https://github.com/torvalds/linux/blob/63676eefb7a026d04b51dcb7aaf54f358517a2ec/drivers/block/loop.c#L773
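To illustrate why this is expensive, here is a simplified model of
emulating one discard with write_zeroes operations. It assumes each
operation may cover at most max_bytes bytes and ignores the alignment and
granularity constraints the real block layer also applies; split_into_ops
is a hypothetical name:

```python
from typing import Iterator, Tuple


def split_into_ops(offset: int, length: int, max_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) chunks of at most max_bytes covering the range.

    Simplified model: real block-layer splitting also honours alignment
    and granularity limits, which are ignored here.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    end = offset + length
    while offset < end:
        chunk = min(max_bytes, end - offset)
        yield offset, chunk
        offset += chunk
```

For example, list(split_into_ops(0, 300 * 1024, 128 * 1024)) gives
[(0, 131072), (131072, 131072), (262144, 45056)]: three operations where
a native discard would need one.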
For example, on one of my local NVMe drives I see the following:
    cat /sys/class/block/nvme1n1/queue/write_zeroes_max_bytes
    131072
    cat /sys/class/block/nvme0n1/queue/discard_max_hw_bytes
    2199023255040
This means that a single write_zeroes operation can cover at most
128 KiB, while a single discard operation can cover up to about 2 TiB on
the block device.
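The same limits can be read programmatically. A minimal sketch, assuming
the sysfs layout used above (/sys/class/block/<dev>/queue/<attr>); the
function name read_queue_limit and its sysfs_root parameter are
hypothetical, the latter existing only so the function can be pointed at
a test directory:

```python
from pathlib import Path


def read_queue_limit(dev: str, attr: str, sysfs_root: str = "/sys/class/block") -> int:
    """Return an integer queue attribute such as write_zeroes_max_bytes."""
    path = Path(sysfs_root) / dev / "queue" / attr
    # sysfs attributes end with a newline, so strip before parsing.
    return int(path.read_text().strip())
```

On the machine above, read_queue_limit("nvme1n1", "write_zeroes_max_bytes")
would return 131072.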
So discarding, for example, 1 TiB of data, which fits in a single discard
operation on the block device, gets split into 8,388,608 (about 8.4
million) write_zeroes operations when issued through a loop device
stacked on top of it.
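The operation count is a ceiling division of the range size by the
per-operation limit. A quick check using the values read from sysfs
above (op_count is a name chosen for illustration):

```python
def op_count(length: int, max_bytes: int) -> int:
    """Number of operations needed to cover length bytes, max_bytes at a time."""
    return -(-length // max_bytes)  # ceiling division on integers


TIB = 1 << 40
WRITE_ZEROES_MAX = 131072        # write_zeroes_max_bytes (128 KiB)
DISCARD_MAX = 2199023255040      # discard_max_hw_bytes (about 2 TiB)

print(op_count(TIB, DISCARD_MAX))       # 1
print(op_count(TIB, WRITE_ZEROES_MAX))  # 8388608, i.e. 2**40 / 2**17
```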