git.ipfire.org Git - thirdparty/systemd.git/commit

homework: Ensure we don't stack block devices

Ensure we don't create a loop device on top of a physical block device.
Such stacking severely degrades the performance of discard operations if
the physical device does not support discard_zeroes_data.
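
A minimal sketch of the kind of check described above, written in Python
for brevity (systemd itself is C). The helper names is_block_device() and
needs_loop_device() are hypothetical and are not taken from this commit:

```python
import os
import stat


def is_block_device(path: str) -> bool:
    """Return True if path (symlinks followed) is a block device node."""
    return stat.S_ISBLK(os.stat(path).st_mode)


def needs_loop_device(path: str) -> bool:
    """Hypothetical policy helper: only regular files get a loop device.

    A block device is used directly, so that no loop device is stacked
    on top of it.
    """
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return False  # already a block device, do not stack
    if stat.S_ISREG(mode):
        return True   # regular image file, a loop device is required
    raise ValueError(f"{path}: neither a regular file nor a block device")
```

Under these assumptions, a caller would attach a loop device only when
needs_loop_device() returns True and open the node directly otherwise.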

- Historical loop device semantics dictate that a discarded range must
  return zero data on read. This is easy to implement on a filesystem,
  since fallocate zero-range returns immediately and the holes are
  handled at the filesystem level to return zero data on read.
- For a raw block device, the feature (discard_zeroes_data) depends on
  the capabilities of the physical device, which the driver exposes to
  the block layer. This means that to guarantee that a loop device
  stacked on a block device returns zeroes for discarded data, it needs
  to convert each discarded range into a write_zero op on the block device.
  https://github.com/torvalds/linux/blob/63676eefb7a026d04b51dcb7aaf54f358517a2ec/drivers/block/loop.c#L773

For example, on one of my local NVMe devices I see the following:
cat /sys/class/block/nvme1n1/queue/write_zeroes_max_bytes
131072
cat /sys/class/block/nvme0n1/queue/discard_max_hw_bytes
2199023255040
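
The same limits can be read programmatically. This is a small sketch; the
helper name read_queue_limit() and its sysfs_root parameter are assumptions
made for illustration, while the attribute names are the ones shown above:

```python
from pathlib import Path


def read_queue_limit(device: str, attribute: str,
                     sysfs_root: str = "/sys/class/block") -> int:
    """Read one integer queue attribute of a block device from sysfs.

    device is a kernel name such as "nvme0n1"; attribute is a file under
    queue/, for example "write_zeroes_max_bytes" or "discard_max_hw_bytes".
    """
    path = Path(sysfs_root) / device / "queue" / attribute
    return int(path.read_text().strip())


# Example call (requires that the named device exists on the machine):
#   read_queue_limit("nvme0n1", "discard_max_hw_bytes")
```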

This means the maximum size of a write_zero operation is 128KiB, while the
maximum size of a discard operation is about 2TiB on the block device.
So discarding, for example, 1TiB of data, which would be a single block
device operation, gets split into about 8.4 million block device
operations when issued on top of a stacked loop device.
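
The arithmetic can be checked with a short calculation. It assumes a
discard of exactly 1 TiB (2^40 bytes), that every operation is as large as
the limit allows, and it ignores alignment and any further splitting by
the block layer. The function name operations_needed() is illustrative:

```python
WRITE_ZEROES_MAX_BYTES = 131072         # 128 KiB, from the sysfs output
DISCARD_MAX_HW_BYTES = 2199023255040    # just under 2 TiB, from the sysfs output
ONE_TIB = 1 << 40


def operations_needed(length: int, max_bytes: int) -> int:
    """Number of operations of at most max_bytes needed to cover length bytes."""
    return -(-length // max_bytes)  # ceiling division on integers


# Discard issued directly on the block device
print(operations_needed(ONE_TIB, DISCARD_MAX_HW_BYTES))    # prints 1
# Discard converted to write_zero ops by a stacked loop device
print(operations_needed(ONE_TIB, WRITE_ZEROES_MAX_BYTES))  # prints 8388608
```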

author	scarlet-storm <12461256+scarlet-storm@users.noreply.github.com>
	Sat, 28 Dec 2024 07:55:25 +0000 (13:25 +0530)
committer	scarlet-storm <12461256+scarlet-storm@users.noreply.github.com>
	Tue, 17 Feb 2026 12:53:35 +0000 (18:23 +0530)
commit	6a389701b22af2a5d492df328cd883a3889b2c1f
tree	708d61012f00884deeb9192f3d8f15e3093c2090
parent	292525dd20ce2b6ecb6511c470bcddcb06a3ea16