On zoned filesystems metadata space accounting can become overly optimistic
due to delayed refs reservations growing without a hard upper bound.
The delayed_refs_rsv block reservation is allowed to speculatively grow and
is only backed by actual metadata space when refilled. On zoned devices this
can result in delayed_refs_rsv reserving a large portion of metadata space
that is already effectively unusable due to zone write pointer constraints.
As a result, space_info->may_use can grow far beyond the usable metadata
capacity, causing the allocator to believe space is available when it is not.
This leads to premature ENOSPC failures and "cannot satisfy tickets" reports
even though commits would be able to make progress by flushing delayed refs.
Analysis of "-o enospc_debug" dumps using a Python debug script
confirmed that delayed_refs_rsv was responsible for the majority of
metadata overcommit on zoned devices. By correlating space_info counters
(total, used, may_use, zone_unusable) across transactions, the analysis
showed that may_use continued to grow even after usable metadata space
was exhausted, with delayed refs refills accounting for the excess
reservations.
Here's the output of the analysis:
======================================================================
Space Type: METADATA
======================================================================
Raw Values:
Total: 256.00 MB (
268435456 bytes)
Used: 128.00 KB (131072 bytes)
Pinned: 16.00 KB (16384 bytes)
Reserved: 144.00 KB (147456 bytes)
May Use: 255.48 MB (
267894784 bytes)
Zone Unusable: 192.00 KB (196608 bytes)
Calculated Metrics:
Actually Usable: 255.81 MB (total - zone_unusable)
Committed: 255.77 MB (used + pinned + reserved + may_use)
Consumed: 320.00 KB (used + zone_unusable)
Percentages:
Zone Unusable: 0.07% of total
May Use: 99.80% of total
Fix this by adding a zoned-specific cap in btrfs_delayed_refs_rsv_refill():
Before reserving additional metadata bytes, limit the delayed refs
reservation based on the usable metadata space (total bytes minus
zone_unusable). If the reservation would exceed this cap, return -EAGAIN
to trigger the existing flush/commit logic instead of overcommitting
metadata space.
This preserves the existing reservation and flushing semantics while
preventing metadata overcommit on zoned devices. The change is limited to
metadata space and does not affect non-zoned filesystems.
This patch addresses premature metadata ENOSPC conditions on zoned devices
and ensures delayed refs are throttled before exhausting usable metadata.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
* This will refill the delayed block_rsv up to 1 items size worth of space and
* will return -ENOSPC if we can't make the reservation.
*/
+static int btrfs_zoned_cap_metadata_reservation(struct btrfs_space_info *space_info)
+{
+ struct btrfs_fs_info *fs_info = space_info->fs_info;
+ struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
+ u64 usable;
+ u64 cap;
+ int ret = 0;
+
+ if (!btrfs_is_zoned(fs_info))
+ return 0;
+
+ spin_lock(&space_info->lock);
+ usable = space_info->total_bytes - space_info->bytes_zone_unusable;
+ spin_unlock(&space_info->lock);
+ cap = usable >> 1;
+
+ spin_lock(&block_rsv->lock);
+ if (block_rsv->size > cap)
+ ret = -EAGAIN;
+ spin_unlock(&block_rsv->lock);
+
+ return ret;
+}
+
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush)
{
if (!num_bytes)
return 0;
+ ret = btrfs_zoned_cap_metadata_reservation(space_info);
+ if (ret)
+ return ret;
+
ret = btrfs_reserve_metadata_bytes(space_info, num_bytes, flush);
if (ret)
return ret;
* here.
*/
ret = btrfs_delayed_refs_rsv_refill(fs_info, flush);
+ if (ret == -EAGAIN) {
+ ASSERT(btrfs_is_zoned(fs_info));
+ ret = btrfs_commit_current_transaction(root);
+ if (ret)
+ goto reserve_fail;
+ ret = btrfs_delayed_refs_rsv_refill(fs_info, flush);
+ }
+
if (ret)
goto reserve_fail;
}