From: Ionut Nechita
Date: Tue, 19 May 2026 13:52:33 +0000 (+0300)
Subject: scsi: sas: Skip opt_sectors when DMA reports no real optimization hint
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=be8fcd4a8217a916344c88a4b1b84f5736dda17e;p=thirdparty%2Fkernel%2Flinux.git

scsi: sas: Skip opt_sectors when DMA reports no real optimization hint

sas_host_setup() unconditionally sets shost->opt_sectors from
dma_opt_mapping_size(). When the IOMMU is disabled or in passthrough
mode and no DMA ops provide an opt_mapping_size callback,
dma_opt_mapping_size() returns min(dma_max_mapping_size(), SIZE_MAX),
which equals dma_max_mapping_size(): a hard upper bound, not an
optimization hint.

On a Dell PowerEdge R750 with mpt3sas (Broadcom SAS3816, FW 33.15.00.00)
and intel_iommu=off the following values are observed:

  dma_opt_mapping_size() = dma_max_mapping_size() (no real hint)
  shost->max_sectors     = 32767
  opt_sectors            = min(32767, huge >> 9) = 32767
  optimal_io_size        = 32767 << 9 = 16776704
                           → round_down(16776704, 4096) = 16773120

The SAS disk (SAMSUNG MZILT800HBHQ0D3) does not report an Optimal
Transfer Length in VPD page B0, so sdkp->opt_xfer_blocks remains 0.
sd_revalidate_disk() then uses min_not_zero(0, opt_sectors) =
opt_sectors, propagating the bogus value into the block device's
optimal_io_size (visible as OPT-IO = 16773120 in lsblk --topology).

mkfs.xfs picks up optimal_io_size and minimum_io_size and computes:

  swidth = 16773120 / 4096 = 4095
  sunit  = 8192 / 4096     = 2

Since 4095 % 2 != 0, XFS rejects the geometry:

  SB stripe unit sanity check failed

This makes it impossible to create XFS filesystems (e.g. for
/var/lib/docker) during system bootstrap.

Fix this by introducing a sas_dma_setup_opt_sectors() helper that sets
opt_sectors only when dma_opt_mapping_size() is strictly less than
dma_max_mapping_size(), indicating a genuine DMA optimization
constraint.
The helper computes min(opt_sectors, max_sectors) first, then rounds
down to a power of two so that filesystem geometry calculations always
produce clean results. When the two DMA values are equal, no backend
provided a real hint, so opt_sectors stays at 0 ("no preference").

[mkp: implemented hch's suggestion]

Fixes: 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
Cc: stable@vger.kernel.org
Reviewed-by: John Garry
Signed-off-by: Ionut Nechita
Reviewed-by: Christoph Hellwig
Link: https://patch.msgid.link/20260519135238.373784-2-ionut.nechita@windriver.com
Signed-off-by: Martin K. Petersen
---

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index d8f2377b017f..d689b9ed08a6 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -220,12 +221,45 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, struct sas_rphy *rphy)
  * SAS host attributes
  */
 
+/*
+ * Set shost->opt_sectors from the DMA optimal mapping size, but only
+ * when dma_opt_mapping_size() is strictly less than dma_max_mapping_size(),
+ * indicating a genuine optimization hint from an IOMMU or DMA backend.
+ * When the two are equal (e.g. IOMMU disabled / passthrough), no real
+ * hint exists, so leave opt_sectors at 0 to avoid bogus optimal_io_size
+ * values that break filesystem geometry (e.g. mkfs.xfs stripe alignment).
+ */
+static void sas_dma_setup_opt_sectors(struct Scsi_Host *shost)
+{
+	struct device *dma_dev = shost->dma_dev;
+	size_t opt = dma_opt_mapping_size(dma_dev);
+	size_t max = dma_max_mapping_size(dma_dev);
+	unsigned int opt_sectors;
+
+	/* opt >= max means no real hint was provided by the DMA layer */
+	if (opt >= max)
+		return;
+
+	/* Clamp to max_sectors to avoid overflow in sector arithmetic */
+	opt_sectors = min_t(unsigned int, opt >> SECTOR_SHIFT,
+			    shost->max_sectors);
+
+	/* Guard against zero before rounddown_pow_of_two() */
+	if (!opt_sectors)
+		return;
+
+	/*
+	 * Round down to power-of-two so filesystem geometry calculations
+	 * (e.g. XFS stripe width/unit) always produce clean divisors.
+	 */
+	shost->opt_sectors = rounddown_pow_of_two(opt_sectors);
+}
+
 static int sas_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct sas_host_attrs *sas_host = to_sas_host_attrs(shost);
-	struct device *dma_dev = shost->dma_dev;
 
 	INIT_LIST_HEAD(&sas_host->rphy_list);
 	mutex_init(&sas_host->lock);
@@ -237,10 +271,7 @@ static int sas_host_setup(struct transport_container *tc, struct device *dev,
 		dev_printk(KERN_ERR, dev, "fail to a bsg device %d\n",
 			   shost->host_no);
 
-	if (dma_dev->dma_mask) {
-		shost->opt_sectors = min_t(unsigned int, shost->max_sectors,
-				dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
-	}
+	sas_dma_setup_opt_sectors(shost);
 
 	return 0;
 }
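
The arithmetic in the commit message can be checked outside the kernel
with a small userspace model. The sketch below is not kernel code and is
not part of the patch: old_opt_sectors(), new_opt_sectors(),
io_opt_bytes() and stripe_ok() are hypothetical names introduced for
illustration, the DMA sizes are passed in as plain byte counts, a return
value of 0 stands for "opt_sectors left unset", the clamp compares in
size_t rather than reproducing min_t()'s cast, and stripe_ok() is only a
simplified stand-in for the XFS stripe sanity check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Largest power of two <= n. Userspace stand-in, requires n != 0. */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Model of the pre-patch logic: clamp the DMA "optimal" size to max_sectors. */
static unsigned int old_opt_sectors(size_t opt_bytes, unsigned int max_sectors)
{
	size_t s = opt_bytes >> SECTOR_SHIFT;

	return s < max_sectors ? (unsigned int)s : max_sectors;
}

/* Model of the patched helper: 0 means "no hint, leave opt_sectors unset". */
static unsigned int new_opt_sectors(size_t opt_bytes, size_t max_bytes,
				    unsigned int max_sectors)
{
	size_t s;
	unsigned int v;

	if (opt_bytes >= max_bytes)	/* no real hint from the DMA layer */
		return 0;

	s = opt_bytes >> SECTOR_SHIFT;
	v = s < max_sectors ? (unsigned int)s : max_sectors;
	if (!v)				/* hint smaller than one sector */
		return 0;

	return rounddown_pow2(v);
}

/* optimal_io_size in bytes, rounded down to a multiple of block_size. */
static uint64_t io_opt_bytes(unsigned int opt_sectors, unsigned int block_size)
{
	uint64_t bytes = (uint64_t)opt_sectors << SECTOR_SHIFT;

	return bytes - bytes % block_size;
}

/* Simplified geometry check: stripe width must be a multiple of stripe unit. */
static int stripe_ok(uint64_t io_opt, uint64_t io_min, unsigned int fs_block)
{
	uint64_t swidth = io_opt / fs_block;
	uint64_t sunit = io_min / fs_block;

	return sunit != 0 && swidth % sunit == 0;
}
```

With the values from the report (max_sectors = 32767, no real DMA hint),
old_opt_sectors() yields 32767 and io_opt_bytes(32767, 4096) yields
16773120, which stripe_ok() rejects against a minimum_io_size of 8192.
Under the same inputs new_opt_sectors() returns 0, and for a genuine
hint it returns a power of two, so the divisibility problem does not
arise in this model.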