iommu/riscv: Compute best stride for single invalidation
Replace the per-page IOTLB invalidation loop with stride-based
invalidation that uses the level bitmaps from iommu_iotlb_gather.
Pre-calculate the invalidation information before running over the
bonds loop as it is the same for every entry.
The lowest set bit in the PT_FEAT_DETAILED_GATHER bitmaps indicates
the stride. This design ignores the SVNAPOT contiguous pages on the
assumption that they still have to be individually invalidated like
ARM requires, though it is not clear from the spec.
Replace the 2M cutoff for global invalidation with a 512 command
limit. This is the same for a 4k stride and now scales with the
stride size.
Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>