Pull block updates from Jens Axboe:
- Support for batch request processing for ublk, improving the
efficiency of the kernel/ublk server communication. This can yield
nice 7-12% performance improvements
- Support for integrity data for ublk
- Various other ublk improvements and additions, including a ton of
  selftest additions and updates
- Move the handling of blk-crypto software fallback from below the
block layer to above it. This reduces the complexity of dealing with
bio splitting
- Series fixing a number of potential deadlocks in blk-mq related to
the queue usage counter and writeback throttling and rq-qos debugfs
handling
- Add an async_depth queue attribute, to resolve a performance
  regression that has been around for a while related to the scheduler
  depth handling (a brief usage sketch follows this list)
- Only use task_work for IOPOLL completions on NVMe if it is necessary
to do so. An earlier fix for an issue resulted in all these
completions being punted to task_work, to guarantee that completions
were only run for a given io_uring ring when it was local to that
ring. With the new changes, we can detect if it's necessary to use
task_work or not, and avoid it if possible.
- rnbd fixes:
- Fix refcount underflow in device unmap path
- Handle PREFLUSH and NOUNMAP flags properly in protocol
- Fix server-side bi_size for special IOs
- Zero response buffer before use
- Fix trace format for flags
- Add .release to rnbd_dev_ktype
- MD pull requests via Yu Kuai
- Fix raid5_run() to return error when log_init() fails
- Fix IO hang with degraded array with llbitmap
- Fix percpu_ref not resurrected on suspend timeout in llbitmap
- Fix GPF in write_page caused by resize race
- Fix NULL pointer dereference in process_metadata_update
- Fix hang when stopping arrays with metadata through dm-raid
- Fix any_working flag handling in raid10_sync_request
- Refactor sync/recovery code path, improve error handling for
badblocks, and remove unused recovery_disabled field
- Consolidate mddev boolean fields into mddev_flags
- Use mempool to allocate stripe_request_ctx and make sure
max_sectors is not less than io_opt in raid5
- Fix return value of mddev_trylock
- Fix memory leak in raid1_run()
- Add Li Nan as mdraid reviewer
- Move phys_vec definitions to the kernel types, mostly in preparation
for some VFIO and RDMA changes
- Improve the speed for secure erase for some devices
- Various little rust updates
- Various other minor fixes, improvements, and cleanups
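
As a quick illustration of the async_depth item above, the sketch below sets
the attribute from userspace. This is a minimal sketch, not part of the pull:
the sysfs path is an assumption based on where the other queue attributes
(e.g. nr_requests) live, and the device name and value are placeholders.

#include <stdio.h>
#include <stdlib.h>

/*
 * Minimal sketch: write a depth value to the (assumed) per-device
 * /sys/block/<dev>/queue/async_depth attribute.  Needs root; the
 * default device and depth below are placeholders.
 */
int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "nvme0n1";
	const char *depth = argc > 2 ? argv[2] : "32";
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/async_depth", dev);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return EXIT_FAILURE;
	}
	fprintf(f, "%s\n", depth);
	fclose(f);
	return EXIT_SUCCESS;
}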
* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
blk-mq: ABI/sysfs-block: fix docs build warnings
selftests: ublk: organize test directories by test ID
block: decouple secure erase size limit from discard size limit
block: remove redundant kill_bdev() call in set_blocksize()
blk-mq: add documentation for new queue attribute async_depth
block, bfq: convert to use request_queue->async_depth
mq-deadline: convert to use request_queue->async_depth
kyber: convert to use request_queue->async_depth
blk-mq: add a new queue sysfs attribute async_depth
blk-mq: factor out a helper blk_mq_limit_depth()
blk-mq-sched: unify elevators checking for async requests
block: convert nr_requests to unsigned int
block: don't use strcpy to copy blockdev name
blk-mq-debugfs: warn about possible deadlock
blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
...
+static bool nvme_pci_prp_save_mapping(struct request *req,
+		struct device *dma_dev, struct blk_dma_iter *iter)
+{
+	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+	if (dma_use_iova(&iod->dma_state) || !dma_need_unmap(dma_dev))
+		return true;
+
+	if (!iod->nr_dma_vecs) {
+		struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+
+		iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
+				GFP_ATOMIC);
+		if (!iod->dma_vecs) {
+			iter->status = BLK_STS_RESOURCE;
+			return false;
+		}
+	}
+
+	iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
+	iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
+	iod->nr_dma_vecs++;
+	return true;
+}
+
static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
		struct blk_dma_iter *iter)
{
-	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
	if (iter->len)
		return true;
-	if (!blk_rq_dma_map_iter_next(req, dma_dev, &iod->dma_state, iter))
+	if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
		return false;
-	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
-		iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
-		iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
-		iod->nr_dma_vecs++;
-	}
-	return true;
+	return nvme_pci_prp_save_mapping(req, dma_dev, iter);
}
static blk_status_t nvme_pci_setup_data_prp(struct request *req,
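
The dma_vecs saved above exist so the completion path can undo the per-segment
mappings later. The following is only a sketch of that unmap side, not code
from this pull: the helper name is made up, and it assumes the struct device
and the dmavec mempool are reached the same way as in the excerpt above.

static void nvme_pci_unmap_saved_vecs(struct request *req,
		struct device *dma_dev)
{
	/* Illustrative sketch only; the helper name is an assumption */
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
	unsigned int i;

	/* IOVA-based mappings are torn down in one shot, not per vector */
	if (dma_use_iova(&iod->dma_state) || !dma_need_unmap(dma_dev))
		return;

	/* Undo each mapping recorded by nvme_pci_prp_save_mapping() */
	for (i = 0; i < iod->nr_dma_vecs; i++)
		dma_unmap_page(dma_dev, iod->dma_vecs[i].addr,
			       iod->dma_vecs[i].len, rq_dma_dir(req));

	mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
	iod->nr_dma_vecs = 0;
}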
int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
- struct io_wq_work_node *pos, *start, *prev;
unsigned int poll_flags = 0;
DEFINE_IO_COMP_BATCH(iob);
+ struct io_kiocb *req, *tmp;
int nr_events = 0;
+ /*
+ * Store the polling io_ring_ctx so drivers can detect if they're
+ * completing a request in the same ring context that's polling.
+ */
+ iob.poll_ctx = ctx;
+
/*
* Only spin for completions if we don't have multiple devices hanging
	 * off our complete list.
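
With the polling context recorded in the batch, a driver can tell whether a
completion is happening on the ring that is currently polling it and only
fall back to task_work when it is not. The check below is a minimal sketch of
that idea; the helper that maps a request back to its owning ring context is
hypothetical and named here only for illustration.

static bool complete_inline_ok(struct io_comp_batch *iob, struct request *req)
{
	/* Sketch only: not the actual NVMe/io_uring code from this series */
	if (!iob || !iob->poll_ctx)
		return false;	/* no polling context known: punt to task_work */

	/*
	 * req_owner_ring_ctx() is a hypothetical helper standing in for
	 * however the driver resolves the ring that issued this request.
	 */
	return req_owner_ring_ctx(req) == iob->poll_ctx;
}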
unsigned tgt_data, unsigned q_id, unsigned is_target_io)
{
/* we only have 7 bits to encode q_id */
-	_Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7);
-	assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16) && !(q_id >> 7));
-	return tag | (op << 16) | (tgt_data << 24) |
+	_Static_assert(UBLK_MAX_QUEUES_SHIFT <= 7, "UBLK_MAX_QUEUES_SHIFT must be <= 7");
+	ublk_assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16) && !(q_id >> 7));
+	return tag | ((__u64)op << 16) | ((__u64)tgt_data << 24) |
(__u64)q_id << 56 | (__u64)is_target_io << 63;
}
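
For reference, unpacking that user_data again is just the inverse shifts and
masks. The helpers below are an illustrative sketch (assuming the same header
context, i.e. __u64 from <linux/types.h>); the selftests' own decode helpers
may use different names.

/*
 * Layout built above: bits 0-15 tag, 16-23 op, 24-39 tgt_data,
 * 56-62 q_id, 63 is_target_io.
 */
static inline unsigned int user_data_to_tag(__u64 user_data)
{
	return user_data & 0xffff;
}

static inline unsigned int user_data_to_op(__u64 user_data)
{
	return (user_data >> 16) & 0xff;
}

static inline unsigned int user_data_to_tgt_data(__u64 user_data)
{
	return (user_data >> 24) & 0xffff;
}

static inline unsigned int user_data_to_q_id(__u64 user_data)
{
	return (user_data >> 56) & 0x7f;
}

static inline unsigned int user_data_to_is_target_io(__u64 user_data)
{
	return (user_data >> 63) & 1;
}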