From 4ed85c3d8bc7c67f47a127b9d4ab60bff0e2f301 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Tue, 3 Dec 2024 07:33:27 -0500 Subject: [PATCH] Fixes for 6.11 Signed-off-by: Sasha Levin --- queue-6.11/9p-xen-fix-init-sequence.patch | 56 +++ queue-6.11/9p-xen-fix-release-of-irq.patch | 42 ++ ...wner-variant-of-start_freeze-unfreez.patch | 74 ++++ ...-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch | 199 +++++++++ ...w-an-atomic-write-be-truncated-in-bl.patch | 57 +++ ...af-for-flush-rq-while-iterating-tags.patch | 167 ++++++++ ...ze-enter-queue-as-lock-for-supportin.patch | 364 +++++++++++++++++ ...return-unsigned-int-from-bdev_io_min.patch | 39 ++ ...-number-of-allocated-pages-which-dis.patch | 42 ++ ...ount-make-sure-passwords-are-in-sync.patch | 166 ++++++++ ...-native-symlinks-relative-to-the-exp.patch | 383 ++++++++++++++++++ ...-reparse-point-with-native-symlink-i.patch | 52 +++ ...-unlock-on-error-in-smb3_reconfigure.patch | 39 ++ ...s2-fix-use-of-uninitialized-variable.patch | 57 +++ ...on-t-fail-if-modules.order-is-missin.patch | 58 +++ ...n-t-include-dma-mapping.h-in-kfifo.h.patch | 67 +++ ...move-incorrect-code-in-do_eisa_entry.patch | 86 ++++ ...don-t-attempt-unregister-for-invalid.patch | 72 ++++ ...limit-repeat-device-registration-on-.patch | 71 ++++ ...s-ignore-sb_rdonly-when-mounting-nfs.patch | 79 ++++ ...e-after-free-problem-in-the-asynchro.patch | 52 +++ ...-kernel-crash-while-shutting-down-co.patch | 96 +++++ ...void-hang-on-inaccessible-namespaces.patch | 90 ++++ ...ix-rcu-list-traversal-to-use-srcu-pr.patch | 107 +++++ ...ure-port-and-device-id-bits-are-set-.patch | 50 +++ ...uv3-fix-lockdep-assert-in-event_init.patch | 68 ++++ ...e-to-.data.once-to-fix-resetting-war.patch | 153 +++++++ ...ame-.data.unlikely-to-.data.unlikely.patch | 67 +++ ...-reuse-partially-completed-requests-.patch | 118 ++++++ ...t-fail-temperature-reads-on-undervol.patch | 49 +++ ...dt-bit-position-of-the-status-regist.patch | 39 ++ ...tc_read_time-was-successful-in-rtc_t.patch | 53 +++ ...ix-bcd-to-rtc_time-conversion-errors.patch | 52 +++ ...e-irqf_no_autoen-flag-in-request_irq.patch | 50 +++ queue-6.11/series | 51 +++ ...after-free-bug-in-register_intc_cont.patch | 46 +++ ...le-directory-caching-when-dir_cache_.patch | 44 ++ ...fid-tcon-before-performing-network-o.patch | 45 ++ ...t_sock_upd_timeout-when-reset-transp.patch | 38 ++ ...af-issue-caused-by-sunrpc-kernel-tcp.patch | 165 ++++++++ ...nd-cancel-tls-handshake-with-etimedo.patch | 64 +++ ...ostat-fix-child-s-argument-forwardin.patch | 37 ++ ...wer-turbostat-fix-trailing-n-parsing.patch | 55 +++ ...duplicate-slab-cache-names-while-att.patch | 104 +++++ ...chedule-fm_work-if-wear-leveling-poo.patch | 98 +++++ ...tion-fix-use-after-free-in-ubifs_tnc.patch | 171 ++++++++ ...e-total-block-count-by-deducting-jou.patch | 46 +++ ...race-for-specified-task-in-show_stac.patch | 37 ++ ...-integer-overflow-during-physmem-set.patch | 50 +++ ...n-value-of-elf_core_copy_task_fpregs.patch | 36 ++ ...ialize-ubd-s-disk-pointer-in-ubd_add.patch | 39 ++ ...n-update-algo-in-init_size-descripti.patch | 77 ++++ 52 files changed, 4417 insertions(+) create mode 100644 queue-6.11/9p-xen-fix-init-sequence.patch create mode 100644 queue-6.11/9p-xen-fix-release-of-irq.patch create mode 100644 queue-6.11/blk-mq-add-non_owner-variant-of-start_freeze-unfreez.patch create mode 100644 queue-6.11/block-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch create mode 100644 queue-6.11/block-don-t-allow-an-atomic-write-be-truncated-in-bl.patch create mode 100644 
queue-6.11/block-fix-uaf-for-flush-rq-while-iterating-tags.patch create mode 100644 queue-6.11/block-model-freeze-enter-queue-as-lock-for-supportin.patch create mode 100644 queue-6.11/block-return-unsigned-int-from-bdev_io_min.patch create mode 100644 queue-6.11/brd-decrease-the-number-of-allocated-pages-which-dis.patch create mode 100644 queue-6.11/cifs-during-remount-make-sure-passwords-are-in-sync.patch create mode 100644 queue-6.11/cifs-fix-parsing-native-symlinks-relative-to-the-exp.patch create mode 100644 queue-6.11/cifs-fix-parsing-reparse-point-with-native-symlink-i.patch create mode 100644 queue-6.11/cifs-unlock-on-error-in-smb3_reconfigure.patch create mode 100644 queue-6.11/jffs2-fix-use-of-uninitialized-variable.patch create mode 100644 queue-6.11/kbuild-deb-pkg-don-t-fail-if-modules.order-is-missin.patch create mode 100644 queue-6.11/kfifo-don-t-include-dma-mapping.h-in-kfifo.h.patch create mode 100644 queue-6.11/modpost-remove-incorrect-code-in-do_eisa_entry.patch create mode 100644 queue-6.11/nfs-blocklayout-don-t-attempt-unregister-for-invalid.patch create mode 100644 queue-6.11/nfs-blocklayout-limit-repeat-device-registration-on-.patch create mode 100644 queue-6.11/nfs-ignore-sb_rdonly-when-mounting-nfs.patch create mode 100644 queue-6.11/nfsv4.0-fix-a-use-after-free-problem-in-the-asynchro.patch create mode 100644 queue-6.11/nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch create mode 100644 queue-6.11/nvme-multipath-avoid-hang-on-inaccessible-namespaces.patch create mode 100644 queue-6.11/nvme-multipath-fix-rcu-list-traversal-to-use-srcu-pr.patch create mode 100644 queue-6.11/perf-arm-cmn-ensure-port-and-device-id-bits-are-set-.patch create mode 100644 queue-6.11/perf-arm-smmuv3-fix-lockdep-assert-in-event_init.patch create mode 100644 queue-6.11/rename-.data.once-to-.data.once-to-fix-resetting-war.patch create mode 100644 queue-6.11/rename-.data.unlikely-to-.data.unlikely.patch create mode 100644 queue-6.11/revert-nfs-don-t-reuse-partially-completed-requests-.patch create mode 100644 queue-6.11/rtc-ab-eoz9-don-t-fail-temperature-reads-on-undervol.patch create mode 100644 queue-6.11/rtc-abx80x-fix-wdt-bit-position-of-the-status-regist.patch create mode 100644 queue-6.11/rtc-check-if-__rtc_read_time-was-successful-in-rtc_t.patch create mode 100644 queue-6.11/rtc-rzn1-fix-bcd-to-rtc_time-conversion-errors.patch create mode 100644 queue-6.11/rtc-st-lpc-use-irqf_no_autoen-flag-in-request_irq.patch create mode 100644 queue-6.11/sh-intc-fix-use-after-free-bug-in-register_intc_cont.patch create mode 100644 queue-6.11/smb-client-disable-directory-caching-when-dir_cache_.patch create mode 100644 queue-6.11/smb-initialize-cfid-tcon-before-performing-network-o.patch create mode 100644 queue-6.11/sunrpc-clear-xprt_sock_upd_timeout-when-reset-transp.patch create mode 100644 queue-6.11/sunrpc-fix-one-uaf-issue-caused-by-sunrpc-kernel-tcp.patch create mode 100644 queue-6.11/sunrpc-timeout-and-cancel-tls-handshake-with-etimedo.patch create mode 100644 queue-6.11/tools-power-turbostat-fix-child-s-argument-forwardin.patch create mode 100644 queue-6.11/tools-power-turbostat-fix-trailing-n-parsing.patch create mode 100644 queue-6.11/ubi-fastmap-fix-duplicate-slab-cache-names-while-att.patch create mode 100644 queue-6.11/ubi-fastmap-wl-schedule-fm_work-if-wear-leveling-poo.patch create mode 100644 queue-6.11/ubifs-authentication-fix-use-after-free-in-ubifs_tnc.patch create mode 100644 queue-6.11/ubifs-correct-the-total-block-count-by-deducting-jou.patch create mode 100644 
queue-6.11/um-always-dump-trace-for-specified-task-in-show_stac.patch create mode 100644 queue-6.11/um-fix-potential-integer-overflow-during-physmem-set.patch create mode 100644 queue-6.11/um-fix-the-return-value-of-elf_core_copy_task_fpregs.patch create mode 100644 queue-6.11/um-ubd-initialize-ubd-s-disk-pointer-in-ubd_add.patch create mode 100644 queue-6.11/x86-documentation-update-algo-in-init_size-descripti.patch diff --git a/queue-6.11/9p-xen-fix-init-sequence.patch b/queue-6.11/9p-xen-fix-init-sequence.patch new file mode 100644 index 00000000000..f7ae62d4dae --- /dev/null +++ b/queue-6.11/9p-xen-fix-init-sequence.patch @@ -0,0 +1,56 @@ +From 3f3551055261a85ec4528f513a89302092cd2132 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 19 Nov 2024 21:16:33 +0000 +Subject: 9p/xen: fix init sequence + +From: Alex Zenla + +[ Upstream commit 7ef3ae82a6ebbf4750967d1ce43bcdb7e44ff74b ] + +Large amount of mount hangs observed during hotplugging of 9pfs devices. The +9pfs Xen driver attempts to initialize itself more than once, causing the +frontend and backend to disagree: the backend listens on a channel that the +frontend does not send on, resulting in stalled processing. + +Only allow initialization of 9p frontend once. + +Fixes: c15fe55d14b3b ("9p/xen: fix connection sequence") +Signed-off-by: Alex Zenla +Signed-off-by: Alexander Merritt +Signed-off-by: Ariadne Conill +Reviewed-by: Juergen Gross +Message-ID: <20241119211633.38321-1-alexander@edera.dev> +Signed-off-by: Dominique Martinet +Signed-off-by: Sasha Levin +--- + net/9p/trans_xen.c | 7 +++++-- + 1 file changed, 5 insertions(+), 2 deletions(-) + +diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c +index dfdbe1ca53387..0304e8a1616d8 100644 +--- a/net/9p/trans_xen.c ++++ b/net/9p/trans_xen.c +@@ -465,6 +465,7 @@ static int xen_9pfs_front_init(struct xenbus_device *dev) + goto error; + } + ++ xenbus_switch_state(dev, XenbusStateInitialised); + return 0; + + error_xenbus: +@@ -512,8 +513,10 @@ static void xen_9pfs_front_changed(struct xenbus_device *dev, + break; + + case XenbusStateInitWait: +- if (!xen_9pfs_front_init(dev)) +- xenbus_switch_state(dev, XenbusStateInitialised); ++ if (dev->state != XenbusStateInitialising) ++ break; ++ ++ xen_9pfs_front_init(dev); + break; + + case XenbusStateConnected: +-- +2.43.0 + diff --git a/queue-6.11/9p-xen-fix-release-of-irq.patch b/queue-6.11/9p-xen-fix-release-of-irq.patch new file mode 100644 index 00000000000..fb8edc9fd56 --- /dev/null +++ b/queue-6.11/9p-xen-fix-release-of-irq.patch @@ -0,0 +1,42 @@ +From 4d21ec2c1c150cd9eb40c006db90eebf2dc111b7 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 21 Nov 2024 22:51:00 +0000 +Subject: 9p/xen: fix release of IRQ + +From: Alex Zenla + +[ Upstream commit e43c608f40c065b30964f0a806348062991b802d ] + +Kernel logs indicate an IRQ was double-freed. + +Pass correct device ID during IRQ release. 
+ +Fixes: 71ebd71921e45 ("xen/9pfs: connect to the backend") +Signed-off-by: Alex Zenla +Signed-off-by: Alexander Merritt +Signed-off-by: Ariadne Conill +Reviewed-by: Juergen Gross +Message-ID: <20241121225100.5736-1-alexander@edera.dev> +[Dominique: remove confusing variable reset to 0] +Signed-off-by: Dominique Martinet +Signed-off-by: Sasha Levin +--- + net/9p/trans_xen.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c +index 0304e8a1616d8..b9ff69c7522a1 100644 +--- a/net/9p/trans_xen.c ++++ b/net/9p/trans_xen.c +@@ -286,7 +286,7 @@ static void xen_9pfs_front_free(struct xen_9pfs_front_priv *priv) + if (!priv->rings[i].intf) + break; + if (priv->rings[i].irq > 0) +- unbind_from_irqhandler(priv->rings[i].irq, priv->dev); ++ unbind_from_irqhandler(priv->rings[i].irq, ring); + if (priv->rings[i].data.in) { + for (j = 0; + j < (1 << priv->rings[i].intf->ring_order); +-- +2.43.0 + diff --git a/queue-6.11/blk-mq-add-non_owner-variant-of-start_freeze-unfreez.patch b/queue-6.11/blk-mq-add-non_owner-variant-of-start_freeze-unfreez.patch new file mode 100644 index 00000000000..493990fd07f --- /dev/null +++ b/queue-6.11/blk-mq-add-non_owner-variant-of-start_freeze-unfreez.patch @@ -0,0 +1,74 @@ +From 09c98173cf6d9c16275416b38e3183dbede8430f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 25 Oct 2024 08:37:18 +0800 +Subject: blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs + +From: Ming Lei + +[ Upstream commit 8acdd0e7bfadda6b5103f2960d293581954454ed ] + +Add non_owner variant of start_freeze/unfreeze queue APIs, so that the +caller knows that what they are doing, and we can skip lockdep support +for non_owner variant in per-call level. + +Prepare for supporting lockdep for freezing/unfreezing queue. + +Reviewed-by: Christoph Hellwig +Suggested-by: Christoph Hellwig +Signed-off-by: Ming Lei +Link: https://lore.kernel.org/r/20241025003722.3630252-2-ming.lei@redhat.com +Signed-off-by: Jens Axboe +Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags") +Signed-off-by: Sasha Levin +--- + block/blk-mq.c | 20 ++++++++++++++++++++ + include/linux/blk-mq.h | 2 ++ + 2 files changed, 22 insertions(+) + +diff --git a/block/blk-mq.c b/block/blk-mq.c +index f2c7fe2dc7aac..a2c40a97328b6 100644 +--- a/block/blk-mq.c ++++ b/block/blk-mq.c +@@ -196,6 +196,26 @@ void blk_mq_unfreeze_queue(struct request_queue *q) + } + EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); + ++/* ++ * non_owner variant of blk_freeze_queue_start ++ * ++ * Unlike blk_freeze_queue_start, the queue doesn't need to be unfrozen ++ * by the same task. This is fragile and should not be used if at all ++ * possible. ++ */ ++void blk_freeze_queue_start_non_owner(struct request_queue *q) ++{ ++ blk_freeze_queue_start(q); ++} ++EXPORT_SYMBOL_GPL(blk_freeze_queue_start_non_owner); ++ ++/* non_owner variant of blk_mq_unfreeze_queue */ ++void blk_mq_unfreeze_queue_non_owner(struct request_queue *q) ++{ ++ __blk_mq_unfreeze_queue(q, false); ++} ++EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue_non_owner); ++ + /* + * FIXME: replace the scsi_internal_device_*block_nowait() calls in the + * mpt3sas driver such that this function can be removed. 
+diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h +index 8d304b1d16b15..a2d27a4d7b6c7 100644 +--- a/include/linux/blk-mq.h ++++ b/include/linux/blk-mq.h +@@ -928,6 +928,8 @@ void blk_freeze_queue_start(struct request_queue *q); + void blk_mq_freeze_queue_wait(struct request_queue *q); + int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, + unsigned long timeout); ++void blk_mq_unfreeze_queue_non_owner(struct request_queue *q); ++void blk_freeze_queue_start_non_owner(struct request_queue *q); + + void blk_mq_map_queues(struct blk_mq_queue_map *qmap); + void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues); +-- +2.43.0 + diff --git a/queue-6.11/block-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch b/queue-6.11/block-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch new file mode 100644 index 00000000000..5e8ddf27b00 --- /dev/null +++ b/queue-6.11/block-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch @@ -0,0 +1,199 @@ +From 321efcf000b90e5ed9697d39cc00bf8b59869eb4 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 29 Nov 2024 17:15:09 +0800 +Subject: block, bfq: fix bfqq uaf in bfq_limit_depth() + +From: Yu Kuai + +[ Upstream commit e8b8344de3980709080d86c157d24e7de07d70ad ] + +Set new allocated bfqq to bic or remove freed bfqq from bic are both +protected by bfqd->lock, however bfq_limit_depth() is deferencing bfqq +from bic without the lock, this can lead to UAF if the io_context is +shared by multiple tasks. + +For example, test bfq with io_uring can trigger following UAF in v6.6: + +================================================================== +BUG: KASAN: slab-use-after-free in bfqq_group+0x15/0x50 + +Call Trace: + + dump_stack_lvl+0x47/0x80 + print_address_description.constprop.0+0x66/0x300 + print_report+0x3e/0x70 + kasan_report+0xb4/0xf0 + bfqq_group+0x15/0x50 + bfqq_request_over_limit+0x130/0x9a0 + bfq_limit_depth+0x1b5/0x480 + __blk_mq_alloc_requests+0x2b5/0xa00 + blk_mq_get_new_requests+0x11d/0x1d0 + blk_mq_submit_bio+0x286/0xb00 + submit_bio_noacct_nocheck+0x331/0x400 + __block_write_full_folio+0x3d0/0x640 + writepage_cb+0x3b/0xc0 + write_cache_pages+0x254/0x6c0 + write_cache_pages+0x254/0x6c0 + do_writepages+0x192/0x310 + filemap_fdatawrite_wbc+0x95/0xc0 + __filemap_fdatawrite_range+0x99/0xd0 + filemap_write_and_wait_range.part.0+0x4d/0xa0 + blkdev_read_iter+0xef/0x1e0 + io_read+0x1b6/0x8a0 + io_issue_sqe+0x87/0x300 + io_wq_submit_work+0xeb/0x390 + io_worker_handle_work+0x24d/0x550 + io_wq_worker+0x27f/0x6c0 + ret_from_fork_asm+0x1b/0x30 + + +Allocated by task 808602: + kasan_save_stack+0x1e/0x40 + kasan_set_track+0x21/0x30 + __kasan_slab_alloc+0x83/0x90 + kmem_cache_alloc_node+0x1b1/0x6d0 + bfq_get_queue+0x138/0xfa0 + bfq_get_bfqq_handle_split+0xe3/0x2c0 + bfq_init_rq+0x196/0xbb0 + bfq_insert_request.isra.0+0xb5/0x480 + bfq_insert_requests+0x156/0x180 + blk_mq_insert_request+0x15d/0x440 + blk_mq_submit_bio+0x8a4/0xb00 + submit_bio_noacct_nocheck+0x331/0x400 + __blkdev_direct_IO_async+0x2dd/0x330 + blkdev_write_iter+0x39a/0x450 + io_write+0x22a/0x840 + io_issue_sqe+0x87/0x300 + io_wq_submit_work+0xeb/0x390 + io_worker_handle_work+0x24d/0x550 + io_wq_worker+0x27f/0x6c0 + ret_from_fork+0x2d/0x50 + ret_from_fork_asm+0x1b/0x30 + +Freed by task 808589: + kasan_save_stack+0x1e/0x40 + kasan_set_track+0x21/0x30 + kasan_save_free_info+0x27/0x40 + __kasan_slab_free+0x126/0x1b0 + kmem_cache_free+0x10c/0x750 + bfq_put_queue+0x2dd/0x770 + __bfq_insert_request.isra.0+0x155/0x7a0 + bfq_insert_request.isra.0+0x122/0x480 + bfq_insert_requests+0x156/0x180 + 
blk_mq_dispatch_plug_list+0x528/0x7e0 + blk_mq_flush_plug_list.part.0+0xe5/0x590 + __blk_flush_plug+0x3b/0x90 + blk_finish_plug+0x40/0x60 + do_writepages+0x19d/0x310 + filemap_fdatawrite_wbc+0x95/0xc0 + __filemap_fdatawrite_range+0x99/0xd0 + filemap_write_and_wait_range.part.0+0x4d/0xa0 + blkdev_read_iter+0xef/0x1e0 + io_read+0x1b6/0x8a0 + io_issue_sqe+0x87/0x300 + io_wq_submit_work+0xeb/0x390 + io_worker_handle_work+0x24d/0x550 + io_wq_worker+0x27f/0x6c0 + ret_from_fork+0x2d/0x50 + ret_from_fork_asm+0x1b/0x30 + +Fix the problem by protecting bic_to_bfqq() with bfqd->lock. + +CC: Jan Kara +Fixes: 76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup") +Signed-off-by: Yu Kuai +Link: https://lore.kernel.org/r/20241129091509.2227136-1-yukuai1@huaweicloud.com +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + block/bfq-iosched.c | 37 ++++++++++++++++++++++++------------- + 1 file changed, 24 insertions(+), 13 deletions(-) + +diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c +index 1cc40a857fb85..2d94a4bb9efa5 100644 +--- a/block/bfq-iosched.c ++++ b/block/bfq-iosched.c +@@ -582,23 +582,31 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd, + #define BFQ_LIMIT_INLINE_DEPTH 16 + + #ifdef CONFIG_BFQ_GROUP_IOSCHED +-static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) ++static bool bfqq_request_over_limit(struct bfq_data *bfqd, ++ struct bfq_io_cq *bic, blk_opf_t opf, ++ unsigned int act_idx, int limit) + { +- struct bfq_data *bfqd = bfqq->bfqd; +- struct bfq_entity *entity = &bfqq->entity; + struct bfq_entity *inline_entities[BFQ_LIMIT_INLINE_DEPTH]; + struct bfq_entity **entities = inline_entities; +- int depth, level, alloc_depth = BFQ_LIMIT_INLINE_DEPTH; +- int class_idx = bfqq->ioprio_class - 1; ++ int alloc_depth = BFQ_LIMIT_INLINE_DEPTH; + struct bfq_sched_data *sched_data; ++ struct bfq_entity *entity; ++ struct bfq_queue *bfqq; + unsigned long wsum; + bool ret = false; +- +- if (!entity->on_st_or_in_serv) +- return false; ++ int depth; ++ int level; + + retry: + spin_lock_irq(&bfqd->lock); ++ bfqq = bic_to_bfqq(bic, op_is_sync(opf), act_idx); ++ if (!bfqq) ++ goto out; ++ ++ entity = &bfqq->entity; ++ if (!entity->on_st_or_in_serv) ++ goto out; ++ + /* +1 for bfqq entity, root cgroup not included */ + depth = bfqg_to_blkg(bfqq_group(bfqq))->blkcg->css.cgroup->level + 1; + if (depth > alloc_depth) { +@@ -643,7 +651,7 @@ static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) + * class. + */ + wsum = 0; +- for (i = 0; i <= class_idx; i++) { ++ for (i = 0; i <= bfqq->ioprio_class - 1; i++) { + wsum = wsum * IOPRIO_BE_NR + + sched_data->service_tree[i].wsum; + } +@@ -666,7 +674,9 @@ static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) + return ret; + } + #else +-static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) ++static bool bfqq_request_over_limit(struct bfq_data *bfqd, ++ struct bfq_io_cq *bic, blk_opf_t opf, ++ unsigned int act_idx, int limit) + { + return false; + } +@@ -704,8 +714,9 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) + } + + for (act_idx = 0; bic && act_idx < bfqd->num_actuators; act_idx++) { +- struct bfq_queue *bfqq = +- bic_to_bfqq(bic, op_is_sync(opf), act_idx); ++ /* Fast path to check if bfqq is already allocated. 
*/ ++ if (!bic_to_bfqq(bic, op_is_sync(opf), act_idx)) ++ continue; + + /* + * Does queue (or any parent entity) exceed number of +@@ -713,7 +724,7 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) + * limit depth so that it cannot consume more + * available requests and thus starve other entities. + */ +- if (bfqq && bfqq_request_over_limit(bfqq, limit)) { ++ if (bfqq_request_over_limit(bfqd, bic, opf, act_idx, limit)) { + depth = 1; + break; + } +-- +2.43.0 + diff --git a/queue-6.11/block-don-t-allow-an-atomic-write-be-truncated-in-bl.patch b/queue-6.11/block-don-t-allow-an-atomic-write-be-truncated-in-bl.patch new file mode 100644 index 00000000000..bbedcfc56b7 --- /dev/null +++ b/queue-6.11/block-don-t-allow-an-atomic-write-be-truncated-in-bl.patch @@ -0,0 +1,57 @@ +From 3516a7d1464fc062e7ba84696923fe8747b9f5e7 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 27 Nov 2024 09:23:18 +0000 +Subject: block: Don't allow an atomic write be truncated in + blkdev_write_iter() + +From: John Garry + +[ Upstream commit 2cbd51f1f8739fd2fdf4bae1386bcf75ce0176ba ] + +A write which goes past the end of the bdev in blkdev_write_iter() will +be truncated. Truncating cannot tolerated for an atomic write, so error +that condition. + +Fixes: caf336f81b3a ("block: Add fops atomic write support") +Signed-off-by: John Garry +Reviewed-by: Christoph Hellwig +Link: https://lore.kernel.org/r/20241127092318.632790-1-john.g.garry@oracle.com +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + block/fops.c | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +diff --git a/block/fops.c b/block/fops.c +index 56db751bba49b..16bb5ae702379 100644 +--- a/block/fops.c ++++ b/block/fops.c +@@ -676,6 +676,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) + struct file *file = iocb->ki_filp; + struct inode *bd_inode = bdev_file_inode(file); + struct block_device *bdev = I_BDEV(bd_inode); ++ bool atomic = iocb->ki_flags & IOCB_ATOMIC; + loff_t size = bdev_nr_bytes(bdev); + size_t shorted = 0; + ssize_t ret; +@@ -695,7 +696,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) + if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) + return -EOPNOTSUPP; + +- if (iocb->ki_flags & IOCB_ATOMIC) { ++ if (atomic) { + ret = generic_atomic_write_valid(iocb, from); + if (ret) + return ret; +@@ -703,6 +704,8 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) + + size -= iocb->ki_pos; + if (iov_iter_count(from) > size) { ++ if (atomic) ++ return -EINVAL; + shorted = iov_iter_count(from) - size; + iov_iter_truncate(from, size); + } +-- +2.43.0 + diff --git a/queue-6.11/block-fix-uaf-for-flush-rq-while-iterating-tags.patch b/queue-6.11/block-fix-uaf-for-flush-rq-while-iterating-tags.patch new file mode 100644 index 00000000000..7f74725f39a --- /dev/null +++ b/queue-6.11/block-fix-uaf-for-flush-rq-while-iterating-tags.patch @@ -0,0 +1,167 @@ +From 99d5351e73f49aaaa5b7cb29fecbe5a809162224 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Nov 2024 19:00:05 +0800 +Subject: block: fix uaf for flush rq while iterating tags + +From: Yu Kuai + +[ Upstream commit 3802f73bd80766d70f319658f334754164075bc3 ] + +blk_mq_clear_flush_rq_mapping() is not called during scsi probe, by +checking blk_queue_init_done(). 
However, QUEUE_FLAG_INIT_DONE is cleared +in del_gendisk by commit aec89dc5d421 ("block: keep q_usage_counter in +atomic mode after del_gendisk"), hence for disk like scsi, following +blk_mq_destroy_queue() will not clear flush rq from tags->rqs[] as well, +cause following uaf that is found by our syzkaller for v6.6: + +================================================================== +BUG: KASAN: slab-use-after-free in blk_mq_find_and_get_req+0x16e/0x1a0 block/blk-mq-tag.c:261 +Read of size 4 at addr ffff88811c969c20 by task kworker/1:2H/224909 + +CPU: 1 PID: 224909 Comm: kworker/1:2H Not tainted 6.6.0-ga836a5060850 #32 +Workqueue: kblockd blk_mq_timeout_work +Call Trace: + +__dump_stack lib/dump_stack.c:88 [inline] +dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 +print_address_description.constprop.0+0x66/0x300 mm/kasan/report.c:364 +print_report+0x3e/0x70 mm/kasan/report.c:475 +kasan_report+0xb8/0xf0 mm/kasan/report.c:588 +blk_mq_find_and_get_req+0x16e/0x1a0 block/blk-mq-tag.c:261 +bt_iter block/blk-mq-tag.c:288 [inline] +__sbitmap_for_each_set include/linux/sbitmap.h:295 [inline] +sbitmap_for_each_set include/linux/sbitmap.h:316 [inline] +bt_for_each+0x455/0x790 block/blk-mq-tag.c:325 +blk_mq_queue_tag_busy_iter+0x320/0x740 block/blk-mq-tag.c:534 +blk_mq_timeout_work+0x1a3/0x7b0 block/blk-mq.c:1673 +process_one_work+0x7c4/0x1450 kernel/workqueue.c:2631 +process_scheduled_works kernel/workqueue.c:2704 [inline] +worker_thread+0x804/0xe40 kernel/workqueue.c:2785 +kthread+0x346/0x450 kernel/kthread.c:388 +ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 +ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:293 + +Allocated by task 942: +kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 +kasan_set_track+0x25/0x30 mm/kasan/common.c:52 +____kasan_kmalloc mm/kasan/common.c:374 [inline] +__kasan_kmalloc mm/kasan/common.c:383 [inline] +__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:380 +kasan_kmalloc include/linux/kasan.h:198 [inline] +__do_kmalloc_node mm/slab_common.c:1007 [inline] +__kmalloc_node+0x69/0x170 mm/slab_common.c:1014 +kmalloc_node include/linux/slab.h:620 [inline] +kzalloc_node include/linux/slab.h:732 [inline] +blk_alloc_flush_queue+0x144/0x2f0 block/blk-flush.c:499 +blk_mq_alloc_hctx+0x601/0x940 block/blk-mq.c:3788 +blk_mq_alloc_and_init_hctx+0x27f/0x330 block/blk-mq.c:4261 +blk_mq_realloc_hw_ctxs+0x488/0x5e0 block/blk-mq.c:4294 +blk_mq_init_allocated_queue+0x188/0x860 block/blk-mq.c:4350 +blk_mq_init_queue_data block/blk-mq.c:4166 [inline] +blk_mq_init_queue+0x8d/0x100 block/blk-mq.c:4176 +scsi_alloc_sdev+0x843/0xd50 drivers/scsi/scsi_scan.c:335 +scsi_probe_and_add_lun+0x77c/0xde0 drivers/scsi/scsi_scan.c:1189 +__scsi_scan_target+0x1fc/0x5a0 drivers/scsi/scsi_scan.c:1727 +scsi_scan_channel drivers/scsi/scsi_scan.c:1815 [inline] +scsi_scan_channel+0x14b/0x1e0 drivers/scsi/scsi_scan.c:1791 +scsi_scan_host_selected+0x2fe/0x400 drivers/scsi/scsi_scan.c:1844 +scsi_scan+0x3a0/0x3f0 drivers/scsi/scsi_sysfs.c:151 +store_scan+0x2a/0x60 drivers/scsi/scsi_sysfs.c:191 +dev_attr_store+0x5c/0x90 drivers/base/core.c:2388 +sysfs_kf_write+0x11c/0x170 fs/sysfs/file.c:136 +kernfs_fop_write_iter+0x3fc/0x610 fs/kernfs/file.c:338 +call_write_iter include/linux/fs.h:2083 [inline] +new_sync_write+0x1b4/0x2d0 fs/read_write.c:493 +vfs_write+0x76c/0xb00 fs/read_write.c:586 +ksys_write+0x127/0x250 fs/read_write.c:639 +do_syscall_x64 arch/x86/entry/common.c:51 [inline] +do_syscall_64+0x70/0x120 arch/x86/entry/common.c:81 +entry_SYSCALL_64_after_hwframe+0x78/0xe2 + +Freed by task 244687: 
+kasan_save_stack+0x22/0x50 mm/kasan/common.c:45 +kasan_set_track+0x25/0x30 mm/kasan/common.c:52 +kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 +____kasan_slab_free mm/kasan/common.c:236 [inline] +__kasan_slab_free+0x12a/0x1b0 mm/kasan/common.c:244 +kasan_slab_free include/linux/kasan.h:164 [inline] +slab_free_hook mm/slub.c:1815 [inline] +slab_free_freelist_hook mm/slub.c:1841 [inline] +slab_free mm/slub.c:3807 [inline] +__kmem_cache_free+0xe4/0x520 mm/slub.c:3820 +blk_free_flush_queue+0x40/0x60 block/blk-flush.c:520 +blk_mq_hw_sysfs_release+0x4a/0x170 block/blk-mq-sysfs.c:37 +kobject_cleanup+0x136/0x410 lib/kobject.c:689 +kobject_release lib/kobject.c:720 [inline] +kref_put include/linux/kref.h:65 [inline] +kobject_put+0x119/0x140 lib/kobject.c:737 +blk_mq_release+0x24f/0x3f0 block/blk-mq.c:4144 +blk_free_queue block/blk-core.c:298 [inline] +blk_put_queue+0xe2/0x180 block/blk-core.c:314 +blkg_free_workfn+0x376/0x6e0 block/blk-cgroup.c:144 +process_one_work+0x7c4/0x1450 kernel/workqueue.c:2631 +process_scheduled_works kernel/workqueue.c:2704 [inline] +worker_thread+0x804/0xe40 kernel/workqueue.c:2785 +kthread+0x346/0x450 kernel/kthread.c:388 +ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 +ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:293 + +Other than blk_mq_clear_flush_rq_mapping(), the flag is only used in +blk_register_queue() from initialization path, hence it's safe not to +clear the flag in del_gendisk. And since QUEUE_FLAG_REGISTERED already +make sure that queue should only be registered once, there is no need +to test the flag as well. + +Fixes: 6cfeadbff3f8 ("blk-mq: don't clear flush_rq from tags->rqs[]") +Depends-on: commit aec89dc5d421 ("block: keep q_usage_counter in atomic mode after del_gendisk") +Signed-off-by: Yu Kuai +Reviewed-by: Ming Lei +Link: https://lore.kernel.org/r/20241104110005.1412161-1-yukuai1@huaweicloud.com +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + block/blk-sysfs.c | 6 ++---- + block/genhd.c | 9 +++------ + 2 files changed, 5 insertions(+), 10 deletions(-) + +diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c +index e85941bec857b..207577145c54f 100644 +--- a/block/blk-sysfs.c ++++ b/block/blk-sysfs.c +@@ -794,10 +794,8 @@ int blk_register_queue(struct gendisk *disk) + * faster to shut down and is made fully functional here as + * request_queues for non-existent devices never get registered. + */ +- if (!blk_queue_init_done(q)) { +- blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q); +- percpu_ref_switch_to_percpu(&q->q_usage_counter); +- } ++ blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q); ++ percpu_ref_switch_to_percpu(&q->q_usage_counter); + + return ret; + +diff --git a/block/genhd.c b/block/genhd.c +index 6ad3fcde01105..8645cf3b0816e 100644 +--- a/block/genhd.c ++++ b/block/genhd.c +@@ -722,13 +722,10 @@ void del_gendisk(struct gendisk *disk) + * If the disk does not own the queue, allow using passthrough requests + * again. Else leave the queue frozen to fail all I/O. 
+ */ +- if (!test_bit(GD_OWNS_QUEUE, &disk->state)) { +- blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q); ++ if (!test_bit(GD_OWNS_QUEUE, &disk->state)) + __blk_mq_unfreeze_queue(q, true); +- } else { +- if (queue_is_mq(q)) +- blk_mq_exit_queue(q); +- } ++ else if (queue_is_mq(q)) ++ blk_mq_exit_queue(q); + + if (start_drain) + blk_unfreeze_release_lock(q, true, queue_dying); +-- +2.43.0 + diff --git a/queue-6.11/block-model-freeze-enter-queue-as-lock-for-supportin.patch b/queue-6.11/block-model-freeze-enter-queue-as-lock-for-supportin.patch new file mode 100644 index 00000000000..33e351494a7 --- /dev/null +++ b/queue-6.11/block-model-freeze-enter-queue-as-lock-for-supportin.patch @@ -0,0 +1,364 @@ +From 771fd93eea875e74016dc77af82715eaf0f6ae20 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 25 Oct 2024 08:37:20 +0800 +Subject: block: model freeze & enter queue as lock for supporting lockdep + +From: Ming Lei + +[ Upstream commit f1be1788a32e8fa63416ad4518bbd1a85a825c9d ] + +Recently we got several deadlock report[1][2][3] caused by +blk_mq_freeze_queue and blk_enter_queue(). + +Turns out the two are just like acquiring read/write lock, so model them +as read/write lock for supporting lockdep: + +1) model q->q_usage_counter as two locks(io and queue lock) + +- queue lock covers sync with blk_enter_queue() + +- io lock covers sync with bio_enter_queue() + +2) make the lockdep class/key as per-queue: + +- different subsystem has very different lock use pattern, shared lock + class causes false positive easily + +- freeze_queue degrades to no lock in case that disk state becomes DEAD + because bio_enter_queue() won't be blocked any more + +- freeze_queue degrades to no lock in case that request queue becomes dying + because blk_enter_queue() won't be blocked any more + +3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock +- it is exclusive lock, so dependency with blk_enter_queue() is covered + +- it is trylock because blk_mq_freeze_queue() are allowed to run + concurrently + +4) model blk_enter_queue() & bio_enter_queue() as acquire_read() +- nested blk_enter_queue() are allowed + +- dependency with blk_mq_freeze_queue() is covered + +- blk_queue_exit() is often called from other contexts(such as irq), and +it can't be annotated as lock_release(), so simply do it in +blk_enter_queue(), this way still covered cases as many as possible + +With lockdep support, such kind of reports may be reported asap and +needn't wait until the real deadlock is triggered. + +For example, lockdep report can be triggered in the report[3] with this +patch applied. 
+ +[1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler' +https://bugzilla.kernel.org/show_bug.cgi?id=219166 + +[2] del_gendisk() vs blk_queue_enter() race condition +https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/ + +[3] queue_freeze & queue_enter deadlock in scsi +https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u + +Reviewed-by: Christoph Hellwig +Signed-off-by: Ming Lei +Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com +Signed-off-by: Jens Axboe +Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags") +Signed-off-by: Sasha Levin +--- + block/blk-core.c | 18 ++++++++++++++++-- + block/blk-mq.c | 26 ++++++++++++++++++++++---- + block/blk.h | 29 ++++++++++++++++++++++++++--- + block/genhd.c | 15 +++++++++++---- + include/linux/blkdev.h | 6 ++++++ + 5 files changed, 81 insertions(+), 13 deletions(-) + +diff --git a/block/blk-core.c b/block/blk-core.c +index bc5e8c5eaac9f..09d10bb95fda0 100644 +--- a/block/blk-core.c ++++ b/block/blk-core.c +@@ -261,6 +261,8 @@ static void blk_free_queue(struct request_queue *q) + blk_mq_release(q); + + ida_free(&blk_queue_ida, q->id); ++ lockdep_unregister_key(&q->io_lock_cls_key); ++ lockdep_unregister_key(&q->q_lock_cls_key); + call_rcu(&q->rcu_head, blk_free_queue_rcu); + } + +@@ -278,18 +280,20 @@ void blk_put_queue(struct request_queue *q) + } + EXPORT_SYMBOL(blk_put_queue); + +-void blk_queue_start_drain(struct request_queue *q) ++bool blk_queue_start_drain(struct request_queue *q) + { + /* + * When queue DYING flag is set, we need to block new req + * entering queue, so we call blk_freeze_queue_start() to + * prevent I/O from crossing blk_queue_enter(). + */ +- blk_freeze_queue_start(q); ++ bool freeze = __blk_freeze_queue_start(q); + if (queue_is_mq(q)) + blk_mq_wake_waiters(q); + /* Make blk_queue_enter() reexamine the DYING flag. 
*/ + wake_up_all(&q->mq_freeze_wq); ++ ++ return freeze; + } + + /** +@@ -321,6 +325,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) + return -ENODEV; + } + ++ rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_); ++ rwsem_release(&q->q_lockdep_map, _RET_IP_); + return 0; + } + +@@ -352,6 +358,8 @@ int __bio_queue_enter(struct request_queue *q, struct bio *bio) + goto dead; + } + ++ rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); ++ rwsem_release(&q->io_lockdep_map, _RET_IP_); + return 0; + dead: + bio_io_error(bio); +@@ -441,6 +449,12 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id) + PERCPU_REF_INIT_ATOMIC, GFP_KERNEL); + if (error) + goto fail_stats; ++ lockdep_register_key(&q->io_lock_cls_key); ++ lockdep_register_key(&q->q_lock_cls_key); ++ lockdep_init_map(&q->io_lockdep_map, "&q->q_usage_counter(io)", ++ &q->io_lock_cls_key, 0); ++ lockdep_init_map(&q->q_lockdep_map, "&q->q_usage_counter(queue)", ++ &q->q_lock_cls_key, 0); + + q->nr_requests = BLKDEV_DEFAULT_RQ; + +diff --git a/block/blk-mq.c b/block/blk-mq.c +index a2c40a97328b6..46e1cd2d1be8d 100644 +--- a/block/blk-mq.c ++++ b/block/blk-mq.c +@@ -120,17 +120,29 @@ void blk_mq_in_flight_rw(struct request_queue *q, struct block_device *part, + inflight[1] = mi.inflight[1]; + } + +-void blk_freeze_queue_start(struct request_queue *q) ++bool __blk_freeze_queue_start(struct request_queue *q) + { ++ int freeze; ++ + mutex_lock(&q->mq_freeze_lock); + if (++q->mq_freeze_depth == 1) { + percpu_ref_kill(&q->q_usage_counter); + mutex_unlock(&q->mq_freeze_lock); + if (queue_is_mq(q)) + blk_mq_run_hw_queues(q, false); ++ freeze = true; + } else { + mutex_unlock(&q->mq_freeze_lock); ++ freeze = false; + } ++ ++ return freeze; ++} ++ ++void blk_freeze_queue_start(struct request_queue *q) ++{ ++ if (__blk_freeze_queue_start(q)) ++ blk_freeze_acquire_lock(q, false, false); + } + EXPORT_SYMBOL_GPL(blk_freeze_queue_start); + +@@ -176,8 +188,10 @@ void blk_mq_freeze_queue(struct request_queue *q) + } + EXPORT_SYMBOL_GPL(blk_mq_freeze_queue); + +-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic) ++bool __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic) + { ++ int unfreeze = false; ++ + mutex_lock(&q->mq_freeze_lock); + if (force_atomic) + q->q_usage_counter.data->force_atomic = true; +@@ -186,13 +200,17 @@ void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic) + if (!q->mq_freeze_depth) { + percpu_ref_resurrect(&q->q_usage_counter); + wake_up_all(&q->mq_freeze_wq); ++ unfreeze = true; + } + mutex_unlock(&q->mq_freeze_lock); ++ ++ return unfreeze; + } + + void blk_mq_unfreeze_queue(struct request_queue *q) + { +- __blk_mq_unfreeze_queue(q, false); ++ if (__blk_mq_unfreeze_queue(q, false)) ++ blk_unfreeze_release_lock(q, false, false); + } + EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); + +@@ -205,7 +223,7 @@ EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); + */ + void blk_freeze_queue_start_non_owner(struct request_queue *q) + { +- blk_freeze_queue_start(q); ++ __blk_freeze_queue_start(q); + } + EXPORT_SYMBOL_GPL(blk_freeze_queue_start_non_owner); + +diff --git a/block/blk.h b/block/blk.h +index 61c2afa67daab..42f1d31d4649a 100644 +--- a/block/blk.h ++++ b/block/blk.h +@@ -4,6 +4,7 @@ + + #include + #include ++#include + #include /* for max_pfn/max_low_pfn */ + #include + #include +@@ -35,8 +36,9 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size, + void blk_free_flush_queue(struct blk_flush_queue *q); + + void 
blk_freeze_queue(struct request_queue *q); +-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic); +-void blk_queue_start_drain(struct request_queue *q); ++bool __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic); ++bool blk_queue_start_drain(struct request_queue *q); ++bool __blk_freeze_queue_start(struct request_queue *q); + int __bio_queue_enter(struct request_queue *q, struct bio *bio); + void submit_bio_noacct_nocheck(struct bio *bio); + void bio_await_chain(struct bio *bio); +@@ -69,8 +71,11 @@ static inline int bio_queue_enter(struct bio *bio) + { + struct request_queue *q = bdev_get_queue(bio->bi_bdev); + +- if (blk_try_enter_queue(q, false)) ++ if (blk_try_enter_queue(q, false)) { ++ rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_); ++ rwsem_release(&q->io_lockdep_map, _RET_IP_); + return 0; ++ } + return __bio_queue_enter(q, bio); + } + +@@ -724,4 +729,22 @@ void blk_integrity_verify(struct bio *bio); + void blk_integrity_prepare(struct request *rq); + void blk_integrity_complete(struct request *rq, unsigned int nr_bytes); + ++static inline void blk_freeze_acquire_lock(struct request_queue *q, bool ++ disk_dead, bool queue_dying) ++{ ++ if (!disk_dead) ++ rwsem_acquire(&q->io_lockdep_map, 0, 1, _RET_IP_); ++ if (!queue_dying) ++ rwsem_acquire(&q->q_lockdep_map, 0, 1, _RET_IP_); ++} ++ ++static inline void blk_unfreeze_release_lock(struct request_queue *q, bool ++ disk_dead, bool queue_dying) ++{ ++ if (!queue_dying) ++ rwsem_release(&q->q_lockdep_map, _RET_IP_); ++ if (!disk_dead) ++ rwsem_release(&q->io_lockdep_map, _RET_IP_); ++} ++ + #endif /* BLK_INTERNAL_H */ +diff --git a/block/genhd.c b/block/genhd.c +index 1c05dd4c6980b..6ad3fcde01105 100644 +--- a/block/genhd.c ++++ b/block/genhd.c +@@ -581,13 +581,13 @@ static void blk_report_disk_dead(struct gendisk *disk, bool surprise) + rcu_read_unlock(); + } + +-static void __blk_mark_disk_dead(struct gendisk *disk) ++static bool __blk_mark_disk_dead(struct gendisk *disk) + { + /* + * Fail any new I/O. + */ + if (test_and_set_bit(GD_DEAD, &disk->state)) +- return; ++ return false; + + if (test_bit(GD_OWNS_QUEUE, &disk->state)) + blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue); +@@ -600,7 +600,7 @@ static void __blk_mark_disk_dead(struct gendisk *disk) + /* + * Prevent new I/O from crossing bio_queue_enter(). + */ +- blk_queue_start_drain(disk->queue); ++ return blk_queue_start_drain(disk->queue); + } + + /** +@@ -641,6 +641,7 @@ void del_gendisk(struct gendisk *disk) + struct request_queue *q = disk->queue; + struct block_device *part; + unsigned long idx; ++ bool start_drain, queue_dying; + + might_sleep(); + +@@ -668,7 +669,10 @@ void del_gendisk(struct gendisk *disk) + * Drop all partitions now that the disk is marked dead. 
+ */ + mutex_lock(&disk->open_mutex); +- __blk_mark_disk_dead(disk); ++ start_drain = __blk_mark_disk_dead(disk); ++ queue_dying = blk_queue_dying(q); ++ if (start_drain) ++ blk_freeze_acquire_lock(q, true, queue_dying); + xa_for_each_start(&disk->part_tbl, idx, part, 1) + drop_partition(part); + mutex_unlock(&disk->open_mutex); +@@ -725,6 +729,9 @@ void del_gendisk(struct gendisk *disk) + if (queue_is_mq(q)) + blk_mq_exit_queue(q); + } ++ ++ if (start_drain) ++ blk_unfreeze_release_lock(q, true, queue_dying); + } + EXPORT_SYMBOL(del_gendisk); + +diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h +index 643c9020a35a6..a6e42a823b71e 100644 +--- a/include/linux/blkdev.h ++++ b/include/linux/blkdev.h +@@ -25,6 +25,7 @@ + #include + #include + #include ++#include + + struct module; + struct request_queue; +@@ -471,6 +472,11 @@ struct request_queue { + struct xarray hctx_table; + + struct percpu_ref q_usage_counter; ++ struct lock_class_key io_lock_cls_key; ++ struct lockdep_map io_lockdep_map; ++ ++ struct lock_class_key q_lock_cls_key; ++ struct lockdep_map q_lockdep_map; + + struct request *last_merge; + +-- +2.43.0 + diff --git a/queue-6.11/block-return-unsigned-int-from-bdev_io_min.patch b/queue-6.11/block-return-unsigned-int-from-bdev_io_min.patch new file mode 100644 index 00000000000..fd076a7062b --- /dev/null +++ b/queue-6.11/block-return-unsigned-int-from-bdev_io_min.patch @@ -0,0 +1,39 @@ +From f00007c531be915531b9989e37609d3a3a965526 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 19 Nov 2024 08:26:02 +0100 +Subject: block: return unsigned int from bdev_io_min + +From: Christoph Hellwig + +[ Upstream commit 46fd48ab3ea3eb3bb215684bd66ea3d260b091a9 ] + +The underlying limit is defined as an unsigned int, so return that from +bdev_io_min as well. + +Fixes: ac481c20ef8f ("block: Topology ioctls") +Signed-off-by: Christoph Hellwig +Reviewed-by: Martin K. Petersen +Reviewed-by: John Garry +Link: https://lore.kernel.org/r/20241119072602.1059488-1-hch@lst.de +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + include/linux/blkdev.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h +index a6e42a823b71e..7e35d9ebdc374 100644 +--- a/include/linux/blkdev.h ++++ b/include/linux/blkdev.h +@@ -1255,7 +1255,7 @@ static inline unsigned int queue_io_min(const struct request_queue *q) + return q->limits.io_min; + } + +-static inline int bdev_io_min(struct block_device *bdev) ++static inline unsigned int bdev_io_min(struct block_device *bdev) + { + return queue_io_min(bdev_get_queue(bdev)); + } +-- +2.43.0 + diff --git a/queue-6.11/brd-decrease-the-number-of-allocated-pages-which-dis.patch b/queue-6.11/brd-decrease-the-number-of-allocated-pages-which-dis.patch new file mode 100644 index 00000000000..71930344412 --- /dev/null +++ b/queue-6.11/brd-decrease-the-number-of-allocated-pages-which-dis.patch @@ -0,0 +1,42 @@ +From d0d47046659853d23506113d376dcb1cbad52152 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 28 Nov 2024 17:00:56 +0800 +Subject: brd: decrease the number of allocated pages which discarded + +From: Zhang Xianwei + +[ Upstream commit 82734209bedd65a8b508844bab652b464379bfdd ] + +The number of allocated pages which discarded will not decrease. +Fix it. 
+ +Fixes: 9ead7efc6f3f ("brd: implement discard support") + +Signed-off-by: Zhang Xianwei +Reviewed-by: Ming Lei +Link: https://lore.kernel.org/r/20241128170056565nPKSz2vsP8K8X2uk2iaDG@zte.com.cn +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + drivers/block/brd.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +diff --git a/drivers/block/brd.c b/drivers/block/brd.c +index 5a95671d81515..292f127cae0ab 100644 +--- a/drivers/block/brd.c ++++ b/drivers/block/brd.c +@@ -231,8 +231,10 @@ static void brd_do_discard(struct brd_device *brd, sector_t sector, u32 size) + xa_lock(&brd->brd_pages); + while (size >= PAGE_SIZE && aligned_sector < rd_size * 2) { + page = __xa_erase(&brd->brd_pages, aligned_sector >> PAGE_SECTORS_SHIFT); +- if (page) ++ if (page) { + __free_page(page); ++ brd->brd_nr_pages--; ++ } + aligned_sector += PAGE_SECTORS; + size -= PAGE_SIZE; + } +-- +2.43.0 + diff --git a/queue-6.11/cifs-during-remount-make-sure-passwords-are-in-sync.patch b/queue-6.11/cifs-during-remount-make-sure-passwords-are-in-sync.patch new file mode 100644 index 00000000000..e7cd75df2c8 --- /dev/null +++ b/queue-6.11/cifs-during-remount-make-sure-passwords-are-in-sync.patch @@ -0,0 +1,166 @@ +From 716269cdb05c455df38e106e1ebfb9a2af08d93a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 30 Oct 2024 06:45:50 +0000 +Subject: cifs: during remount, make sure passwords are in sync + +From: Shyam Prasad N + +[ Upstream commit 0f0e357902957fba28ed31bde0d6921c6bd1485d ] + +This fixes scenarios where remount can overwrite the only currently +working password, breaking reconnect. + +We recently introduced a password2 field in both ses and ctx structs. +This was done so as to allow the client to rotate passwords for a mount +without any downtime. However, when the client transparently handles +password rotation, it can swap the values of the two password fields +in the ses struct, but not in smb3_fs_context struct that hangs off +cifs_sb. This can lead to a situation where a remount unintentionally +overwrites a working password in the ses struct. + +In order to fix this, we first get the passwords in ctx struct +in-sync with ses struct, before replacing them with what the passwords +that could be passed as a part of remount. + +Also, in order to avoid race condition between smb2_reconnect and +smb3_reconfigure, we make sure to lock session_mutex before changing +password and password2 fields of the ses structure. 
+ +Fixes: 35f834265e0d ("smb3: fix broken reconnect when password changing on the server by allowing password rotation") +Signed-off-by: Shyam Prasad N +Signed-off-by: Meetakshi Setiya +Signed-off-by: Steve French +Signed-off-by: Sasha Levin +--- + fs/smb/client/fs_context.c | 83 +++++++++++++++++++++++++++++++++----- + fs/smb/client/fs_context.h | 1 + + 2 files changed, 75 insertions(+), 9 deletions(-) + +diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c +index 4069b69fbc7e0..e84660b48d533 100644 +--- a/fs/smb/client/fs_context.c ++++ b/fs/smb/client/fs_context.c +@@ -890,12 +890,37 @@ do { \ + cifs_sb->ctx->field = NULL; \ + } while (0) + ++int smb3_sync_session_ctx_passwords(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses) ++{ ++ if (ses->password && ++ cifs_sb->ctx->password && ++ strcmp(ses->password, cifs_sb->ctx->password)) { ++ kfree_sensitive(cifs_sb->ctx->password); ++ cifs_sb->ctx->password = kstrdup(ses->password, GFP_KERNEL); ++ if (!cifs_sb->ctx->password) ++ return -ENOMEM; ++ } ++ if (ses->password2 && ++ cifs_sb->ctx->password2 && ++ strcmp(ses->password2, cifs_sb->ctx->password2)) { ++ kfree_sensitive(cifs_sb->ctx->password2); ++ cifs_sb->ctx->password2 = kstrdup(ses->password2, GFP_KERNEL); ++ if (!cifs_sb->ctx->password2) { ++ kfree_sensitive(cifs_sb->ctx->password); ++ cifs_sb->ctx->password = NULL; ++ return -ENOMEM; ++ } ++ } ++ return 0; ++} ++ + static int smb3_reconfigure(struct fs_context *fc) + { + struct smb3_fs_context *ctx = smb3_fc2context(fc); + struct dentry *root = fc->root; + struct cifs_sb_info *cifs_sb = CIFS_SB(root->d_sb); + struct cifs_ses *ses = cifs_sb_master_tcon(cifs_sb)->ses; ++ char *new_password = NULL, *new_password2 = NULL; + bool need_recon = false; + int rc; + +@@ -915,21 +940,61 @@ static int smb3_reconfigure(struct fs_context *fc) + STEAL_STRING(cifs_sb, ctx, UNC); + STEAL_STRING(cifs_sb, ctx, source); + STEAL_STRING(cifs_sb, ctx, username); ++ + if (need_recon == false) + STEAL_STRING_SENSITIVE(cifs_sb, ctx, password); + else { +- kfree_sensitive(ses->password); +- ses->password = kstrdup(ctx->password, GFP_KERNEL); +- if (!ses->password) +- return -ENOMEM; +- kfree_sensitive(ses->password2); +- ses->password2 = kstrdup(ctx->password2, GFP_KERNEL); +- if (!ses->password2) { +- kfree_sensitive(ses->password); +- ses->password = NULL; ++ if (ctx->password) { ++ new_password = kstrdup(ctx->password, GFP_KERNEL); ++ if (!new_password) ++ return -ENOMEM; ++ } else ++ STEAL_STRING_SENSITIVE(cifs_sb, ctx, password); ++ } ++ ++ /* ++ * if a new password2 has been specified, then reset it's value ++ * inside the ses struct ++ */ ++ if (ctx->password2) { ++ new_password2 = kstrdup(ctx->password2, GFP_KERNEL); ++ if (!new_password2) { ++ kfree_sensitive(new_password); + return -ENOMEM; + } ++ } else ++ STEAL_STRING_SENSITIVE(cifs_sb, ctx, password2); ++ ++ /* ++ * we may update the passwords in the ses struct below. Make sure we do ++ * not race with smb2_reconnect ++ */ ++ mutex_lock(&ses->session_mutex); ++ ++ /* ++ * smb2_reconnect may swap password and password2 in case session setup ++ * failed. First get ctx passwords in sync with ses passwords. 
It should ++ * be okay to do this even if this function were to return an error at a ++ * later stage ++ */ ++ rc = smb3_sync_session_ctx_passwords(cifs_sb, ses); ++ if (rc) ++ return rc; ++ ++ /* ++ * now that allocations for passwords are done, commit them ++ */ ++ if (new_password) { ++ kfree_sensitive(ses->password); ++ ses->password = new_password; + } ++ if (new_password2) { ++ kfree_sensitive(ses->password2); ++ ses->password2 = new_password2; ++ } ++ ++ mutex_unlock(&ses->session_mutex); ++ + STEAL_STRING(cifs_sb, ctx, domainname); + STEAL_STRING(cifs_sb, ctx, nodename); + STEAL_STRING(cifs_sb, ctx, iocharset); +diff --git a/fs/smb/client/fs_context.h b/fs/smb/client/fs_context.h +index cf577ec0dd0ac..bbd2063ab838d 100644 +--- a/fs/smb/client/fs_context.h ++++ b/fs/smb/client/fs_context.h +@@ -298,6 +298,7 @@ static inline struct smb3_fs_context *smb3_fc2context(const struct fs_context *f + } + + extern int smb3_fs_context_dup(struct smb3_fs_context *new_ctx, struct smb3_fs_context *ctx); ++extern int smb3_sync_session_ctx_passwords(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); + extern void smb3_update_mnt_flags(struct cifs_sb_info *cifs_sb); + + /* +-- +2.43.0 + diff --git a/queue-6.11/cifs-fix-parsing-native-symlinks-relative-to-the-exp.patch b/queue-6.11/cifs-fix-parsing-native-symlinks-relative-to-the-exp.patch new file mode 100644 index 00000000000..c661c4362f6 --- /dev/null +++ b/queue-6.11/cifs-fix-parsing-native-symlinks-relative-to-the-exp.patch @@ -0,0 +1,383 @@ +From e182f74d29fc1a1d85cf77c78edcc6b9ad4f8e48 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 23 Sep 2024 22:40:38 +0200 +Subject: cifs: Fix parsing native symlinks relative to the export +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Pali Rohár + +[ Upstream commit 723f4ef90452aa629f3d923e92e0449d69362b1d ] + +SMB symlink which has SYMLINK_FLAG_RELATIVE set is relative (as opposite of +the absolute) and it can be relative either to the current directory (where +is the symlink stored) or relative to the top level export path. To what it +is relative depends on the first character of the symlink target path. + +If the first character is path separator then symlink is relative to the +export, otherwise to the current directory. Linux (and generally POSIX +systems) supports only symlink paths relative to the current directory +where is symlink stored. + +Currently if Linux SMB client reads relative SMB symlink with first +character as path separator (slash), it let as is. Which means that Linux +interpret it as absolute symlink pointing from the root (/). But this +location is different than the top level directory of SMB export (unless +SMB export was mounted to the root) and thefore SMB symlinks relative to +the export are interpreted wrongly by Linux SMB client. + +Fix this problem. As Linux does not have equivalent of the path relative to +the top of the mount point, convert such symlink target path relative to +the current directory. Do this by prepending "../" pattern N times before +the SMB target path, where N is the number of path separators found in SMB +symlink path. + +So for example, if SMB share is mounted to Linux path /mnt/share/, symlink +is stored in file /mnt/share/test/folder1/symlink (so SMB symlink path is +test\folder1\symlink) and SMB symlink target points to \test\folder2\file, +then convert symlink target path to Linux path ../../test/folder2/file. 
+ +Deduplicate code for parsing SMB symlinks in native form from functions +smb2_parse_symlink_response() and parse_reparse_native_symlink() into new +function smb2_parse_native_symlink() and pass into this new function a new +full_path parameter from callers, which specify SMB full path where is +symlink stored. + +This change fixes resolving of the native Windows symlinks relative to the +top level directory of the SMB share. + +Signed-off-by: Pali Rohár +Signed-off-by: Steve French +Stable-dep-of: f4ca4f5a36ea ("cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE session") +Signed-off-by: Sasha Levin +--- + fs/smb/client/cifsglob.h | 1 + + fs/smb/client/cifsproto.h | 1 + + fs/smb/client/inode.c | 1 + + fs/smb/client/reparse.c | 90 +++++++++++++++++++++++++++++++++------ + fs/smb/client/reparse.h | 4 +- + fs/smb/client/smb1ops.c | 3 +- + fs/smb/client/smb2file.c | 21 +++++---- + fs/smb/client/smb2inode.c | 6 ++- + fs/smb/client/smb2proto.h | 9 +++- + 9 files changed, 108 insertions(+), 28 deletions(-) + +diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h +index 1dfedb64ffcbc..64610236cc725 100644 +--- a/fs/smb/client/cifsglob.h ++++ b/fs/smb/client/cifsglob.h +@@ -589,6 +589,7 @@ struct smb_version_operations { + /* Check for STATUS_NETWORK_NAME_DELETED */ + bool (*is_network_name_deleted)(char *buf, struct TCP_Server_Info *srv); + int (*parse_reparse_point)(struct cifs_sb_info *cifs_sb, ++ const char *full_path, + struct kvec *rsp_iov, + struct cifs_open_info_data *data); + int (*create_reparse_symlink)(const unsigned int xid, +diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h +index 497bf3c447bcb..8d35a5cab39e3 100644 +--- a/fs/smb/client/cifsproto.h ++++ b/fs/smb/client/cifsproto.h +@@ -675,6 +675,7 @@ char *extract_hostname(const char *unc); + char *extract_sharename(const char *unc); + int parse_reparse_point(struct reparse_data_buffer *buf, + u32 plen, struct cifs_sb_info *cifs_sb, ++ const char *full_path, + bool unicode, struct cifs_open_info_data *data); + int cifs_sfu_make_node(unsigned int xid, struct inode *inode, + struct dentry *dentry, struct cifs_tcon *tcon, +diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c +index ede36884be8ae..200936773a956 100644 +--- a/fs/smb/client/inode.c ++++ b/fs/smb/client/inode.c +@@ -1075,6 +1075,7 @@ static int reparse_info_to_fattr(struct cifs_open_info_data *data, + rc = 0; + } else if (iov && server->ops->parse_reparse_point) { + rc = server->ops->parse_reparse_point(cifs_sb, ++ full_path, + iov, data); + } + break; +diff --git a/fs/smb/client/reparse.c b/fs/smb/client/reparse.c +index 90da1e2b6217b..f74d0a86f44a4 100644 +--- a/fs/smb/client/reparse.c ++++ b/fs/smb/client/reparse.c +@@ -535,9 +535,76 @@ static int parse_reparse_posix(struct reparse_posix_data *buf, + return 0; + } + ++int smb2_parse_native_symlink(char **target, const char *buf, unsigned int len, ++ bool unicode, bool relative, ++ const char *full_path, ++ struct cifs_sb_info *cifs_sb) ++{ ++ char sep = CIFS_DIR_SEP(cifs_sb); ++ char *linux_target = NULL; ++ char *smb_target = NULL; ++ int levels; ++ int rc; ++ int i; ++ ++ smb_target = cifs_strndup_from_utf16(buf, len, unicode, cifs_sb->local_nls); ++ if (!smb_target) { ++ rc = -ENOMEM; ++ goto out; ++ } ++ ++ if (smb_target[0] == sep && relative) { ++ /* ++ * This is a relative SMB symlink from the top of the share, ++ * which is the top level directory of the Linux mount point. 
++ * Linux does not support such relative symlinks, so convert ++ * it to the relative symlink from the current directory. ++ * full_path is the SMB path to the symlink (from which is ++ * extracted current directory) and smb_target is the SMB path ++ * where symlink points, therefore full_path must always be on ++ * the SMB share. ++ */ ++ int smb_target_len = strlen(smb_target)+1; ++ levels = 0; ++ for (i = 1; full_path[i]; i++) { /* i=1 to skip leading sep */ ++ if (full_path[i] == sep) ++ levels++; ++ } ++ linux_target = kmalloc(levels*3 + smb_target_len, GFP_KERNEL); ++ if (!linux_target) { ++ rc = -ENOMEM; ++ goto out; ++ } ++ for (i = 0; i < levels; i++) { ++ linux_target[i*3 + 0] = '.'; ++ linux_target[i*3 + 1] = '.'; ++ linux_target[i*3 + 2] = sep; ++ } ++ memcpy(linux_target + levels*3, smb_target+1, smb_target_len); /* +1 to skip leading sep */ ++ } else { ++ linux_target = smb_target; ++ smb_target = NULL; ++ } ++ ++ if (sep == '\\') ++ convert_delimiter(linux_target, '/'); ++ ++ rc = 0; ++ *target = linux_target; ++ ++ cifs_dbg(FYI, "%s: symlink target: %s\n", __func__, *target); ++ ++out: ++ if (rc != 0) ++ kfree(linux_target); ++ kfree(smb_target); ++ return rc; ++} ++ + static int parse_reparse_symlink(struct reparse_symlink_data_buffer *sym, + u32 plen, bool unicode, + struct cifs_sb_info *cifs_sb, ++ const char *full_path, + struct cifs_open_info_data *data) + { + unsigned int len; +@@ -552,20 +619,18 @@ static int parse_reparse_symlink(struct reparse_symlink_data_buffer *sym, + return -EIO; + } + +- data->symlink_target = cifs_strndup_from_utf16(sym->PathBuffer + offs, +- len, unicode, +- cifs_sb->local_nls); +- if (!data->symlink_target) +- return -ENOMEM; +- +- convert_delimiter(data->symlink_target, '/'); +- cifs_dbg(FYI, "%s: target path: %s\n", __func__, data->symlink_target); +- +- return 0; ++ return smb2_parse_native_symlink(&data->symlink_target, ++ sym->PathBuffer + offs, ++ len, ++ unicode, ++ le32_to_cpu(sym->Flags) & SYMLINK_FLAG_RELATIVE, ++ full_path, ++ cifs_sb); + } + + int parse_reparse_point(struct reparse_data_buffer *buf, + u32 plen, struct cifs_sb_info *cifs_sb, ++ const char *full_path, + bool unicode, struct cifs_open_info_data *data) + { + struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); +@@ -580,7 +645,7 @@ int parse_reparse_point(struct reparse_data_buffer *buf, + case IO_REPARSE_TAG_SYMLINK: + return parse_reparse_symlink( + (struct reparse_symlink_data_buffer *)buf, +- plen, unicode, cifs_sb, data); ++ plen, unicode, cifs_sb, full_path, data); + case IO_REPARSE_TAG_LX_SYMLINK: + case IO_REPARSE_TAG_AF_UNIX: + case IO_REPARSE_TAG_LX_FIFO: +@@ -596,6 +661,7 @@ int parse_reparse_point(struct reparse_data_buffer *buf, + } + + int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb, ++ const char *full_path, + struct kvec *rsp_iov, + struct cifs_open_info_data *data) + { +@@ -605,7 +671,7 @@ int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb, + + buf = (struct reparse_data_buffer *)((u8 *)io + + le32_to_cpu(io->OutputOffset)); +- return parse_reparse_point(buf, plen, cifs_sb, true, data); ++ return parse_reparse_point(buf, plen, cifs_sb, full_path, true, data); + } + + static void wsl_to_fattr(struct cifs_open_info_data *data, +diff --git a/fs/smb/client/reparse.h b/fs/smb/client/reparse.h +index 2a9f4f9f79de0..ff05b0e75c928 100644 +--- a/fs/smb/client/reparse.h ++++ b/fs/smb/client/reparse.h +@@ -117,7 +117,9 @@ int smb2_create_reparse_symlink(const unsigned int xid, struct inode *inode, + int smb2_mknod_reparse(unsigned int xid, 
struct inode *inode, + struct dentry *dentry, struct cifs_tcon *tcon, + const char *full_path, umode_t mode, dev_t dev); +-int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb, struct kvec *rsp_iov, ++int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb, ++ const char *full_path, ++ struct kvec *rsp_iov, + struct cifs_open_info_data *data); + + #endif /* _CIFS_REPARSE_H */ +diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c +index 8c03250d85ae0..3c7f3c4b94c8d 100644 +--- a/fs/smb/client/smb1ops.c ++++ b/fs/smb/client/smb1ops.c +@@ -994,6 +994,7 @@ static int cifs_query_symlink(const unsigned int xid, + } + + static int cifs_parse_reparse_point(struct cifs_sb_info *cifs_sb, ++ const char *full_path, + struct kvec *rsp_iov, + struct cifs_open_info_data *data) + { +@@ -1004,7 +1005,7 @@ static int cifs_parse_reparse_point(struct cifs_sb_info *cifs_sb, + + buf = (struct reparse_data_buffer *)((__u8 *)&io->hdr.Protocol + + le32_to_cpu(io->DataOffset)); +- return parse_reparse_point(buf, plen, cifs_sb, unicode, data); ++ return parse_reparse_point(buf, plen, cifs_sb, full_path, unicode, data); + } + + static bool +diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c +index c23478ab1cf85..dc52995f55910 100644 +--- a/fs/smb/client/smb2file.c ++++ b/fs/smb/client/smb2file.c +@@ -63,12 +63,12 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov) + return sym; + } + +-int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec *iov, char **path) ++int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec *iov, ++ const char *full_path, char **path) + { + struct smb2_symlink_err_rsp *sym; + unsigned int sub_offs, sub_len; + unsigned int print_offs, print_len; +- char *s; + + if (!cifs_sb || !iov || !iov->iov_base || !iov->iov_len || !path) + return -EINVAL; +@@ -86,15 +86,13 @@ int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec + iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + print_offs + print_len) + return -EINVAL; + +- s = cifs_strndup_from_utf16((char *)sym->PathBuffer + sub_offs, sub_len, true, +- cifs_sb->local_nls); +- if (!s) +- return -ENOMEM; +- convert_delimiter(s, '/'); +- cifs_dbg(FYI, "%s: symlink target: %s\n", __func__, s); +- +- *path = s; +- return 0; ++ return smb2_parse_native_symlink(path, ++ (char *)sym->PathBuffer + sub_offs, ++ sub_len, ++ true, ++ le32_to_cpu(sym->Flags) & SYMLINK_FLAG_RELATIVE, ++ full_path, ++ cifs_sb); + } + + int smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms, __u32 *oplock, void *buf) +@@ -126,6 +124,7 @@ int smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms, __u32 + goto out; + if (hdr->Status == STATUS_STOPPED_ON_SYMLINK) { + rc = smb2_parse_symlink_response(oparms->cifs_sb, &err_iov, ++ oparms->path, + &data->symlink_target); + if (!rc) { + memset(smb2_data, 0, sizeof(*smb2_data)); +diff --git a/fs/smb/client/smb2inode.c b/fs/smb/client/smb2inode.c +index cdb0e028e73c4..9a28a30ec1a34 100644 +--- a/fs/smb/client/smb2inode.c ++++ b/fs/smb/client/smb2inode.c +@@ -828,6 +828,7 @@ static int smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon, + + static int parse_create_response(struct cifs_open_info_data *data, + struct cifs_sb_info *cifs_sb, ++ const char *full_path, + const struct kvec *iov) + { + struct smb2_create_rsp *rsp = iov->iov_base; +@@ -841,6 +842,7 @@ static int parse_create_response(struct cifs_open_info_data *data, + break; + case STATUS_STOPPED_ON_SYMLINK: + rc = 
smb2_parse_symlink_response(cifs_sb, iov, ++ full_path, + &data->symlink_target); + if (rc) + return rc; +@@ -930,14 +932,14 @@ int smb2_query_path_info(const unsigned int xid, + + switch (rc) { + case 0: +- rc = parse_create_response(data, cifs_sb, &out_iov[0]); ++ rc = parse_create_response(data, cifs_sb, full_path, &out_iov[0]); + break; + case -EOPNOTSUPP: + /* + * BB TODO: When support for special files added to Samba + * re-verify this path. + */ +- rc = parse_create_response(data, cifs_sb, &out_iov[0]); ++ rc = parse_create_response(data, cifs_sb, full_path, &out_iov[0]); + if (rc || !data->reparse_point) + goto out; + +diff --git a/fs/smb/client/smb2proto.h b/fs/smb/client/smb2proto.h +index 5e0855fefcfe6..aa01ae234732a 100644 +--- a/fs/smb/client/smb2proto.h ++++ b/fs/smb/client/smb2proto.h +@@ -113,7 +113,14 @@ extern int smb3_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, + struct cifs_sb_info *cifs_sb, + const unsigned char *path, char *pbuf, + unsigned int *pbytes_read); +-int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec *iov, char **path); ++int smb2_parse_native_symlink(char **target, const char *buf, unsigned int len, ++ bool unicode, bool relative, ++ const char *full_path, ++ struct cifs_sb_info *cifs_sb); ++int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, ++ const struct kvec *iov, ++ const char *full_path, ++ char **path); + int smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms, __u32 *oplock, + void *buf); + extern int smb2_unlock_range(struct cifsFileInfo *cfile, +-- +2.43.0 + diff --git a/queue-6.11/cifs-fix-parsing-reparse-point-with-native-symlink-i.patch b/queue-6.11/cifs-fix-parsing-reparse-point-with-native-symlink-i.patch new file mode 100644 index 00000000000..e8c1006ee9f --- /dev/null +++ b/queue-6.11/cifs-fix-parsing-reparse-point-with-native-symlink-i.patch @@ -0,0 +1,52 @@ +From 7184f68de0262526eb74d177e1b5e219d5bbaaa5 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 6 Oct 2024 19:30:01 +0200 +Subject: cifs: Fix parsing reparse point with native symlink in SMB1 + non-UNICODE session +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Pali Rohár + +[ Upstream commit f4ca4f5a36eac9b4da378a0f28cbbe38534a0901 ] + +SMB1 NT_TRANSACT_IOCTL/FSCTL_GET_REPARSE_POINT even in non-UNICODE mode +returns reparse buffer in UNICODE/UTF-16 format. + +This is because FSCTL_GET_REPARSE_POINT is NT-based IOCTL which does not +distinguish between 8-bit non-UNICODE and 16-bit UNICODE modes and its path +buffers are always encoded in UTF-16. + +This change fixes reading of native symlinks in SMB1 when UNICODE session +is not active. 
+ +Fixes: ed3e0a149b58 ("smb: client: implement ->query_reparse_point() for SMB1") +Signed-off-by: Pali Rohár +Signed-off-by: Steve French +Signed-off-by: Sasha Levin +--- + fs/smb/client/smb1ops.c | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + +diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c +index 3c7f3c4b94c8d..c252447918d67 100644 +--- a/fs/smb/client/smb1ops.c ++++ b/fs/smb/client/smb1ops.c +@@ -1000,12 +1000,11 @@ static int cifs_parse_reparse_point(struct cifs_sb_info *cifs_sb, + { + struct reparse_data_buffer *buf; + TRANSACT_IOCTL_RSP *io = rsp_iov->iov_base; +- bool unicode = !!(io->hdr.Flags2 & SMBFLG2_UNICODE); + u32 plen = le16_to_cpu(io->ByteCount); + + buf = (struct reparse_data_buffer *)((__u8 *)&io->hdr.Protocol + + le32_to_cpu(io->DataOffset)); +- return parse_reparse_point(buf, plen, cifs_sb, full_path, unicode, data); ++ return parse_reparse_point(buf, plen, cifs_sb, full_path, true, data); + } + + static bool +-- +2.43.0 + diff --git a/queue-6.11/cifs-unlock-on-error-in-smb3_reconfigure.patch b/queue-6.11/cifs-unlock-on-error-in-smb3_reconfigure.patch new file mode 100644 index 00000000000..6fb6a65ca6f --- /dev/null +++ b/queue-6.11/cifs-unlock-on-error-in-smb3_reconfigure.patch @@ -0,0 +1,39 @@ +From 0e7cfaa8eb46f23ecbeb86eb5c5378d69b400d5a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 15 Nov 2024 12:13:58 +0300 +Subject: cifs: unlock on error in smb3_reconfigure() + +From: Dan Carpenter + +[ Upstream commit cda88d2fef7aa7de80b5697e8009fcbbb436f42d ] + +Unlock before returning if smb3_sync_session_ctx_passwords() fails. + +Fixes: 7e654ab7da03 ("cifs: during remount, make sure passwords are in sync") +Signed-off-by: Dan Carpenter +Reviewed-by: Bharath SM +Signed-off-by: Steve French +Signed-off-by: Sasha Levin +--- + fs/smb/client/fs_context.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c +index e84660b48d533..e9fe48a3625ba 100644 +--- a/fs/smb/client/fs_context.c ++++ b/fs/smb/client/fs_context.c +@@ -978,8 +978,10 @@ static int smb3_reconfigure(struct fs_context *fc) + * later stage + */ + rc = smb3_sync_session_ctx_passwords(cifs_sb, ses); +- if (rc) ++ if (rc) { ++ mutex_unlock(&ses->session_mutex); + return rc; ++ } + + /* + * now that allocations for passwords are done, commit them +-- +2.43.0 + diff --git a/queue-6.11/jffs2-fix-use-of-uninitialized-variable.patch b/queue-6.11/jffs2-fix-use-of-uninitialized-variable.patch new file mode 100644 index 00000000000..8c094afddd9 --- /dev/null +++ b/queue-6.11/jffs2-fix-use-of-uninitialized-variable.patch @@ -0,0 +1,57 @@ +From 8db728d33e5985bb36a7e6784a2ae806987eae56 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 1 Jul 2024 12:52:05 +0800 +Subject: jffs2: fix use of uninitialized variable + +From: Qingfang Deng + +[ Upstream commit 3ba44ee966bc3c41dd8a944f963466c8fcc60dc8 ] + +When building the kernel with -Wmaybe-uninitialized, the compiler +reports this warning: + +In function 'jffs2_mark_erased_block', + inlined from 'jffs2_erase_pending_blocks' at fs/jffs2/erase.c:116:4: +fs/jffs2/erase.c:474:9: warning: 'bad_offset' may be used uninitialized [-Wmaybe-uninitialized] + 474 | jffs2_erase_failed(c, jeb, bad_offset); + | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +fs/jffs2/erase.c: In function 'jffs2_erase_pending_blocks': +fs/jffs2/erase.c:402:18: note: 'bad_offset' was declared here + 402 | uint32_t bad_offset; + | ^~~~~~~~~~ + +When mtd->point() is used, jffs2_erase_pending_blocks can return 
-EIO +without initializing bad_offset, which is later used at the filebad +label in jffs2_mark_erased_block. +Fix it by initializing this variable. + +Fixes: 8a0f572397ca ("[JFFS2] Return values of jffs2_block_check_erase error paths") +Signed-off-by: Qingfang Deng +Reviewed-by: Zhihao Cheng +Signed-off-by: Richard Weinberger +Signed-off-by: Sasha Levin +--- + fs/jffs2/erase.c | 7 +++---- + 1 file changed, 3 insertions(+), 4 deletions(-) + +diff --git a/fs/jffs2/erase.c b/fs/jffs2/erase.c +index acd32f05b5198..ef3a1e1b6cb06 100644 +--- a/fs/jffs2/erase.c ++++ b/fs/jffs2/erase.c +@@ -338,10 +338,9 @@ static int jffs2_block_check_erase(struct jffs2_sb_info *c, struct jffs2_erasebl + } while(--retlen); + mtd_unpoint(c->mtd, jeb->offset, c->sector_size); + if (retlen) { +- pr_warn("Newly-erased block contained word 0x%lx at offset 0x%08tx\n", +- *wordebuf, +- jeb->offset + +- c->sector_size-retlen * sizeof(*wordebuf)); ++ *bad_offset = jeb->offset + c->sector_size - retlen * sizeof(*wordebuf); ++ pr_warn("Newly-erased block contained word 0x%lx at offset 0x%08x\n", ++ *wordebuf, *bad_offset); + return -EIO; + } + return 0; +-- +2.43.0 + diff --git a/queue-6.11/kbuild-deb-pkg-don-t-fail-if-modules.order-is-missin.patch b/queue-6.11/kbuild-deb-pkg-don-t-fail-if-modules.order-is-missin.patch new file mode 100644 index 00000000000..7b9842ad440 --- /dev/null +++ b/queue-6.11/kbuild-deb-pkg-don-t-fail-if-modules.order-is-missin.patch @@ -0,0 +1,58 @@ +From feab48111ae71cb3917521543a4edff6e4bc6966 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 7 Nov 2024 15:05:08 +0000 +Subject: kbuild: deb-pkg: Don't fail if modules.order is missing + +From: Matt Fleming + +[ Upstream commit bcbbf493f2fa6fa1f0832f6b5b4c80a65de242d6 ] + +Kernels built without CONFIG_MODULES might still want to create -dbg deb +packages but install_linux_image_dbg() assumes modules.order always +exists. This obviously isn't true if no modules were built, so we should +skip reading modules.order in that case. + +Fixes: 16c36f8864e3 ("kbuild: deb-pkg: use build ID instead of debug link for dbg package") +Signed-off-by: Matt Fleming +Signed-off-by: Masahiro Yamada +Signed-off-by: Sasha Levin +--- + scripts/package/builddeb | 22 ++++++++++++---------- + 1 file changed, 12 insertions(+), 10 deletions(-) + +diff --git a/scripts/package/builddeb b/scripts/package/builddeb +index c1757db6aa8a8..718fbf99e2cea 100755 +--- a/scripts/package/builddeb ++++ b/scripts/package/builddeb +@@ -97,16 +97,18 @@ install_linux_image_dbg () { + + # Parse modules.order directly because 'make modules_install' may sign, + # compress modules, and then run unneeded depmod. 
+- while read -r mod; do +- mod="${mod%.o}.ko" +- dbg="${pdir}/usr/lib/debug/lib/modules/${KERNELRELEASE}/kernel/${mod}" +- buildid=$("${READELF}" -n "${mod}" | sed -n 's@^.*Build ID: \(..\)\(.*\)@\1/\2@p') +- link="${pdir}/usr/lib/debug/.build-id/${buildid}.debug" +- +- mkdir -p "${dbg%/*}" "${link%/*}" +- "${OBJCOPY}" --only-keep-debug "${mod}" "${dbg}" +- ln -sf --relative "${dbg}" "${link}" +- done < modules.order ++ if is_enabled CONFIG_MODULES; then ++ while read -r mod; do ++ mod="${mod%.o}.ko" ++ dbg="${pdir}/usr/lib/debug/lib/modules/${KERNELRELEASE}/kernel/${mod}" ++ buildid=$("${READELF}" -n "${mod}" | sed -n 's@^.*Build ID: \(..\)\(.*\)@\1/\2@p') ++ link="${pdir}/usr/lib/debug/.build-id/${buildid}.debug" ++ ++ mkdir -p "${dbg%/*}" "${link%/*}" ++ "${OBJCOPY}" --only-keep-debug "${mod}" "${dbg}" ++ ln -sf --relative "${dbg}" "${link}" ++ done < modules.order ++ fi + + # Build debug package + # Different tools want the image in different locations +-- +2.43.0 + diff --git a/queue-6.11/kfifo-don-t-include-dma-mapping.h-in-kfifo.h.patch b/queue-6.11/kfifo-don-t-include-dma-mapping.h-in-kfifo.h.patch new file mode 100644 index 00000000000..077dd063267 --- /dev/null +++ b/queue-6.11/kfifo-don-t-include-dma-mapping.h-in-kfifo.h.patch @@ -0,0 +1,67 @@ +From fd83d21d28f092b4884dc11b685c88223a0e72b1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 23 Oct 2024 07:53:04 +0200 +Subject: kfifo: don't include dma-mapping.h in kfifo.h + +From: Christoph Hellwig + +[ Upstream commit 44059790a5cb9258ae6137387e4c39b717fd2ced ] + +Nothing in kfifo.h directly needs dma-mapping.h, only two macros +use DMA_MAPPING_ERROR when actually instantiated. Drop the +dma-mapping.h include to reduce include bloat. + +Add an explicity include to drivers/mailbox/omap-mailbox.c +as that file uses __raw_readl and __raw_writel through a complicated +include chain involving + +Fixes: d52b761e4b1a ("kfifo: add kfifo_dma_out_prepare_mapped()") +Signed-off-by: Christoph Hellwig +Link: https://lore.kernel.org/r/20241023055317.313234-1-hch@lst.de +Signed-off-by: Greg Kroah-Hartman +Signed-off-by: Sasha Levin +--- + drivers/mailbox/omap-mailbox.c | 1 + + include/linux/kfifo.h | 1 - + samples/kfifo/dma-example.c | 1 + + 3 files changed, 2 insertions(+), 1 deletion(-) + +diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c +index 7a87424657a15..bd0b9762cef4f 100644 +--- a/drivers/mailbox/omap-mailbox.c ++++ b/drivers/mailbox/omap-mailbox.c +@@ -15,6 +15,7 @@ + #include + #include + #include ++#include + #include + #include + #include +diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h +index 564868bdce898..fd743d4c4b4bd 100644 +--- a/include/linux/kfifo.h ++++ b/include/linux/kfifo.h +@@ -37,7 +37,6 @@ + */ + + #include +-#include + #include + #include + #include +diff --git a/samples/kfifo/dma-example.c b/samples/kfifo/dma-example.c +index 48df719dac8c6..8076ac410161a 100644 +--- a/samples/kfifo/dma-example.c ++++ b/samples/kfifo/dma-example.c +@@ -9,6 +9,7 @@ + #include + #include + #include ++#include + + /* + * This module shows how to handle fifo dma operations. 
+-- +2.43.0 + diff --git a/queue-6.11/modpost-remove-incorrect-code-in-do_eisa_entry.patch b/queue-6.11/modpost-remove-incorrect-code-in-do_eisa_entry.patch new file mode 100644 index 00000000000..0983285d28b --- /dev/null +++ b/queue-6.11/modpost-remove-incorrect-code-in-do_eisa_entry.patch @@ -0,0 +1,86 @@ +From 901e3a03a9cdf5908857b48e1d3972e3badbec4b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 20 Nov 2024 08:56:39 +0900 +Subject: modpost: remove incorrect code in do_eisa_entry() +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Masahiro Yamada + +[ Upstream commit 0c3e091319e4748cb36ac9a50848903dc6f54054 ] + +This function contains multiple bugs after the following commits: + + - ac551828993e ("modpost: i2c aliases need no trailing wildcard") + - 6543becf26ff ("mod/file2alias: make modalias generation safe for cross compiling") + +Commit ac551828993e inserted the following code to do_eisa_entry(): + +    else +            strcat(alias, "*"); + +This is incorrect because 'alias' is uninitialized. If it is not +NULL-terminated, strcat() could cause a buffer overrun. + +Even if 'alias' happens to be zero-filled, it would output: + + MODULE_ALIAS("*"); + +This would match anything. As a result, the module could be loaded by +any unrelated uevent from an unrelated subsystem. + +Commit ac551828993e introduced another bug.             + +Prior to that commit, the conditional check was: + +    if (eisa->sig[0]) + +This checked if the first character of eisa_device_id::sig was not '\0'. + +However, commit ac551828993e changed it as follows: + +    if (sig[0]) + +sig[0] is NOT the first character of the eisa_device_id::sig. The +type of 'sig' is 'char (*)[8]', meaning that the type of 'sig[0]' is +'char [8]' instead of 'char'. 'sig[0]' and 'symval' refer to the same +address, which never becomes NULL. + +The correct conversion would have been: + +    if ((*sig)[0]) + +However, this if-conditional was meaningless because the earlier change +in commit ac551828993e was incorrect. + +This commit removes the entire incorrect code, which should never have +been executed. 
+ +Fixes: ac551828993e ("modpost: i2c aliases need no trailing wildcard") +Fixes: 6543becf26ff ("mod/file2alias: make modalias generation safe for cross compiling") +Signed-off-by: Masahiro Yamada +Signed-off-by: Sasha Levin +--- + scripts/mod/file2alias.c | 5 +---- + 1 file changed, 1 insertion(+), 4 deletions(-) + +diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c +index 5d1c61fa5a550..bcb5a7e20775e 100644 +--- a/scripts/mod/file2alias.c ++++ b/scripts/mod/file2alias.c +@@ -809,10 +809,7 @@ static int do_eisa_entry(const char *filename, void *symval, + char *alias) + { + DEF_FIELD_ADDR(symval, eisa_device_id, sig); +- if (sig[0]) +- sprintf(alias, EISA_DEVICE_MODALIAS_FMT "*", *sig); +- else +- strcat(alias, "*"); ++ sprintf(alias, EISA_DEVICE_MODALIAS_FMT "*", *sig); + return 1; + } + +-- +2.43.0 + diff --git a/queue-6.11/nfs-blocklayout-don-t-attempt-unregister-for-invalid.patch b/queue-6.11/nfs-blocklayout-don-t-attempt-unregister-for-invalid.patch new file mode 100644 index 00000000000..f2f44d36490 --- /dev/null +++ b/queue-6.11/nfs-blocklayout-don-t-attempt-unregister-for-invalid.patch @@ -0,0 +1,72 @@ +From 61cbe247b823c010cb36922614cb22d20bece25f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 22 Nov 2024 10:11:11 -0500 +Subject: nfs/blocklayout: Don't attempt unregister for invalid block device + +From: Benjamin Coddington + +[ Upstream commit 3a4ce14d9a6b868e0787e4582420b721c04ee41e ] + +Since commit d869da91cccb ("nfs/blocklayout: Fix premature PR key +unregistration") an unmount of a pNFS SCSI layout-enabled NFS may +dereference a NULL block_device in: + + bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver] + bl_free_device+0x70/0x80 [blocklayoutdriver] + bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver] + nfs4_put_deviceid_node+0x60/0xc0 [nfsv4] + nfs4_deviceid_purge_client+0x132/0x190 [nfsv4] + unset_pnfs_layoutdriver+0x59/0x60 [nfsv4] + nfs4_destroy_server+0x36/0x70 [nfsv4] + nfs_free_server+0x23/0xe0 [nfs] + deactivate_locked_super+0x30/0xb0 + cleanup_mnt+0xba/0x150 + task_work_run+0x59/0x90 + syscall_exit_to_user_mode+0x217/0x220 + do_syscall_64+0x8e/0x160 + +This happens because even though we were able to create the +nfs4_deviceid_node, the lookup for the device was unable to attach the +block device to the pnfs_block_dev. + +If we never found a block device to register, we can avoid this case with +the PNFS_BDEV_REGISTERED flag. Move the deref behind the test for the +flag. 
+ +Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") +Signed-off-by: Benjamin Coddington +Reviewed-by: Christoph Hellwig +Reviewed-by: Chuck Lever +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + fs/nfs/blocklayout/dev.c | 6 ++---- + 1 file changed, 2 insertions(+), 4 deletions(-) + +diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c +index 6252f44479457..cab8809f0e0f4 100644 +--- a/fs/nfs/blocklayout/dev.c ++++ b/fs/nfs/blocklayout/dev.c +@@ -20,9 +20,6 @@ static void bl_unregister_scsi(struct pnfs_block_dev *dev) + const struct pr_ops *ops = bdev->bd_disk->fops->pr_ops; + int status; + +- if (!test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) +- return; +- + status = ops->pr_register(bdev, dev->pr_key, 0, false); + if (status) + trace_bl_pr_key_unreg_err(bdev, dev->pr_key, status); +@@ -58,7 +55,8 @@ static void bl_unregister_dev(struct pnfs_block_dev *dev) + return; + } + +- if (dev->type == PNFS_BLOCK_VOLUME_SCSI) ++ if (dev->type == PNFS_BLOCK_VOLUME_SCSI && ++ test_and_clear_bit(PNFS_BDEV_REGISTERED, &dev->flags)) + bl_unregister_scsi(dev); + } + +-- +2.43.0 + diff --git a/queue-6.11/nfs-blocklayout-limit-repeat-device-registration-on-.patch b/queue-6.11/nfs-blocklayout-limit-repeat-device-registration-on-.patch new file mode 100644 index 00000000000..a72404fefe9 --- /dev/null +++ b/queue-6.11/nfs-blocklayout-limit-repeat-device-registration-on-.patch @@ -0,0 +1,71 @@ +From 89f1fdac02766523cd94bc92800aeb3356cddf5c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 22 Nov 2024 10:11:12 -0500 +Subject: nfs/blocklayout: Limit repeat device registration on failure + +From: Benjamin Coddington + +[ Upstream commit 614733f9441ed53bb442d4734112ec1e24bd6da7 ] + +Every pNFS SCSI IO wants to do LAYOUTGET, then within the layout find the +device which can drive GETDEVINFO, then finally may need to prep the device +with a reservation. This slow work makes a mess of IO latencies if one of +the later steps is going to fail for awhile. + +If we're unable to register a SCSI device, ensure we mark the device as +unavailable so that it will timeout and be re-added via GETDEVINFO. This +avoids repeated doomed attempts to register a device in the IO path. + +Add some clarifying comments as well. + +Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration") +Signed-off-by: Benjamin Coddington +Reviewed-by: Christoph Hellwig +Reviewed-by: Chuck Lever +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + fs/nfs/blocklayout/blocklayout.c | 15 ++++++++++++++- + 1 file changed, 14 insertions(+), 1 deletion(-) + +diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c +index 0becdec129704..47189476b5538 100644 +--- a/fs/nfs/blocklayout/blocklayout.c ++++ b/fs/nfs/blocklayout/blocklayout.c +@@ -571,19 +571,32 @@ bl_find_get_deviceid(struct nfs_server *server, + if (!node) + return ERR_PTR(-ENODEV); + ++ /* ++ * Devices that are marked unavailable are left in the cache with a ++ * timeout to avoid sending GETDEVINFO after every LAYOUTGET, or ++ * constantly attempting to register the device. Once marked as ++ * unavailable they must be deleted and never reused. 
++ */ + if (test_bit(NFS_DEVICEID_UNAVAILABLE, &node->flags)) { + unsigned long end = jiffies; + unsigned long start = end - PNFS_DEVICE_RETRY_TIMEOUT; + + if (!time_in_range(node->timestamp_unavailable, start, end)) { ++ /* Uncork subsequent GETDEVINFO operations for this device */ + nfs4_delete_deviceid(node->ld, node->nfs_client, id); + goto retry; + } + goto out_put; + } + +- if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) ++ if (!bl_register_dev(container_of(node, struct pnfs_block_dev, node))) { ++ /* ++ * If we cannot register, treat this device as transient: ++ * Make a negative cache entry for the device ++ */ ++ nfs4_mark_deviceid_unavailable(node); + goto out_put; ++ } + + return node; + +-- +2.43.0 + diff --git a/queue-6.11/nfs-ignore-sb_rdonly-when-mounting-nfs.patch b/queue-6.11/nfs-ignore-sb_rdonly-when-mounting-nfs.patch new file mode 100644 index 00000000000..af5dfc20bb3 --- /dev/null +++ b/queue-6.11/nfs-ignore-sb_rdonly-when-mounting-nfs.patch @@ -0,0 +1,79 @@ +From 2eac1b6ee7d4c747ff7e557b9b1588a828bb964b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 14 Nov 2024 12:53:03 +0800 +Subject: nfs: ignore SB_RDONLY when mounting nfs + +From: Li Lingfeng + +[ Upstream commit 52cb7f8f177878b4f22397b9c4d2c8f743766be3 ] + +When exporting only one file system with fsid=0 on the server side, the +client alternately uses the ro/rw mount options to perform the mount +operation, and a new vfsmount is generated each time. + +It can be reproduced as follows: +[root@localhost ~]# mount /dev/sda /mnt2 +[root@localhost ~]# echo "/mnt2 *(rw,no_root_squash,fsid=0)" >/etc/exports +[root@localhost ~]# systemctl restart nfs-server +[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa +[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa +[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa +[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa +[root@localhost ~]# mount | grep nfs4 +127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,... +127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,... +127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,... +127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,... +[root@localhost ~]# + +We expected that after mounting with the ro option, using the rw option to +mount again would return EBUSY, but the actual situation was not the case. + +As shown above, when mounting for the first time, a superblock with the ro +flag will be generated, and at the same time, in do_new_mount_fc --> +do_add_mount, it detects that the superblock corresponding to the current +target directory is inconsistent with the currently generated one +(path->mnt->mnt_sb != newmnt->mnt.mnt_sb), and a new vfsmount will be +generated. + +When mounting with the rw option for the second time, since no matching +superblock can be found in the fs_supers list, a new superblock with the +rw flag will be generated again. The superblock in use (ro) is different +from the newly generated superblock (rw), and a new vfsmount will be +generated again. + +When mounting with the ro option for the third time, the superblock (ro) +is found in fs_supers, the superblock in use (rw) is different from the +found superblock (ro), and a new vfsmount will be generated again. 
+ +We can switch between ro/rw through remount, and only one superblock needs +to be generated, thus avoiding the problem of repeated generation of +vfsmount caused by switching superblocks. + +Furthermore, This can also resolve the issue described in the link. + +Fixes: 275a5d24bf56 ("NFS: Error when mounting the same filesystem with different options") +Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ +Signed-off-by: Li Lingfeng +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + fs/nfs/internal.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h +index 5902a9beca1f3..080af26c14dce 100644 +--- a/fs/nfs/internal.h ++++ b/fs/nfs/internal.h +@@ -11,7 +11,7 @@ + #include + #include + +-#define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS) ++#define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS) + + extern const struct export_operations nfs_export_ops; + +-- +2.43.0 + diff --git a/queue-6.11/nfsv4.0-fix-a-use-after-free-problem-in-the-asynchro.patch b/queue-6.11/nfsv4.0-fix-a-use-after-free-problem-in-the-asynchro.patch new file mode 100644 index 00000000000..99e9bb310fc --- /dev/null +++ b/queue-6.11/nfsv4.0-fix-a-use-after-free-problem-in-the-asynchro.patch @@ -0,0 +1,52 @@ +From 680a54d449ccef44363caa770ecfe8c3b26f40ca Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 8 Nov 2024 12:13:31 -0500 +Subject: NFSv4.0: Fix a use-after-free problem in the asynchronous open() + +From: Trond Myklebust + +[ Upstream commit 2fdb05dc0931250574f0cb0ebeb5ed8e20f4a889 ] + +Yang Erkun reports that when two threads are opening files at the same +time, and are forced to abort before a reply is seen, then the call to +nfs_release_seqid() in nfs4_opendata_free() can result in a +use-after-free of the pointer to the defunct rpc task of the other +thread. +The fix is to ensure that if the RPC call is aborted before the call to +nfs_wait_on_sequence() is complete, then we must call nfs_release_seqid() +in nfs4_open_release() before the rpc_task is freed. + +Reported-by: Yang Erkun +Fixes: 24ac23ab88df ("NFSv4: Convert open() into an asynchronous RPC call") +Reviewed-by: Yang Erkun +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + fs/nfs/nfs4proc.c | 8 +++++--- + 1 file changed, 5 insertions(+), 3 deletions(-) + +diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c +index 9d40319e063de..405f17e6e0b45 100644 +--- a/fs/nfs/nfs4proc.c ++++ b/fs/nfs/nfs4proc.c +@@ -2603,12 +2603,14 @@ static void nfs4_open_release(void *calldata) + struct nfs4_opendata *data = calldata; + struct nfs4_state *state = NULL; + ++ /* In case of error, no cleanup! */ ++ if (data->rpc_status != 0 || !data->rpc_done) { ++ nfs_release_seqid(data->o_arg.seqid); ++ goto out_free; ++ } + /* If this request hasn't been cancelled, do nothing */ + if (!data->cancelled) + goto out_free; +- /* In case of error, no cleanup! */ +- if (data->rpc_status != 0 || !data->rpc_done) +- goto out_free; + /* In case we need an open_confirm, no cleanup! 
*/ + if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM) + goto out_free; +-- +2.43.0 + diff --git a/queue-6.11/nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch b/queue-6.11/nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch new file mode 100644 index 00000000000..e5761d8fe4c --- /dev/null +++ b/queue-6.11/nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch @@ -0,0 +1,96 @@ +From 4365cf2dac213a0d94c0d300c5ea23439f79b135 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 Nov 2024 11:42:09 +0530 +Subject: nvme-fabrics: fix kernel crash while shutting down controller + +From: Nilay Shroff + +[ Upstream commit e9869c85c81168a1275f909d5972a3fc435304be ] + +The nvme keep-alive operation, which executes at a periodic interval, +could potentially sneak in while shutting down a fabric controller. +This may lead to a race between the fabric controller admin queue +destroy code path (invoked while shutting down controller) and hw/hctx +queue dispatcher called from the nvme keep-alive async request queuing +operation. This race could lead to the kernel crash shown below: + +Call Trace: + autoremove_wake_function+0x0/0xbc (unreliable) + __blk_mq_sched_dispatch_requests+0x114/0x24c + blk_mq_sched_dispatch_requests+0x44/0x84 + blk_mq_run_hw_queue+0x140/0x220 + nvme_keep_alive_work+0xc8/0x19c [nvme_core] + process_one_work+0x200/0x4e0 + worker_thread+0x340/0x504 + kthread+0x138/0x140 + start_kernel_thread+0x14/0x18 + +While shutting down fabric controller, if nvme keep-alive request sneaks +in then it would be flushed off. The nvme_keep_alive_end_io function is +then invoked to handle the end of the keep-alive operation which +decrements the admin->q_usage_counter and assuming this is the last/only +request in the admin queue then the admin->q_usage_counter becomes zero. +If that happens then blk-mq destroy queue operation (blk_mq_destroy_ +queue()) which could be potentially running simultaneously on another +cpu (as this is the controller shutdown code path) would forward +progress and deletes the admin queue. So, now from this point onward +we are not supposed to access the admin queue resources. However the +issue here's that the nvme keep-alive thread running hw/hctx queue +dispatch operation hasn't yet finished its work and so it could still +potentially access the admin queue resource while the admin queue had +been already deleted and that causes the above crash. + +The above kernel crash is regression caused due to changes implemented +in commit a54a93d0e359 ("nvme: move stopping keep-alive into +nvme_uninit_ctrl()"). Ideally we should stop keep-alive before destroyin +g the admin queue and freeing the admin tagset so that it wouldn't sneak +in during the shutdown operation. However we removed the keep alive stop +operation from the beginning of the controller shutdown code path in commit +a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()") +and added it under nvme_uninit_ctrl() which executes very late in the +shutdown code path after the admin queue is destroyed and its tagset is +removed. So this change created the possibility of keep-alive sneaking in +and interfering with the shutdown operation and causing observed kernel +crash. + +To fix the observed crash, we decided to move nvme_stop_keep_alive() from +nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). 
This change would ensure +that we don't forward progress and delete the admin queue until the keep- +alive operation is finished (if it's in-flight) or cancelled and that would +help contain the race condition explained above and hence avoid the crash. + +Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set() instead of +adding nvme_stop_keep_alive() to the beginning of the controller shutdown +code path in nvme_stop_ctrl(), as was the case earlier before commit +a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()"), +would help save one callsite of nvme_stop_keep_alive(). + +Fixes: a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()") +Link: https://lore.kernel.org/all/1a21f37b-0f2a-4745-8c56-4dc8628d3983@linux.ibm.com/ +Reviewed-by: Ming Lei +Signed-off-by: Nilay Shroff +Signed-off-by: Keith Busch +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/core.c | 5 +++++ + 1 file changed, 5 insertions(+) + +diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c +index 128932c849a1a..f49431cbc8dfc 100644 +--- a/drivers/nvme/host/core.c ++++ b/drivers/nvme/host/core.c +@@ -4551,6 +4551,11 @@ EXPORT_SYMBOL_GPL(nvme_alloc_admin_tag_set); + + void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl) + { ++ /* ++ * As we're about to destroy the queue and free tagset ++ * we can not have keep-alive work running. ++ */ ++ nvme_stop_keep_alive(ctrl); + blk_mq_destroy_queue(ctrl->admin_q); + blk_put_queue(ctrl->admin_q); + if (ctrl->ops->flags & NVME_F_FABRICS) { +-- +2.43.0 + diff --git a/queue-6.11/nvme-multipath-avoid-hang-on-inaccessible-namespaces.patch b/queue-6.11/nvme-multipath-avoid-hang-on-inaccessible-namespaces.patch new file mode 100644 index 00000000000..ae90c01a595 --- /dev/null +++ b/queue-6.11/nvme-multipath-avoid-hang-on-inaccessible-namespaces.patch @@ -0,0 +1,90 @@ +From 9545c8c79758941a82be8afa1b4d1416b477ed24 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sat, 14 Sep 2024 14:01:23 +0200 +Subject: nvme-multipath: avoid hang on inaccessible namespaces + +From: Hannes Reinecke + +[ Upstream commit 3b97f5a05cfc55e7729ff3769f63eef64e2178bb ] + +During repetitive namespace remapping operations on the target the +namespace might have changed between the time the initial scan +was performed, and partition scan was invoked by device_add_disk() +in nvme_mpath_set_live(). We then end up with a stuck scanning process: + +[<0>] folio_wait_bit_common+0x12a/0x310 +[<0>] filemap_read_folio+0x97/0xd0 +[<0>] do_read_cache_folio+0x108/0x390 +[<0>] read_part_sector+0x31/0xa0 +[<0>] read_lba+0xc5/0x160 +[<0>] efi_partition+0xd9/0x8f0 +[<0>] bdev_disk_changed+0x23d/0x6d0 +[<0>] blkdev_get_whole+0x78/0xc0 +[<0>] bdev_open+0x2c6/0x3b0 +[<0>] bdev_file_open_by_dev+0xcb/0x120 +[<0>] disk_scan_partitions+0x5d/0x100 +[<0>] device_add_disk+0x402/0x420 +[<0>] nvme_mpath_set_live+0x4f/0x1f0 [nvme_core] +[<0>] nvme_mpath_add_disk+0x107/0x120 [nvme_core] +[<0>] nvme_alloc_ns+0xac6/0xe60 [nvme_core] +[<0>] nvme_scan_ns+0x2dd/0x3e0 [nvme_core] +[<0>] nvme_scan_work+0x1a3/0x490 [nvme_core] + +This happens when we have several paths, some of which are inaccessible, +and the active paths are removed first. Then nvme_find_path() will requeue +I/O in the ns_head (as paths are present), but the requeue list is never +triggered as all remaining paths are inactive. + +This patch checks for NVME_NSHEAD_DISK_LIVE in nvme_available_path(), +and requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared once +the last path has been removed to properly terminate pending I/O. 
+ +Signed-off-by: Hannes Reinecke +Reviewed-by: Sagi Grimberg +Signed-off-by: Keith Busch +Stable-dep-of: 5dd18f09ce73 ("nvme/multipath: Fix RCU list traversal to use SRCU primitive") +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 12 ++++++++++-- + 1 file changed, 10 insertions(+), 2 deletions(-) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index a43982aaa40d7..6b2decd998de5 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -421,6 +421,9 @@ static bool nvme_available_path(struct nvme_ns_head *head) + { + struct nvme_ns *ns; + ++ if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) ++ return NULL; ++ + list_for_each_entry_rcu(ns, &head->list, siblings) { + if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags)) + continue; +@@ -995,8 +998,7 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head) + { + if (!head->disk) + return; +- kblockd_schedule_work(&head->requeue_work); +- if (test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) { ++ if (test_and_clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) { + nvme_cdev_del(&head->cdev, &head->cdev_device); + /* + * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared +@@ -1006,6 +1008,12 @@ void nvme_mpath_shutdown_disk(struct nvme_ns_head *head) + kblockd_schedule_work(&head->requeue_work); + del_gendisk(head->disk); + } ++ /* ++ * requeue I/O after NVME_NSHEAD_DISK_LIVE has been cleared ++ * to allow multipath to fail all I/O. ++ */ ++ synchronize_srcu(&head->srcu); ++ kblockd_schedule_work(&head->requeue_work); + } + + void nvme_mpath_remove_disk(struct nvme_ns_head *head) +-- +2.43.0 + diff --git a/queue-6.11/nvme-multipath-fix-rcu-list-traversal-to-use-srcu-pr.patch b/queue-6.11/nvme-multipath-fix-rcu-list-traversal-to-use-srcu-pr.patch new file mode 100644 index 00000000000..8440b8150f8 --- /dev/null +++ b/queue-6.11/nvme-multipath-fix-rcu-list-traversal-to-use-srcu-pr.patch @@ -0,0 +1,107 @@ +From f2ee2cf5a12613870c2a9fafb078d3eb4e948773 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 Nov 2024 06:42:46 -0800 +Subject: nvme/multipath: Fix RCU list traversal to use SRCU primitive + +From: Breno Leitao + +[ Upstream commit 5dd18f09ce7399df6fffe80d1598add46c395ae9 ] + +The code currently uses list_for_each_entry_rcu() while holding an SRCU +lock, triggering false positive warnings with CONFIG_PROVE_RCU=y +enabled: + + drivers/nvme/host/multipath.c:168 RCU-list traversed in non-reader section!! + drivers/nvme/host/multipath.c:227 RCU-list traversed in non-reader section!! + drivers/nvme/host/multipath.c:260 RCU-list traversed in non-reader section!! + +While the list is properly protected by SRCU lock, the code uses the +wrong list traversal primitive. Replace list_for_each_entry_rcu() with +list_for_each_entry_srcu() to correctly indicate SRCU-based protection +and eliminate the false warning. 
+ +Signed-off-by: Breno Leitao +Fixes: be647e2c76b2 ("nvme: use srcu for iterating namespace list") +Signed-off-by: Keith Busch +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/multipath.c | 21 ++++++++++++++------- + 1 file changed, 14 insertions(+), 7 deletions(-) + +diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c +index 6b2decd998de5..07fcf98540c56 100644 +--- a/drivers/nvme/host/multipath.c ++++ b/drivers/nvme/host/multipath.c +@@ -165,7 +165,8 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl) + int srcu_idx; + + srcu_idx = srcu_read_lock(&ctrl->srcu); +- list_for_each_entry_rcu(ns, &ctrl->namespaces, list) { ++ list_for_each_entry_srcu(ns, &ctrl->namespaces, list, ++ srcu_read_lock_held(&ctrl->srcu)) { + if (!ns->head->disk) + continue; + kblockd_schedule_work(&ns->head->requeue_work); +@@ -209,7 +210,8 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl) + int srcu_idx; + + srcu_idx = srcu_read_lock(&ctrl->srcu); +- list_for_each_entry_rcu(ns, &ctrl->namespaces, list) { ++ list_for_each_entry_srcu(ns, &ctrl->namespaces, list, ++ srcu_read_lock_held(&ctrl->srcu)) { + nvme_mpath_clear_current_path(ns); + kblockd_schedule_work(&ns->head->requeue_work); + } +@@ -224,7 +226,8 @@ void nvme_mpath_revalidate_paths(struct nvme_ns *ns) + int srcu_idx; + + srcu_idx = srcu_read_lock(&head->srcu); +- list_for_each_entry_rcu(ns, &head->list, siblings) { ++ list_for_each_entry_srcu(ns, &head->list, siblings, ++ srcu_read_lock_held(&head->srcu)) { + if (capacity != get_capacity(ns->disk)) + clear_bit(NVME_NS_READY, &ns->flags); + } +@@ -257,7 +260,8 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) + int found_distance = INT_MAX, fallback_distance = INT_MAX, distance; + struct nvme_ns *found = NULL, *fallback = NULL, *ns; + +- list_for_each_entry_rcu(ns, &head->list, siblings) { ++ list_for_each_entry_srcu(ns, &head->list, siblings, ++ srcu_read_lock_held(&head->srcu)) { + if (nvme_path_is_disabled(ns)) + continue; + +@@ -356,7 +360,8 @@ static struct nvme_ns *nvme_queue_depth_path(struct nvme_ns_head *head) + unsigned int min_depth_opt = UINT_MAX, min_depth_nonopt = UINT_MAX; + unsigned int depth; + +- list_for_each_entry_rcu(ns, &head->list, siblings) { ++ list_for_each_entry_srcu(ns, &head->list, siblings, ++ srcu_read_lock_held(&head->srcu)) { + if (nvme_path_is_disabled(ns)) + continue; + +@@ -424,7 +429,8 @@ static bool nvme_available_path(struct nvme_ns_head *head) + if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) + return NULL; + +- list_for_each_entry_rcu(ns, &head->list, siblings) { ++ list_for_each_entry_srcu(ns, &head->list, siblings, ++ srcu_read_lock_held(&head->srcu)) { + if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags)) + continue; + switch (nvme_ctrl_state(ns->ctrl)) { +@@ -786,7 +792,8 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl, + return 0; + + srcu_idx = srcu_read_lock(&ctrl->srcu); +- list_for_each_entry_rcu(ns, &ctrl->namespaces, list) { ++ list_for_each_entry_srcu(ns, &ctrl->namespaces, list, ++ srcu_read_lock_held(&ctrl->srcu)) { + unsigned nsid; + again: + nsid = le32_to_cpu(desc->nsids[n]); +-- +2.43.0 + diff --git a/queue-6.11/perf-arm-cmn-ensure-port-and-device-id-bits-are-set-.patch b/queue-6.11/perf-arm-cmn-ensure-port-and-device-id-bits-are-set-.patch new file mode 100644 index 00000000000..16aee3edb66 --- /dev/null +++ b/queue-6.11/perf-arm-cmn-ensure-port-and-device-id-bits-are-set-.patch @@ -0,0 +1,50 @@ +From ab41a97dccd53008faf7467dee7d21e51ab78d9e Mon Sep 17 
00:00:00 2001 +From: Sasha Levin +Date: Wed, 20 Nov 2024 16:13:34 -0800 +Subject: perf/arm-cmn: Ensure port and device id bits are set properly + +From: Namhyung Kim + +[ Upstream commit dfdf714fed559c09021df1d2a4bb64c0ad5f53bc ] + +The portid_bits and deviceid_bits were set only for XP type nodes in +the arm_cmn_discover() and it confused other nodes to find XP nodes. +Copy the both bits from the XP nodes directly when it sets up a new +node. + +Fixes: e79634b53e39 ("perf/arm-cmn: Refactor node ID handling. Again.") +Signed-off-by: Namhyung Kim +Acked-by: Will Deacon +Reviewed-by: Robin Murphy +Link: https://lore.kernel.org/r/20241121001334.331334-1-namhyung@kernel.org +Signed-off-by: Catalin Marinas +Signed-off-by: Sasha Levin +--- + drivers/perf/arm-cmn.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c +index 48863b31ccfb1..a2032d3979640 100644 +--- a/drivers/perf/arm-cmn.c ++++ b/drivers/perf/arm-cmn.c +@@ -2147,8 +2147,6 @@ static int arm_cmn_init_dtcs(struct arm_cmn *cmn) + continue; + + xp = arm_cmn_node_to_xp(cmn, dn); +- dn->portid_bits = xp->portid_bits; +- dn->deviceid_bits = xp->deviceid_bits; + dn->dtc = xp->dtc; + dn->dtm = xp->dtm; + if (cmn->multi_dtm) +@@ -2379,6 +2377,8 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) + } + + arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn); ++ dn->portid_bits = xp->portid_bits; ++ dn->deviceid_bits = xp->deviceid_bits; + + switch (dn->type) { + case CMN_TYPE_DTC: +-- +2.43.0 + diff --git a/queue-6.11/perf-arm-smmuv3-fix-lockdep-assert-in-event_init.patch b/queue-6.11/perf-arm-smmuv3-fix-lockdep-assert-in-event_init.patch new file mode 100644 index 00000000000..88ead7def7e --- /dev/null +++ b/queue-6.11/perf-arm-smmuv3-fix-lockdep-assert-in-event_init.patch @@ -0,0 +1,68 @@ +From 6ad46c5dc10e23903f03dd90091ee31a5d92d52d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 8 Nov 2024 05:08:05 +0000 +Subject: perf/arm-smmuv3: Fix lockdep assert in ->event_init() + +From: Chun-Tse Shao + +[ Upstream commit 02a55f2743012a8089f09f6867220c3d57f16564 ] + +Same as +https://lore.kernel.org/all/20240514180050.182454-1-namhyung@kernel.org/, +we should skip `for_each_sibling_event()` for group leader since it +doesn't have the ctx yet. + +Fixes: f3c0eba28704 ("perf: Add a few assertions") +Reported-by: Greg Thelen +Cc: Namhyung Kim +Cc: Robin Murphy +Cc: Tuan Phan +Signed-off-by: Chun-Tse Shao +Acked-by: Will Deacon +Link: https://lore.kernel.org/r/20241108050806.3730811-1-ctshao@google.com +Signed-off-by: Catalin Marinas +Signed-off-by: Sasha Levin +--- + drivers/perf/arm_smmuv3_pmu.c | 19 +++++++++++-------- + 1 file changed, 11 insertions(+), 8 deletions(-) + +diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c +index d5fa92ba83739..dabdb9f7bb82c 100644 +--- a/drivers/perf/arm_smmuv3_pmu.c ++++ b/drivers/perf/arm_smmuv3_pmu.c +@@ -431,6 +431,17 @@ static int smmu_pmu_event_init(struct perf_event *event) + return -EINVAL; + } + ++ /* ++ * Ensure all events are on the same cpu so all events are in the ++ * same cpu context, to avoid races on pmu_enable etc. 
++ */ ++ event->cpu = smmu_pmu->on_cpu; ++ ++ hwc->idx = -1; ++ ++ if (event->group_leader == event) ++ return 0; ++ + for_each_sibling_event(sibling, event->group_leader) { + if (is_software_event(sibling)) + continue; +@@ -442,14 +453,6 @@ static int smmu_pmu_event_init(struct perf_event *event) + return -EINVAL; + } + +- hwc->idx = -1; +- +- /* +- * Ensure all events are on the same cpu so all events are in the +- * same cpu context, to avoid races on pmu_enable etc. +- */ +- event->cpu = smmu_pmu->on_cpu; +- + return 0; + } + +-- +2.43.0 + diff --git a/queue-6.11/rename-.data.once-to-.data.once-to-fix-resetting-war.patch b/queue-6.11/rename-.data.once-to-.data.once-to-fix-resetting-war.patch new file mode 100644 index 00000000000..83fdec682b3 --- /dev/null +++ b/queue-6.11/rename-.data.once-to-.data.once-to-fix-resetting-war.patch @@ -0,0 +1,153 @@ +From 032b718389826fe39a05d642d6a04bcaf1f601c1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 7 Nov 2024 01:14:41 +0900 +Subject: Rename .data.once to .data..once to fix resetting WARN*_ONCE + +From: Masahiro Yamada + +[ Upstream commit dbefa1f31a91670c9e7dac9b559625336206466f ] + +Commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") +added support for clearing the state of once warnings. However, +it is not functional when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or +CONFIG_LTO_CLANG is enabled, because .data.once matches the +.data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. + +Commit cb87481ee89d ("kbuild: linker script do not match C names unless +LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress +the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, +providing a minimal fix for stable backporting. We were aware this did +not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The +plan was to apply correct fixes and then revert cb87481ee89d. [1] + +Seven years have passed since then, yet the #ifdef workaround remains in +place. Meanwhile, commit b1fca27d384e introduced the .data.once section, +and commit dc5723b02e52 ("kbuild: add support for Clang LTO") extended +the #ifdef. + +Using a ".." separator in the section name fixes the issue for +CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. 
+ +[1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ + +Fixes: b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") +Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") +Signed-off-by: Masahiro Yamada +Signed-off-by: Sasha Levin +--- + include/asm-generic/vmlinux.lds.h | 2 +- + include/linux/mmdebug.h | 6 +++--- + include/linux/once.h | 4 ++-- + include/linux/once_lite.h | 2 +- + include/net/net_debug.h | 2 +- + mm/internal.h | 2 +- + 6 files changed, 9 insertions(+), 9 deletions(-) + +diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h +index bee5a71f4b41e..38d710f620efc 100644 +--- a/include/asm-generic/vmlinux.lds.h ++++ b/include/asm-generic/vmlinux.lds.h +@@ -351,7 +351,7 @@ + *(.data..shared_aligned) /* percpu related */ \ + *(.data..unlikely) \ + __start_once = .; \ +- *(.data.once) \ ++ *(.data..once) \ + __end_once = .; \ + STRUCT_ALIGN(); \ + *(__tracepoints) \ +diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h +index 39a7714605a79..d7cb1e5ecbda9 100644 +--- a/include/linux/mmdebug.h ++++ b/include/linux/mmdebug.h +@@ -46,7 +46,7 @@ void vma_iter_dump_tree(const struct vma_iterator *vmi); + } \ + } while (0) + #define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \ +- static bool __section(".data.once") __warned; \ ++ static bool __section(".data..once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ +@@ -66,7 +66,7 @@ void vma_iter_dump_tree(const struct vma_iterator *vmi); + unlikely(__ret_warn); \ + }) + #define VM_WARN_ON_ONCE_FOLIO(cond, folio) ({ \ +- static bool __section(".data.once") __warned; \ ++ static bool __section(".data..once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ +@@ -77,7 +77,7 @@ void vma_iter_dump_tree(const struct vma_iterator *vmi); + unlikely(__ret_warn_once); \ + }) + #define VM_WARN_ON_ONCE_MM(cond, mm) ({ \ +- static bool __section(".data.once") __warned; \ ++ static bool __section(".data..once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ +diff --git a/include/linux/once.h b/include/linux/once.h +index bc714d414448a..30346fcdc7995 100644 +--- a/include/linux/once.h ++++ b/include/linux/once.h +@@ -46,7 +46,7 @@ void __do_once_sleepable_done(bool *done, struct static_key_true *once_key, + #define DO_ONCE(func, ...) \ + ({ \ + bool ___ret = false; \ +- static bool __section(".data.once") ___done = false; \ ++ static bool __section(".data..once") ___done = false; \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ + unsigned long ___flags; \ +@@ -64,7 +64,7 @@ void __do_once_sleepable_done(bool *done, struct static_key_true *once_key, + #define DO_ONCE_SLEEPABLE(func, ...) 
\ + ({ \ + bool ___ret = false; \ +- static bool __section(".data.once") ___done = false; \ ++ static bool __section(".data..once") ___done = false; \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ + ___ret = __do_once_sleepable_start(&___done); \ +diff --git a/include/linux/once_lite.h b/include/linux/once_lite.h +index b7bce4983638f..27de7bc32a061 100644 +--- a/include/linux/once_lite.h ++++ b/include/linux/once_lite.h +@@ -12,7 +12,7 @@ + + #define __ONCE_LITE_IF(condition) \ + ({ \ +- static bool __section(".data.once") __already_done; \ ++ static bool __section(".data..once") __already_done; \ + bool __ret_cond = !!(condition); \ + bool __ret_once = false; \ + \ +diff --git a/include/net/net_debug.h b/include/net/net_debug.h +index 1e74684cbbdbc..4a79204c8d306 100644 +--- a/include/net/net_debug.h ++++ b/include/net/net_debug.h +@@ -27,7 +27,7 @@ void netdev_info(const struct net_device *dev, const char *format, ...); + + #define netdev_level_once(level, dev, fmt, ...) \ + do { \ +- static bool __section(".data.once") __print_once; \ ++ static bool __section(".data..once") __print_once; \ + \ + if (!__print_once) { \ + __print_once = true; \ +diff --git a/mm/internal.h b/mm/internal.h +index 7da580dfae6c5..c791312eae764 100644 +--- a/mm/internal.h ++++ b/mm/internal.h +@@ -42,7 +42,7 @@ struct folio_batch; + * when we specify __GFP_NOWARN. + */ + #define WARN_ON_ONCE_GFP(cond, gfp) ({ \ +- static bool __section(".data.once") __warned; \ ++ static bool __section(".data..once") __warned; \ + int __ret_warn_once = !!(cond); \ + \ + if (unlikely(!(gfp & __GFP_NOWARN) && __ret_warn_once && !__warned)) { \ +-- +2.43.0 + diff --git a/queue-6.11/rename-.data.unlikely-to-.data.unlikely.patch b/queue-6.11/rename-.data.unlikely-to-.data.unlikely.patch new file mode 100644 index 00000000000..0be899f07c8 --- /dev/null +++ b/queue-6.11/rename-.data.unlikely-to-.data.unlikely.patch @@ -0,0 +1,67 @@ +From 7eab06ecf86b9273d3e0d1c51db344851e171082 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 7 Nov 2024 01:14:40 +0900 +Subject: Rename .data.unlikely to .data..unlikely + +From: Masahiro Yamada + +[ Upstream commit bb43a59944f45e89aa158740b8a16ba8f0b0fa2b ] + +Commit 7ccaba5314ca ("consolidate WARN_...ONCE() static variables") +was intended to collect all .data.unlikely sections into one chunk. +However, this has not worked when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION +or CONFIG_LTO_CLANG is enabled, because .data.unlikely matches the +.data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. + +Commit cb87481ee89d ("kbuild: linker script do not match C names unless +LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress +the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, +providing a minimal fix for stable backporting. We were aware this did +not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The +plan was to apply correct fixes and then revert cb87481ee89d. [1] + +Seven years have passed since then, yet the #ifdef workaround remains in +place. + +Using a ".." separator in the section name fixes the issue for +CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. 
+ +[1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ + +Fixes: cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") +Signed-off-by: Masahiro Yamada +Signed-off-by: Sasha Levin +--- + include/asm-generic/vmlinux.lds.h | 2 +- + include/linux/rcupdate.h | 2 +- + 2 files changed, 2 insertions(+), 2 deletions(-) + +diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h +index 1ae44793132a8..bee5a71f4b41e 100644 +--- a/include/asm-generic/vmlinux.lds.h ++++ b/include/asm-generic/vmlinux.lds.h +@@ -349,7 +349,7 @@ + *(.data..decrypted) \ + *(.ref.data) \ + *(.data..shared_aligned) /* percpu related */ \ +- *(.data.unlikely) \ ++ *(.data..unlikely) \ + __start_once = .; \ + *(.data.once) \ + __end_once = .; \ +diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h +index 13f6f00aecf9c..1986d017b67fb 100644 +--- a/include/linux/rcupdate.h ++++ b/include/linux/rcupdate.h +@@ -390,7 +390,7 @@ static inline int debug_lockdep_rcu_enabled(void) + */ + #define RCU_LOCKDEP_WARN(c, s) \ + do { \ +- static bool __section(".data.unlikely") __warned; \ ++ static bool __section(".data..unlikely") __warned; \ + if (debug_lockdep_rcu_enabled() && (c) && \ + debug_lockdep_rcu_enabled() && !__warned) { \ + __warned = true; \ +-- +2.43.0 + diff --git a/queue-6.11/revert-nfs-don-t-reuse-partially-completed-requests-.patch b/queue-6.11/revert-nfs-don-t-reuse-partially-completed-requests-.patch new file mode 100644 index 00000000000..8f347b15bfb --- /dev/null +++ b/queue-6.11/revert-nfs-don-t-reuse-partially-completed-requests-.patch @@ -0,0 +1,118 @@ +From e1bc58caf58720c46c5ee8971668562bd7815077 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Nov 2024 21:09:21 -0500 +Subject: Revert "nfs: don't reuse partially completed requests in + nfs_lock_and_join_requests" + +From: Trond Myklebust + +[ Upstream commit 66f9dac9077c9c063552e465212abeb8f97d28a7 ] + +This reverts commit b571cfcb9dcac187c6d967987792d37cb0688610. + +This patch appears to assume that if one request is complete, then the +others will complete too before unlocking. That is not a valid +assumption, since other requests could hit a non-fatal error or a short +write that would cause them not to complete. 
+ +Reported-by: Igor Raits +Link: https://bugzilla.kernel.org/show_bug.cgi?id=219508 +Fixes: b571cfcb9dca ("nfs: don't reuse partially completed requests in nfs_lock_and_join_requests") +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + fs/nfs/write.c | 49 +++++++++++++++++++++++++++++-------------------- + 1 file changed, 29 insertions(+), 20 deletions(-) + +diff --git a/fs/nfs/write.c b/fs/nfs/write.c +index d074d0ceb4f01..1e2b9ee4222e7 100644 +--- a/fs/nfs/write.c ++++ b/fs/nfs/write.c +@@ -144,6 +144,31 @@ static void nfs_io_completion_put(struct nfs_io_completion *ioc) + kref_put(&ioc->refcount, nfs_io_completion_release); + } + ++static void ++nfs_page_set_inode_ref(struct nfs_page *req, struct inode *inode) ++{ ++ if (!test_and_set_bit(PG_INODE_REF, &req->wb_flags)) { ++ kref_get(&req->wb_kref); ++ atomic_long_inc(&NFS_I(inode)->nrequests); ++ } ++} ++ ++static int ++nfs_cancel_remove_inode(struct nfs_page *req, struct inode *inode) ++{ ++ int ret; ++ ++ if (!test_bit(PG_REMOVE, &req->wb_flags)) ++ return 0; ++ ret = nfs_page_group_lock(req); ++ if (ret) ++ return ret; ++ if (test_and_clear_bit(PG_REMOVE, &req->wb_flags)) ++ nfs_page_set_inode_ref(req, inode); ++ nfs_page_group_unlock(req); ++ return 0; ++} ++ + /** + * nfs_folio_find_head_request - find head request associated with a folio + * @folio: pointer to folio +@@ -540,7 +565,6 @@ static struct nfs_page *nfs_lock_and_join_requests(struct folio *folio) + struct inode *inode = folio->mapping->host; + struct nfs_page *head, *subreq; + struct nfs_commit_info cinfo; +- bool removed; + int ret; + + /* +@@ -565,18 +589,18 @@ static struct nfs_page *nfs_lock_and_join_requests(struct folio *folio) + goto retry; + } + +- ret = nfs_page_group_lock(head); ++ ret = nfs_cancel_remove_inode(head, inode); + if (ret < 0) + goto out_unlock; + +- removed = test_bit(PG_REMOVE, &head->wb_flags); ++ ret = nfs_page_group_lock(head); ++ if (ret < 0) ++ goto out_unlock; + + /* lock each request in the page group */ + for (subreq = head->wb_this_page; + subreq != head; + subreq = subreq->wb_this_page) { +- if (test_bit(PG_REMOVE, &subreq->wb_flags)) +- removed = true; + ret = nfs_page_group_lock_subreq(head, subreq); + if (ret < 0) + goto out_unlock; +@@ -584,21 +608,6 @@ static struct nfs_page *nfs_lock_and_join_requests(struct folio *folio) + + nfs_page_group_unlock(head); + +- /* +- * If PG_REMOVE is set on any request, I/O on that request has +- * completed, but some requests were still under I/O at the time +- * we locked the head request. +- * +- * In that case the above wait for all requests means that all I/O +- * has now finished, and we can restart from a clean slate. Let the +- * old requests go away and start from scratch instead. 
+- */ +- if (removed) { +- nfs_unroll_locks(head, head); +- nfs_unlock_and_release_request(head); +- goto retry; +- } +- + nfs_init_cinfo_from_inode(&cinfo, inode); + nfs_join_page_group(head, &cinfo, inode); + return head; +-- +2.43.0 + diff --git a/queue-6.11/rtc-ab-eoz9-don-t-fail-temperature-reads-on-undervol.patch b/queue-6.11/rtc-ab-eoz9-don-t-fail-temperature-reads-on-undervol.patch new file mode 100644 index 00000000000..de353b9eb5b --- /dev/null +++ b/queue-6.11/rtc-ab-eoz9-don-t-fail-temperature-reads-on-undervol.patch @@ -0,0 +1,49 @@ +From 6da815a715f306f295e8ea01bfa346ce44676352 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 22 Nov 2024 11:10:30 +0100 +Subject: rtc: ab-eoz9: don't fail temperature reads on undervoltage + notification + +From: Maxime Chevallier + +[ Upstream commit e0779a0dcf41a6452ac0a169cd96863feb5787c7 ] + +The undervoltage flags reported by the RTC are useful to know if the +time and date are reliable after a reboot. Although the threshold VLOW1 +indicates that the thermometer has been shutdown and time compensation +is off, it doesn't mean that the temperature readout is currently +impossible. + +As the system is running, the RTC voltage is now fully established and +we can read the temperature. + +Fixes: 67075b63cce2 ("rtc: add AB-RTCMC-32.768kHz-EOZ9 RTC support") +Signed-off-by: Maxime Chevallier +Link: https://lore.kernel.org/r/20241122101031.68916-3-maxime.chevallier@bootlin.com +Signed-off-by: Alexandre Belloni +Signed-off-by: Sasha Levin +--- + drivers/rtc/rtc-ab-eoz9.c | 7 ------- + 1 file changed, 7 deletions(-) + +diff --git a/drivers/rtc/rtc-ab-eoz9.c b/drivers/rtc/rtc-ab-eoz9.c +index 02f7d07112877..e17bce9a27468 100644 +--- a/drivers/rtc/rtc-ab-eoz9.c ++++ b/drivers/rtc/rtc-ab-eoz9.c +@@ -396,13 +396,6 @@ static int abeoz9z3_temp_read(struct device *dev, + if (ret < 0) + return ret; + +- if ((val & ABEOZ9_REG_CTRL_STATUS_V1F) || +- (val & ABEOZ9_REG_CTRL_STATUS_V2F)) { +- dev_err(dev, +- "thermometer might be disabled due to low voltage\n"); +- return -EINVAL; +- } +- + switch (attr) { + case hwmon_temp_input: + ret = regmap_read(regmap, ABEOZ9_REG_REG_TEMP, &val); +-- +2.43.0 + diff --git a/queue-6.11/rtc-abx80x-fix-wdt-bit-position-of-the-status-regist.patch b/queue-6.11/rtc-abx80x-fix-wdt-bit-position-of-the-status-regist.patch new file mode 100644 index 00000000000..90e0517a5c4 --- /dev/null +++ b/queue-6.11/rtc-abx80x-fix-wdt-bit-position-of-the-status-regist.patch @@ -0,0 +1,39 @@ +From 1b0ae1348f01ad8ab2454d2be8d22539dea7be79 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 8 Oct 2024 13:17:37 +0900 +Subject: rtc: abx80x: Fix WDT bit position of the status register + +From: Nobuhiro Iwamatsu + +[ Upstream commit 10e078b273ee7a2b8b4f05a64ac458f5e652d18d ] + +The WDT bit in the status register is 5, not 6. This fixes from 6 to 5. 
+
+Link: https://abracon.com/Support/AppsManuals/Precisiontiming/AB08XX-Application-Manual.pdf
+Link: https://www.microcrystal.com/fileadmin/Media/Products/RTC/App.Manual/RV-1805-C3_App-Manual.pdf
+Fixes: 749e36d0a0d7 ("rtc: abx80x: add basic watchdog support")
+Cc: Jeremy Gebben
+Signed-off-by: Nobuhiro Iwamatsu
+Link: https://lore.kernel.org/r/20241008041737.1640633-1-iwamatsu@nigauri.org
+Signed-off-by: Alexandre Belloni
+Signed-off-by: Sasha Levin
+---
+ drivers/rtc/rtc-abx80x.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/rtc/rtc-abx80x.c b/drivers/rtc/rtc-abx80x.c
+index 1298962402ff4..3fee27914ba80 100644
+--- a/drivers/rtc/rtc-abx80x.c
++++ b/drivers/rtc/rtc-abx80x.c
+@@ -39,7 +39,7 @@
+ #define ABX8XX_REG_STATUS 0x0f
+ #define ABX8XX_STATUS_AF BIT(2)
+ #define ABX8XX_STATUS_BLF BIT(4)
+-#define ABX8XX_STATUS_WDT BIT(6)
++#define ABX8XX_STATUS_WDT BIT(5)
+
+ #define ABX8XX_REG_CTRL1 0x10
+ #define ABX8XX_CTRL_WRITE BIT(0)
+--
+2.43.0
+
diff --git a/queue-6.11/rtc-check-if-__rtc_read_time-was-successful-in-rtc_t.patch b/queue-6.11/rtc-check-if-__rtc_read_time-was-successful-in-rtc_t.patch
new file mode 100644
index 00000000000..11f6fdfaf4b
--- /dev/null
+++ b/queue-6.11/rtc-check-if-__rtc_read_time-was-successful-in-rtc_t.patch
@@ -0,0 +1,53 @@
+From 9405c91f5075dfed5d8fa4c0e9e553bad8163bda Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 11 Oct 2024 12:31:53 +0800
+Subject: rtc: check if __rtc_read_time was successful in rtc_timer_do_work()
+
+From: Yongliang Gao
+
+[ Upstream commit e8ba8a2bc4f60a1065f23d6a0e7cbea945a0f40d ]
+
+If the __rtc_read_time call fails, the struct rtc_time tm may contain
+uninitialized data, or an illegal date/time read from the RTC hardware.
+
+When calling rtc_tm_to_ktime later, the result may be a very large value
+(possibly KTIME_MAX). If there are periodic timers in rtc->timerqueue,
+they will continually expire, possibly causing a kernel softlockup.
+ +Fixes: 6610e0893b8b ("RTC: Rework RTC code to use timerqueue for events") +Signed-off-by: Yongliang Gao +Acked-by: Jingqun Li +Link: https://lore.kernel.org/r/20241011043153.3788112-1-leonylgao@gmail.com +Signed-off-by: Alexandre Belloni +Signed-off-by: Sasha Levin +--- + drivers/rtc/interface.c | 7 ++++++- + 1 file changed, 6 insertions(+), 1 deletion(-) + +diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c +index cca650b2e0b94..aaf76406cd7d7 100644 +--- a/drivers/rtc/interface.c ++++ b/drivers/rtc/interface.c +@@ -904,13 +904,18 @@ void rtc_timer_do_work(struct work_struct *work) + struct timerqueue_node *next; + ktime_t now; + struct rtc_time tm; ++ int err; + + struct rtc_device *rtc = + container_of(work, struct rtc_device, irqwork); + + mutex_lock(&rtc->ops_lock); + again: +- __rtc_read_time(rtc, &tm); ++ err = __rtc_read_time(rtc, &tm); ++ if (err) { ++ mutex_unlock(&rtc->ops_lock); ++ return; ++ } + now = rtc_tm_to_ktime(tm); + while ((next = timerqueue_getnext(&rtc->timerqueue))) { + if (next->expires > now) +-- +2.43.0 + diff --git a/queue-6.11/rtc-rzn1-fix-bcd-to-rtc_time-conversion-errors.patch b/queue-6.11/rtc-rzn1-fix-bcd-to-rtc_time-conversion-errors.patch new file mode 100644 index 00000000000..0452bcb623a --- /dev/null +++ b/queue-6.11/rtc-rzn1-fix-bcd-to-rtc_time-conversion-errors.patch @@ -0,0 +1,52 @@ +From d60f26ae63c90eb7d76a1fc3f79a60c4ca34ed14 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 13 Nov 2024 12:30:32 +0100 +Subject: rtc: rzn1: fix BCD to rtc_time conversion errors + +From: Wolfram Sang + +[ Upstream commit 55727188dfa3572aecd946e58fab9e4a64f06894 ] + +tm_mon describes months from 0 to 11, but the register contains BCD from +1 to 12. tm_year contains years since 1900, but the BCD contains 20XX. +Apply the offsets when converting these numbers. 
+ +Fixes: deeb4b5393e1 ("rtc: rzn1: Add new RTC driver") +Signed-off-by: Wolfram Sang +Reviewed-by: Miquel Raynal +Link: https://lore.kernel.org/r/20241113113032.27409-1-wsa+renesas@sang-engineering.com +Signed-off-by: Alexandre Belloni +Signed-off-by: Sasha Levin +--- + drivers/rtc/rtc-rzn1.c | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +diff --git a/drivers/rtc/rtc-rzn1.c b/drivers/rtc/rtc-rzn1.c +index 56ebbd4d04814..8570c8e63d70c 100644 +--- a/drivers/rtc/rtc-rzn1.c ++++ b/drivers/rtc/rtc-rzn1.c +@@ -111,8 +111,8 @@ static int rzn1_rtc_read_time(struct device *dev, struct rtc_time *tm) + tm->tm_hour = bcd2bin(tm->tm_hour); + tm->tm_wday = bcd2bin(tm->tm_wday); + tm->tm_mday = bcd2bin(tm->tm_mday); +- tm->tm_mon = bcd2bin(tm->tm_mon); +- tm->tm_year = bcd2bin(tm->tm_year); ++ tm->tm_mon = bcd2bin(tm->tm_mon) - 1; ++ tm->tm_year = bcd2bin(tm->tm_year) + 100; + + return 0; + } +@@ -128,8 +128,8 @@ static int rzn1_rtc_set_time(struct device *dev, struct rtc_time *tm) + tm->tm_hour = bin2bcd(tm->tm_hour); + tm->tm_wday = bin2bcd(rzn1_rtc_tm_to_wday(tm)); + tm->tm_mday = bin2bcd(tm->tm_mday); +- tm->tm_mon = bin2bcd(tm->tm_mon); +- tm->tm_year = bin2bcd(tm->tm_year); ++ tm->tm_mon = bin2bcd(tm->tm_mon + 1); ++ tm->tm_year = bin2bcd(tm->tm_year - 100); + + val = readl(rtc->base + RZN1_RTC_CTL2); + if (!(val & RZN1_RTC_CTL2_STOPPED)) { +-- +2.43.0 + diff --git a/queue-6.11/rtc-st-lpc-use-irqf_no_autoen-flag-in-request_irq.patch b/queue-6.11/rtc-st-lpc-use-irqf_no_autoen-flag-in-request_irq.patch new file mode 100644 index 00000000000..549b458ff00 --- /dev/null +++ b/queue-6.11/rtc-st-lpc-use-irqf_no_autoen-flag-in-request_irq.patch @@ -0,0 +1,50 @@ +From af20a3c61e302f54db91abbf3f2a1e6793a0fd13 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 12 Sep 2024 11:37:27 +0800 +Subject: rtc: st-lpc: Use IRQF_NO_AUTOEN flag in request_irq() + +From: Jinjie Ruan + +[ Upstream commit b6cd7adec0cf03f0aefc55676e71dd721cbc71a8 ] + +If request_irq() fails in st_rtc_probe(), there is no need to enable +the irq, and if it succeeds, disable_irq() after request_irq() still has +a time gap in which interrupts can come. + +request_irq() with IRQF_NO_AUTOEN flag will disable IRQ auto-enable when +request IRQ. 
+ +Fixes: b5b2bdfc2893 ("rtc: st: Add new driver for ST's LPC RTC") +Signed-off-by: Jinjie Ruan +Link: https://lore.kernel.org/r/20240912033727.3013951-1-ruanjinjie@huawei.com +Signed-off-by: Alexandre Belloni +Signed-off-by: Sasha Levin +--- + drivers/rtc/rtc-st-lpc.c | 5 ++--- + 1 file changed, 2 insertions(+), 3 deletions(-) + +diff --git a/drivers/rtc/rtc-st-lpc.c b/drivers/rtc/rtc-st-lpc.c +index d492a2d26600c..c6d4522411b31 100644 +--- a/drivers/rtc/rtc-st-lpc.c ++++ b/drivers/rtc/rtc-st-lpc.c +@@ -218,15 +218,14 @@ static int st_rtc_probe(struct platform_device *pdev) + return -EINVAL; + } + +- ret = devm_request_irq(&pdev->dev, rtc->irq, st_rtc_handler, 0, +- pdev->name, rtc); ++ ret = devm_request_irq(&pdev->dev, rtc->irq, st_rtc_handler, ++ IRQF_NO_AUTOEN, pdev->name, rtc); + if (ret) { + dev_err(&pdev->dev, "Failed to request irq %i\n", rtc->irq); + return ret; + } + + enable_irq_wake(rtc->irq); +- disable_irq(rtc->irq); + + rtc->clk = devm_clk_get_enabled(&pdev->dev, NULL); + if (IS_ERR(rtc->clk)) +-- +2.43.0 + diff --git a/queue-6.11/series b/queue-6.11/series index 05c2f394f2b..50dfcb4356e 100644 --- a/queue-6.11/series +++ b/queue-6.11/series @@ -763,3 +763,54 @@ ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns.patch input-cs40l50-fix-wrong-usage-of-init_work.patch nfsd-prevent-a-potential-integer-overflow.patch sunrpc-make-sure-cache-entry-active-before-cache_show.patch +um-fix-potential-integer-overflow-during-physmem-set.patch +um-fix-the-return-value-of-elf_core_copy_task_fpregs.patch +kfifo-don-t-include-dma-mapping.h-in-kfifo.h.patch +um-ubd-initialize-ubd-s-disk-pointer-in-ubd_add.patch +um-always-dump-trace-for-specified-task-in-show_stac.patch +nfsv4.0-fix-a-use-after-free-problem-in-the-asynchro.patch +rtc-st-lpc-use-irqf_no_autoen-flag-in-request_irq.patch +rtc-abx80x-fix-wdt-bit-position-of-the-status-regist.patch +rtc-check-if-__rtc_read_time-was-successful-in-rtc_t.patch +ubi-fastmap-wl-schedule-fm_work-if-wear-leveling-poo.patch +ubifs-correct-the-total-block-count-by-deducting-jou.patch +ubi-fastmap-fix-duplicate-slab-cache-names-while-att.patch +ubifs-authentication-fix-use-after-free-in-ubifs_tnc.patch +jffs2-fix-use-of-uninitialized-variable.patch +rtc-rzn1-fix-bcd-to-rtc_time-conversion-errors.patch +revert-nfs-don-t-reuse-partially-completed-requests-.patch +nvme-multipath-avoid-hang-on-inaccessible-namespaces.patch +nvme-multipath-fix-rcu-list-traversal-to-use-srcu-pr.patch +blk-mq-add-non_owner-variant-of-start_freeze-unfreez.patch +block-model-freeze-enter-queue-as-lock-for-supportin.patch +block-fix-uaf-for-flush-rq-while-iterating-tags.patch +block-return-unsigned-int-from-bdev_io_min.patch +nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch +9p-xen-fix-init-sequence.patch +9p-xen-fix-release-of-irq.patch +perf-arm-smmuv3-fix-lockdep-assert-in-event_init.patch +perf-arm-cmn-ensure-port-and-device-id-bits-are-set-.patch +smb-client-disable-directory-caching-when-dir_cache_.patch +x86-documentation-update-algo-in-init_size-descripti.patch +cifs-fix-parsing-native-symlinks-relative-to-the-exp.patch +cifs-fix-parsing-reparse-point-with-native-symlink-i.patch +rtc-ab-eoz9-don-t-fail-temperature-reads-on-undervol.patch +rename-.data.unlikely-to-.data.unlikely.patch +rename-.data.once-to-.data.once-to-fix-resetting-war.patch +kbuild-deb-pkg-don-t-fail-if-modules.order-is-missin.patch +smb-initialize-cfid-tcon-before-performing-network-o.patch +block-don-t-allow-an-atomic-write-be-truncated-in-bl.patch 
+modpost-remove-incorrect-code-in-do_eisa_entry.patch +cifs-during-remount-make-sure-passwords-are-in-sync.patch +cifs-unlock-on-error-in-smb3_reconfigure.patch +nfs-ignore-sb_rdonly-when-mounting-nfs.patch +sunrpc-clear-xprt_sock_upd_timeout-when-reset-transp.patch +sunrpc-timeout-and-cancel-tls-handshake-with-etimedo.patch +sunrpc-fix-one-uaf-issue-caused-by-sunrpc-kernel-tcp.patch +nfs-blocklayout-don-t-attempt-unregister-for-invalid.patch +nfs-blocklayout-limit-repeat-device-registration-on-.patch +block-bfq-fix-bfqq-uaf-in-bfq_limit_depth.patch +brd-decrease-the-number-of-allocated-pages-which-dis.patch +sh-intc-fix-use-after-free-bug-in-register_intc_cont.patch +tools-power-turbostat-fix-trailing-n-parsing.patch +tools-power-turbostat-fix-child-s-argument-forwardin.patch diff --git a/queue-6.11/sh-intc-fix-use-after-free-bug-in-register_intc_cont.patch b/queue-6.11/sh-intc-fix-use-after-free-bug-in-register_intc_cont.patch new file mode 100644 index 00000000000..29fa673afb3 --- /dev/null +++ b/queue-6.11/sh-intc-fix-use-after-free-bug-in-register_intc_cont.patch @@ -0,0 +1,46 @@ +From e1b6eecbea26687a7597cab469861f52be735699 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 23 Oct 2024 11:41:59 +0300 +Subject: sh: intc: Fix use-after-free bug in register_intc_controller() + +From: Dan Carpenter + +[ Upstream commit 63e72e551942642c48456a4134975136cdcb9b3c ] + +In the error handling for this function, d is freed without ever +removing it from intc_list which would lead to a use after free. +To fix this, let's only add it to the list after everything has +succeeded. + +Fixes: 2dcec7a988a1 ("sh: intc: set_irq_wake() support") +Signed-off-by: Dan Carpenter +Reviewed-by: John Paul Adrian Glaubitz +Signed-off-by: John Paul Adrian Glaubitz +Signed-off-by: Sasha Levin +--- + drivers/sh/intc/core.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/sh/intc/core.c b/drivers/sh/intc/core.c +index 74350b5871dc8..ea571eeb30787 100644 +--- a/drivers/sh/intc/core.c ++++ b/drivers/sh/intc/core.c +@@ -209,7 +209,6 @@ int __init register_intc_controller(struct intc_desc *desc) + goto err0; + + INIT_LIST_HEAD(&d->list); +- list_add_tail(&d->list, &intc_list); + + raw_spin_lock_init(&d->lock); + INIT_RADIX_TREE(&d->tree, GFP_ATOMIC); +@@ -369,6 +368,7 @@ int __init register_intc_controller(struct intc_desc *desc) + + d->skip_suspend = desc->skip_syscore_suspend; + ++ list_add_tail(&d->list, &intc_list); + nr_intc_controllers++; + + return 0; +-- +2.43.0 + diff --git a/queue-6.11/smb-client-disable-directory-caching-when-dir_cache_.patch b/queue-6.11/smb-client-disable-directory-caching-when-dir_cache_.patch new file mode 100644 index 00000000000..270479e7dfb --- /dev/null +++ b/queue-6.11/smb-client-disable-directory-caching-when-dir_cache_.patch @@ -0,0 +1,44 @@ +From 2da3ad3457d010536d4cba8685c13346e24abc63 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 22 Nov 2024 22:14:35 -0300 +Subject: smb: client: disable directory caching when dir_cache_timeout is zero + +From: Henrique Carvalho + +[ Upstream commit ceaf1451990e3ea7fb50aebb5a149f57945f6e9f ] + +Setting dir_cache_timeout to zero should disable the caching of +directory contents. Currently, even when dir_cache_timeout is zero, +some caching related functions are still invoked, which is unintended +behavior. + +Fix the issue by setting tcon->nohandlecache to true when +dir_cache_timeout is zero, ensuring that directory handle caching +is properly disabled. 
+
+Fixes: 238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases")
+Reviewed-by: Paulo Alcantara (Red Hat)
+Reviewed-by: Enzo Matsumiya
+Signed-off-by: Henrique Carvalho
+Signed-off-by: Steve French
+Signed-off-by: Sasha Levin
+---
+ fs/smb/client/connect.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
+index be0f91000171b..9cf87ba0fb8c9 100644
+--- a/fs/smb/client/connect.c
++++ b/fs/smb/client/connect.c
+@@ -2601,7 +2601,7 @@ cifs_get_tcon(struct cifs_ses *ses, struct smb3_fs_context *ctx)
+
+ if (ses->server->dialect >= SMB20_PROT_ID &&
+ (ses->server->capabilities & SMB2_GLOBAL_CAP_DIRECTORY_LEASING))
+- nohandlecache = ctx->nohandlecache;
++ nohandlecache = ctx->nohandlecache || !dir_cache_timeout;
+ else
+ nohandlecache = true;
+ tcon = tcon_info_alloc(!nohandlecache, netfs_trace_tcon_ref_new);
+--
+2.43.0
+
diff --git a/queue-6.11/smb-initialize-cfid-tcon-before-performing-network-o.patch b/queue-6.11/smb-initialize-cfid-tcon-before-performing-network-o.patch
new file mode 100644
index 00000000000..ce7ce6d2445
--- /dev/null
+++ b/queue-6.11/smb-initialize-cfid-tcon-before-performing-network-o.patch
@@ -0,0 +1,45 @@
+From 3cab3a730605a75984bcda027533d5f875c38d60 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 26 Nov 2024 18:50:31 -0600
+Subject: smb: Initialize cfid->tcon before performing network ops
+
+From: Paul Aurich
+
+[ Upstream commit c353ee4fb119a2582d0e011f66a76a38f5cf984d ]
+
+Avoid leaking a tcon ref when a lease break races with opening the
+cached directory. Processing the lease break might take a reference to
+the tcon in cached_dir_lease_break() and then fail to release the ref in
+cached_dir_offload_close, since cfid->tcon is still NULL.
+
+Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held")
+Signed-off-by: Paul Aurich
+Signed-off-by: Steve French
+Signed-off-by: Sasha Levin
+---
+ fs/smb/client/cached_dir.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
+index 004349a7ab69d..9c0ef4195b582 100644
+--- a/fs/smb/client/cached_dir.c
++++ b/fs/smb/client/cached_dir.c
+@@ -227,6 +227,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
+ }
+ }
+ cfid->dentry = dentry;
++ cfid->tcon = tcon;
+
+ /*
+ * We do not hold the lock for the open because in case
+@@ -298,7 +299,6 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
+ }
+ goto oshr_free;
+ }
+- cfid->tcon = tcon;
+ cfid->is_open = true;
+
+ spin_lock(&cfids->cfid_list_lock);
+--
+2.43.0
+
diff --git a/queue-6.11/sunrpc-clear-xprt_sock_upd_timeout-when-reset-transp.patch b/queue-6.11/sunrpc-clear-xprt_sock_upd_timeout-when-reset-transp.patch
new file mode 100644
index 00000000000..4ec3825a50c
--- /dev/null
+++ b/queue-6.11/sunrpc-clear-xprt_sock_upd_timeout-when-reset-transp.patch
@@ -0,0 +1,38 @@
+From 0f19dd37bbdaa08b125a36a66417ee624c0f6d9d Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 15 Nov 2024 17:38:04 +0800
+Subject: sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
+
+From: Liu Jian
+
+[ Upstream commit 4db9ad82a6c823094da27de4825af693a3475d51 ]
+
+Since transport->sock has been set to NULL during reset transport,
+XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise, the
+xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
+to dereference the transport->sock that has been set to NULL.
+ +Fixes: 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly") +Signed-off-by: Li Lingfeng +Signed-off-by: Liu Jian +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + net/sunrpc/xprtsock.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c +index 1326fbf45a347..539cdda2093e5 100644 +--- a/net/sunrpc/xprtsock.c ++++ b/net/sunrpc/xprtsock.c +@@ -1198,6 +1198,7 @@ static void xs_sock_reset_state_flags(struct rpc_xprt *xprt) + clear_bit(XPRT_SOCK_WAKE_WRITE, &transport->sock_state); + clear_bit(XPRT_SOCK_WAKE_DISCONNECT, &transport->sock_state); + clear_bit(XPRT_SOCK_NOSPACE, &transport->sock_state); ++ clear_bit(XPRT_SOCK_UPD_TIMEOUT, &transport->sock_state); + } + + static void xs_run_error_worker(struct sock_xprt *transport, unsigned int nr) +-- +2.43.0 + diff --git a/queue-6.11/sunrpc-fix-one-uaf-issue-caused-by-sunrpc-kernel-tcp.patch b/queue-6.11/sunrpc-fix-one-uaf-issue-caused-by-sunrpc-kernel-tcp.patch new file mode 100644 index 00000000000..a443d0c3b9a --- /dev/null +++ b/queue-6.11/sunrpc-fix-one-uaf-issue-caused-by-sunrpc-kernel-tcp.patch @@ -0,0 +1,165 @@ +From 72905193f9838e12a6dd9cec40065514294dee22 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 12 Nov 2024 21:54:34 +0800 +Subject: sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket + +From: Liu Jian + +[ Upstream commit 3f23f96528e8fcf8619895c4c916c52653892ec1 ] + +BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0 +Read of size 1 at addr ffff888111f322cd by task swapper/0/0 + +CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7 +Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 +Call Trace: + + dump_stack_lvl+0x68/0xa0 + print_address_description.constprop.0+0x2c/0x3d0 + print_report+0xb4/0x270 + kasan_report+0xbd/0xf0 + tcp_write_timer_handler+0x156/0x3e0 + tcp_write_timer+0x66/0x170 + call_timer_fn+0xfb/0x1d0 + __run_timers+0x3f8/0x480 + run_timer_softirq+0x9b/0x100 + handle_softirqs+0x153/0x390 + __irq_exit_rcu+0x103/0x120 + irq_exit_rcu+0xe/0x20 + sysvec_apic_timer_interrupt+0x76/0x90 + + + asm_sysvec_apic_timer_interrupt+0x1a/0x20 +RIP: 0010:default_idle+0xf/0x20 +Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 + 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 c3 cc cc cc + cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 +RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242 +RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196 +RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f +RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d +R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000 +R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0 + default_idle_call+0x6b/0xa0 + cpuidle_idle_call+0x1af/0x1f0 + do_idle+0xbc/0x130 + cpu_startup_entry+0x33/0x40 + rest_init+0x11f/0x210 + start_kernel+0x39a/0x420 + x86_64_start_reservations+0x18/0x30 + x86_64_start_kernel+0x97/0xa0 + common_startup_64+0x13e/0x141 + + +Allocated by task 595: + kasan_save_stack+0x24/0x50 + kasan_save_track+0x14/0x30 + __kasan_slab_alloc+0x87/0x90 + kmem_cache_alloc_noprof+0x12b/0x3f0 + copy_net_ns+0x94/0x380 + create_new_namespaces+0x24c/0x500 + unshare_nsproxy_namespaces+0x75/0xf0 + ksys_unshare+0x24e/0x4f0 + __x64_sys_unshare+0x1f/0x30 + do_syscall_64+0x70/0x180 + entry_SYSCALL_64_after_hwframe+0x76/0x7e + +Freed by task 100: + kasan_save_stack+0x24/0x50 + kasan_save_track+0x14/0x30 + 
kasan_save_free_info+0x3b/0x60 + __kasan_slab_free+0x54/0x70 + kmem_cache_free+0x156/0x5d0 + cleanup_net+0x5d3/0x670 + process_one_work+0x776/0xa90 + worker_thread+0x2e2/0x560 + kthread+0x1a8/0x1f0 + ret_from_fork+0x34/0x60 + ret_from_fork_asm+0x1a/0x30 + +Reproduction script: + +mkdir -p /mnt/nfsshare +mkdir -p /mnt/nfs/netns_1 +mkfs.ext4 /dev/sdb +mount /dev/sdb /mnt/nfsshare +systemctl restart nfs-server +chmod 777 /mnt/nfsshare +exportfs -i -o rw,no_root_squash *:/mnt/nfsshare + +ip netns add netns_1 +ip link add name veth_1_peer type veth peer veth_1 +ifconfig veth_1_peer 11.11.0.254 up +ip link set veth_1 netns netns_1 +ip netns exec netns_1 ifconfig veth_1 11.11.0.1 + +ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \ + --tcp-flags FIN FIN -j DROP + +(note: In my environment, a DESTROY_CLIENTID operation is always sent + immediately, breaking the nfs tcp connection.) +ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \ + 11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1 + +ip netns del netns_1 + +The reason here is that the tcp socket in netns_1 (nfs side) has been +shutdown and closed (done in xs_destroy), but the FIN message (with ack) +is discarded, and the nfsd side keeps sending retransmission messages. +As a result, when the tcp sock in netns_1 processes the received message, +it sends the message (FIN message) in the sending queue, and the tcp timer +is re-established. When the network namespace is deleted, the net structure +accessed by tcp's timer handler function causes problems. + +To fix this problem, let's hold netns refcnt for the tcp kernel socket as +done in other modules. This is an ugly hack which can easily be backported +to earlier kernels. A proper fix which cleans up the interfaces will +follow, but may not be so easy to backport. 
+ +Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") +Signed-off-by: Liu Jian +Acked-by: Jeff Layton +Reviewed-by: Kuniyuki Iwashima +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + net/sunrpc/svcsock.c | 4 ++++ + net/sunrpc/xprtsock.c | 7 +++++++ + 2 files changed, 11 insertions(+) + +diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c +index 6b3f01beb294b..9a97ffef3adf0 100644 +--- a/net/sunrpc/svcsock.c ++++ b/net/sunrpc/svcsock.c +@@ -1552,6 +1552,10 @@ static struct svc_xprt *svc_create_socket(struct svc_serv *serv, + newlen = error; + + if (protocol == IPPROTO_TCP) { ++ __netns_tracker_free(net, &sock->sk->ns_tracker, false); ++ sock->sk->sk_net_refcnt = 1; ++ get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL); ++ sock_inuse_add(net, 1); + if ((error = kernel_listen(sock, 64)) < 0) + goto bummer; + } +diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c +index 43fb96de8ebe5..b69e6290acfab 100644 +--- a/net/sunrpc/xprtsock.c ++++ b/net/sunrpc/xprtsock.c +@@ -1940,6 +1940,13 @@ static struct socket *xs_create_sock(struct rpc_xprt *xprt, + goto out; + } + ++ if (protocol == IPPROTO_TCP) { ++ __netns_tracker_free(xprt->xprt_net, &sock->sk->ns_tracker, false); ++ sock->sk->sk_net_refcnt = 1; ++ get_net_track(xprt->xprt_net, &sock->sk->ns_tracker, GFP_KERNEL); ++ sock_inuse_add(xprt->xprt_net, 1); ++ } ++ + filp = sock_alloc_file(sock, O_NONBLOCK, NULL); + if (IS_ERR(filp)) + return ERR_CAST(filp); +-- +2.43.0 + diff --git a/queue-6.11/sunrpc-timeout-and-cancel-tls-handshake-with-etimedo.patch b/queue-6.11/sunrpc-timeout-and-cancel-tls-handshake-with-etimedo.patch new file mode 100644 index 00000000000..f5169eca04c --- /dev/null +++ b/queue-6.11/sunrpc-timeout-and-cancel-tls-handshake-with-etimedo.patch @@ -0,0 +1,64 @@ +From 6e944b2cf9d4537ac883cbb7fb66ac11d534be10 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 15 Nov 2024 08:59:36 -0500 +Subject: SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT + +From: Benjamin Coddington + +[ Upstream commit d7bdd849ef1b681da03ac05ca0957b2cbe2d24b6 ] + +We've noticed a situation where an unstable TCP connection can cause the +TLS handshake to timeout waiting for userspace to complete it. When this +happens, we don't want to return from xs_tls_handshake_sync() with zero, as +this will cause the upper xprt to be set CONNECTED, and subsequent attempts +to transmit will be returned with -EPIPE. The sunrpc machine does not +recover from this situation and will spin attempting to transmit. + +The return value of tls_handshake_cancel() can be used to detect a race +with completion: + + * tls_handshake_cancel - cancel a pending handshake + * Return values: + * %true - Uncompleted handshake request was canceled + * %false - Handshake request already completed or not found + +If true, we do not want the upper xprt to be connected, so return +-ETIMEDOUT. If false, its possible the handshake request was lost and +that may be the reason for our timeout. Again we do not want the upper +xprt to be connected, so return -ETIMEDOUT. + +Ensure that we alway return an error from xs_tls_handshake_sync() if we +call tls_handshake_cancel(). 
+ +Signed-off-by: Benjamin Coddington +Reviewed-by: Chuck Lever +Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class") +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + net/sunrpc/xprtsock.c | 9 ++++----- + 1 file changed, 4 insertions(+), 5 deletions(-) + +diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c +index 539cdda2093e5..43fb96de8ebe5 100644 +--- a/net/sunrpc/xprtsock.c ++++ b/net/sunrpc/xprtsock.c +@@ -2615,11 +2615,10 @@ static int xs_tls_handshake_sync(struct rpc_xprt *lower_xprt, struct xprtsec_par + rc = wait_for_completion_interruptible_timeout(&lower_transport->handshake_done, + XS_TLS_HANDSHAKE_TO); + if (rc <= 0) { +- if (!tls_handshake_cancel(sk)) { +- if (rc == 0) +- rc = -ETIMEDOUT; +- goto out_put_xprt; +- } ++ tls_handshake_cancel(sk); ++ if (rc == 0) ++ rc = -ETIMEDOUT; ++ goto out_put_xprt; + } + + rc = lower_transport->xprt_err; +-- +2.43.0 + diff --git a/queue-6.11/tools-power-turbostat-fix-child-s-argument-forwardin.patch b/queue-6.11/tools-power-turbostat-fix-child-s-argument-forwardin.patch new file mode 100644 index 00000000000..f5f0e733b0f --- /dev/null +++ b/queue-6.11/tools-power-turbostat-fix-child-s-argument-forwardin.patch @@ -0,0 +1,37 @@ +From a980ad7721b6a3a432fb13f84f89a8cc08ed3363 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 13 Nov 2024 15:48:22 +0100 +Subject: tools/power turbostat: Fix child's argument forwarding + +From: Patryk Wlazlyn + +[ Upstream commit 1da0daf746342dfdc114e4dc8fbf3ece28666d4f ] + +Add '+' to optstring when early scanning for --no-msr and --no-perf. +It causes option processing to stop as soon as a nonoption argument is +encountered, effectively skipping child's arguments. + +Fixes: 3e4048466c39 ("tools/power turbostat: Add --no-msr option") +Signed-off-by: Patryk Wlazlyn +Signed-off-by: Len Brown +Signed-off-by: Sasha Levin +--- + tools/power/x86/turbostat/turbostat.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c +index aa9200319d0ea..a5ebee8b23bbe 100644 +--- a/tools/power/x86/turbostat/turbostat.c ++++ b/tools/power/x86/turbostat/turbostat.c +@@ -9784,7 +9784,7 @@ void cmdline(int argc, char **argv) + * Parse some options early, because they may make other options invalid, + * like adding the MSR counter with --add and at the same time using --no-msr. + */ +- while ((opt = getopt_long_only(argc, argv, "MPn:", long_options, &option_index)) != -1) { ++ while ((opt = getopt_long_only(argc, argv, "+MPn:", long_options, &option_index)) != -1) { + switch (opt) { + case 'M': + no_msr = 1; +-- +2.43.0 + diff --git a/queue-6.11/tools-power-turbostat-fix-trailing-n-parsing.patch b/queue-6.11/tools-power-turbostat-fix-trailing-n-parsing.patch new file mode 100644 index 00000000000..b4e6b651f66 --- /dev/null +++ b/queue-6.11/tools-power-turbostat-fix-trailing-n-parsing.patch @@ -0,0 +1,55 @@ +From 2419eb923150e8a35ed061a8a5f69a4adce05e19 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 27 Aug 2024 13:07:51 +0800 +Subject: tools/power turbostat: Fix trailing '\n' parsing + +From: Zhang Rui + +[ Upstream commit fed8511cc8996989178823052dc0200643e1389a ] + +parse_cpu_string() parses the string input either from command line or +from /sys/fs/cgroup/cpuset.cpus.effective to get a list of CPUs that +turbostat can run with. + +The cpu string returned by /sys/fs/cgroup/cpuset.cpus.effective contains +a trailing '\n', but strtoul() fails to treat this as an error. 
+ +That says, for the code below + val = ("\n", NULL, 10); +val returns 0, and errno is also not set. + +As a result, CPU0 is erroneously considered as allowed CPU and this +causes failures when turbostat tries to run on CPU0. + + get_counters: Could not migrate to CPU 0 + ... + turbostat: re-initialized with num_cpus 8, allowed_cpus 5 + get_counters: Could not migrate to CPU 0 + +Add a check to return immediately if '\n' or '\0' is detected. + +Fixes: 8c3dd2c9e542 ("tools/power/turbostat: Abstrct function for parsing cpu string") +Signed-off-by: Zhang Rui +Signed-off-by: Len Brown +Signed-off-by: Sasha Levin +--- + tools/power/x86/turbostat/turbostat.c | 3 +++ + 1 file changed, 3 insertions(+) + +diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c +index 089220aaa5c92..aa9200319d0ea 100644 +--- a/tools/power/x86/turbostat/turbostat.c ++++ b/tools/power/x86/turbostat/turbostat.c +@@ -5385,6 +5385,9 @@ static int parse_cpu_str(char *cpu_str, cpu_set_t *cpu_set, int cpu_set_size) + if (*next == '-') /* no negative cpu numbers */ + return 1; + ++ if (*next == '\0' || *next == '\n') ++ break; ++ + start = strtoul(next, &next, 10); + + if (start >= CPU_SUBSET_MAXCPUS) +-- +2.43.0 + diff --git a/queue-6.11/ubi-fastmap-fix-duplicate-slab-cache-names-while-att.patch b/queue-6.11/ubi-fastmap-fix-duplicate-slab-cache-names-while-att.patch new file mode 100644 index 00000000000..6830c040954 --- /dev/null +++ b/queue-6.11/ubi-fastmap-fix-duplicate-slab-cache-names-while-att.patch @@ -0,0 +1,104 @@ +From a75a96d92d232b2a26aa8509f1cc05be1ce4888c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 11 Oct 2024 12:50:02 +0800 +Subject: ubi: fastmap: Fix duplicate slab cache names while attaching + +From: Zhihao Cheng + +[ Upstream commit bcddf52b7a17adcebc768d26f4e27cf79adb424c ] + +Since commit 4c39529663b9 ("slab: Warn on duplicate cache names when +DEBUG_VM=y"), the duplicate slab cache names can be detected and a +kernel WARNING is thrown out. +In UBI fast attaching process, alloc_ai() could be invoked twice +with the same slab cache name 'ubi_aeb_slab_cache', which will trigger +following warning messages: + kmem_cache of name 'ubi_aeb_slab_cache' already exists + WARNING: CPU: 0 PID: 7519 at mm/slab_common.c:107 + __kmem_cache_create_args+0x100/0x5f0 + Modules linked in: ubi(+) nandsim [last unloaded: nandsim] + CPU: 0 UID: 0 PID: 7519 Comm: modprobe Tainted: G 6.12.0-rc2 + RIP: 0010:__kmem_cache_create_args+0x100/0x5f0 + Call Trace: + __kmem_cache_create_args+0x100/0x5f0 + alloc_ai+0x295/0x3f0 [ubi] + ubi_attach+0x3c3/0xcc0 [ubi] + ubi_attach_mtd_dev+0x17cf/0x3fa0 [ubi] + ubi_init+0x3fb/0x800 [ubi] + do_init_module+0x265/0x7d0 + __x64_sys_finit_module+0x7a/0xc0 + +The problem could be easily reproduced by loading UBI device by fastmap +with CONFIG_DEBUG_VM=y. +Fix it by using different slab names for alloc_ai() callers. 
+ +Fixes: d2158f69a7d4 ("UBI: Remove alloc_ai() slab name from parameter list") +Fixes: fdf10ed710c0 ("ubi: Rework Fastmap attach base code") +Signed-off-by: Zhihao Cheng +Signed-off-by: Richard Weinberger +Signed-off-by: Sasha Levin +--- + drivers/mtd/ubi/attach.c | 12 ++++++------ + 1 file changed, 6 insertions(+), 6 deletions(-) + +diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c +index ae5abe492b52a..adc47b87b38a5 100644 +--- a/drivers/mtd/ubi/attach.c ++++ b/drivers/mtd/ubi/attach.c +@@ -1447,7 +1447,7 @@ static int scan_all(struct ubi_device *ubi, struct ubi_attach_info *ai, + return err; + } + +-static struct ubi_attach_info *alloc_ai(void) ++static struct ubi_attach_info *alloc_ai(const char *slab_name) + { + struct ubi_attach_info *ai; + +@@ -1461,7 +1461,7 @@ static struct ubi_attach_info *alloc_ai(void) + INIT_LIST_HEAD(&ai->alien); + INIT_LIST_HEAD(&ai->fastmap); + ai->volumes = RB_ROOT; +- ai->aeb_slab_cache = kmem_cache_create("ubi_aeb_slab_cache", ++ ai->aeb_slab_cache = kmem_cache_create(slab_name, + sizeof(struct ubi_ainf_peb), + 0, 0, NULL); + if (!ai->aeb_slab_cache) { +@@ -1491,7 +1491,7 @@ static int scan_fast(struct ubi_device *ubi, struct ubi_attach_info **ai) + + err = -ENOMEM; + +- scan_ai = alloc_ai(); ++ scan_ai = alloc_ai("ubi_aeb_slab_cache_fastmap"); + if (!scan_ai) + goto out; + +@@ -1557,7 +1557,7 @@ int ubi_attach(struct ubi_device *ubi, int force_scan) + int err; + struct ubi_attach_info *ai; + +- ai = alloc_ai(); ++ ai = alloc_ai("ubi_aeb_slab_cache"); + if (!ai) + return -ENOMEM; + +@@ -1575,7 +1575,7 @@ int ubi_attach(struct ubi_device *ubi, int force_scan) + if (err > 0 || mtd_is_eccerr(err)) { + if (err != UBI_NO_FASTMAP) { + destroy_ai(ai); +- ai = alloc_ai(); ++ ai = alloc_ai("ubi_aeb_slab_cache"); + if (!ai) + return -ENOMEM; + +@@ -1614,7 +1614,7 @@ int ubi_attach(struct ubi_device *ubi, int force_scan) + if (ubi->fm && ubi_dbg_chk_fastmap(ubi)) { + struct ubi_attach_info *scan_ai; + +- scan_ai = alloc_ai(); ++ scan_ai = alloc_ai("ubi_aeb_slab_cache_dbg_chk_fastmap"); + if (!scan_ai) { + err = -ENOMEM; + goto out_wl; +-- +2.43.0 + diff --git a/queue-6.11/ubi-fastmap-wl-schedule-fm_work-if-wear-leveling-poo.patch b/queue-6.11/ubi-fastmap-wl-schedule-fm_work-if-wear-leveling-poo.patch new file mode 100644 index 00000000000..712402c519d --- /dev/null +++ b/queue-6.11/ubi-fastmap-wl-schedule-fm_work-if-wear-leveling-poo.patch @@ -0,0 +1,98 @@ +From ca0d0479d366d8ab7ae20a8b6d50649af7da1f67 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Aug 2024 11:26:22 +0800 +Subject: ubi: fastmap: wl: Schedule fm_work if wear-leveling pool is empty + +From: Zhihao Cheng + +[ Upstream commit c4595fe394a289927077e3da561db27811919ee0 ] + +Since commit 14072ee33d5a ("ubi: fastmap: Check wl_pool for free peb +before wear leveling"), wear_leveling_worker() won't schedule fm_work +if wear-leveling pool is empty, which could temporarily disable the +wear-leveling until the fastmap is updated(eg. pool becomes empty). +Fix it by scheduling fm_work if wl_pool is empty during wear-leveing. 
+ +Fixes: 14072ee33d5a ("ubi: fastmap: Check wl_pool for free peb before wear leveling") +Signed-off-by: Zhihao Cheng +Signed-off-by: Richard Weinberger +Signed-off-by: Sasha Levin +--- + drivers/mtd/ubi/fastmap-wl.c | 19 ++++++++++++++++--- + drivers/mtd/ubi/wl.c | 2 +- + drivers/mtd/ubi/wl.h | 3 ++- + 3 files changed, 19 insertions(+), 5 deletions(-) + +diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c +index 2a9cc9413c427..9bdb6525f1281 100644 +--- a/drivers/mtd/ubi/fastmap-wl.c ++++ b/drivers/mtd/ubi/fastmap-wl.c +@@ -346,14 +346,27 @@ int ubi_wl_get_peb(struct ubi_device *ubi) + * WL sub-system. + * + * @ubi: UBI device description object ++ * @need_fill: whether to fill wear-leveling pool when no PEBs are found + */ +-static struct ubi_wl_entry *next_peb_for_wl(struct ubi_device *ubi) ++static struct ubi_wl_entry *next_peb_for_wl(struct ubi_device *ubi, ++ bool need_fill) + { + struct ubi_fm_pool *pool = &ubi->fm_wl_pool; + int pnum; + +- if (pool->used == pool->size) ++ if (pool->used == pool->size) { ++ if (need_fill && !ubi->fm_work_scheduled) { ++ /* ++ * We cannot update the fastmap here because this ++ * function is called in atomic context. ++ * Let's fail here and refill/update it as soon as ++ * possible. ++ */ ++ ubi->fm_work_scheduled = 1; ++ schedule_work(&ubi->fm_work); ++ } + return NULL; ++ } + + pnum = pool->pebs[pool->used]; + return ubi->lookuptbl[pnum]; +@@ -375,7 +388,7 @@ static bool need_wear_leveling(struct ubi_device *ubi) + if (!ubi->used.rb_node) + return false; + +- e = next_peb_for_wl(ubi); ++ e = next_peb_for_wl(ubi, false); + if (!e) { + if (!ubi->free.rb_node) + return false; +diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c +index 8a26968aba11f..fbd399cf65033 100644 +--- a/drivers/mtd/ubi/wl.c ++++ b/drivers/mtd/ubi/wl.c +@@ -683,7 +683,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk, + ubi_assert(!ubi->move_to_put); + + #ifdef CONFIG_MTD_UBI_FASTMAP +- if (!next_peb_for_wl(ubi) || ++ if (!next_peb_for_wl(ubi, true) || + #else + if (!ubi->free.rb_node || + #endif +diff --git a/drivers/mtd/ubi/wl.h b/drivers/mtd/ubi/wl.h +index 7b6715ef6d4a3..a69169c35e310 100644 +--- a/drivers/mtd/ubi/wl.h ++++ b/drivers/mtd/ubi/wl.h +@@ -5,7 +5,8 @@ + static void update_fastmap_work_fn(struct work_struct *wrk); + static struct ubi_wl_entry *find_anchor_wl_entry(struct rb_root *root); + static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi); +-static struct ubi_wl_entry *next_peb_for_wl(struct ubi_device *ubi); ++static struct ubi_wl_entry *next_peb_for_wl(struct ubi_device *ubi, ++ bool need_fill); + static bool need_wear_leveling(struct ubi_device *ubi); + static void ubi_fastmap_close(struct ubi_device *ubi); + static inline void ubi_fastmap_init(struct ubi_device *ubi, int *count) +-- +2.43.0 + diff --git a/queue-6.11/ubifs-authentication-fix-use-after-free-in-ubifs_tnc.patch b/queue-6.11/ubifs-authentication-fix-use-after-free-in-ubifs_tnc.patch new file mode 100644 index 00000000000..7470a9d86c8 --- /dev/null +++ b/queue-6.11/ubifs-authentication-fix-use-after-free-in-ubifs_tnc.patch @@ -0,0 +1,171 @@ +From c2129cf9b564c5961cd3c9afc78acc7c5894c80d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 9 Oct 2024 16:46:59 +0200 +Subject: ubifs: authentication: Fix use-after-free in ubifs_tnc_end_commit + +From: Waqar Hameed + +[ Upstream commit 4617fb8fc15effe8eda4dd898d4e33eb537a7140 ] + +After an insertion in TNC, the tree might split and cause a node to +change its `znode->parent`. 
A further deletion of other nodes in the +tree (which also could free the nodes), the aforementioned node's +`znode->cparent` could still point to a freed node. This +`znode->cparent` may not be updated when getting nodes to commit in +`ubifs_tnc_start_commit()`. This could then trigger a use-after-free +when accessing the `znode->cparent` in `write_index()` in +`ubifs_tnc_end_commit()`. + +This can be triggered by running + + rm -f /etc/test-file.bin + dd if=/dev/urandom of=/etc/test-file.bin bs=1M count=60 conv=fsync + +in a loop, and with `CONFIG_UBIFS_FS_AUTHENTICATION`. KASAN then +reports: + + BUG: KASAN: use-after-free in ubifs_tnc_end_commit+0xa5c/0x1950 + Write of size 32 at addr ffffff800a3af86c by task ubifs_bgt0_20/153 + + Call trace: + dump_backtrace+0x0/0x340 + show_stack+0x18/0x24 + dump_stack_lvl+0x9c/0xbc + print_address_description.constprop.0+0x74/0x2b0 + kasan_report+0x1d8/0x1f0 + kasan_check_range+0xf8/0x1a0 + memcpy+0x84/0xf4 + ubifs_tnc_end_commit+0xa5c/0x1950 + do_commit+0x4e0/0x1340 + ubifs_bg_thread+0x234/0x2e0 + kthread+0x36c/0x410 + ret_from_fork+0x10/0x20 + + Allocated by task 401: + kasan_save_stack+0x38/0x70 + __kasan_kmalloc+0x8c/0xd0 + __kmalloc+0x34c/0x5bc + tnc_insert+0x140/0x16a4 + ubifs_tnc_add+0x370/0x52c + ubifs_jnl_write_data+0x5d8/0x870 + do_writepage+0x36c/0x510 + ubifs_writepage+0x190/0x4dc + __writepage+0x58/0x154 + write_cache_pages+0x394/0x830 + do_writepages+0x1f0/0x5b0 + filemap_fdatawrite_wbc+0x170/0x25c + file_write_and_wait_range+0x140/0x190 + ubifs_fsync+0xe8/0x290 + vfs_fsync_range+0xc0/0x1e4 + do_fsync+0x40/0x90 + __arm64_sys_fsync+0x34/0x50 + invoke_syscall.constprop.0+0xa8/0x260 + do_el0_svc+0xc8/0x1f0 + el0_svc+0x34/0x70 + el0t_64_sync_handler+0x108/0x114 + el0t_64_sync+0x1a4/0x1a8 + + Freed by task 403: + kasan_save_stack+0x38/0x70 + kasan_set_track+0x28/0x40 + kasan_set_free_info+0x28/0x4c + __kasan_slab_free+0xd4/0x13c + kfree+0xc4/0x3a0 + tnc_delete+0x3f4/0xe40 + ubifs_tnc_remove_range+0x368/0x73c + ubifs_tnc_remove_ino+0x29c/0x2e0 + ubifs_jnl_delete_inode+0x150/0x260 + ubifs_evict_inode+0x1d4/0x2e4 + evict+0x1c8/0x450 + iput+0x2a0/0x3c4 + do_unlinkat+0x2cc/0x490 + __arm64_sys_unlinkat+0x90/0x100 + invoke_syscall.constprop.0+0xa8/0x260 + do_el0_svc+0xc8/0x1f0 + el0_svc+0x34/0x70 + el0t_64_sync_handler+0x108/0x114 + el0t_64_sync+0x1a4/0x1a8 + +The offending `memcpy()` in `ubifs_copy_hash()` has a use-after-free +when a node becomes root in TNC but still has a `cparent` to an already +freed node. More specifically, consider the following TNC: + + zroot + / + / + zp1 + / + / + zn + +Inserting a new node `zn_new` with a key smaller then `zn` will trigger +a split in `tnc_insert()` if `zp1` is full: + + zroot + / \ + / \ + zp1 zp2 + / \ + / \ + zn_new zn + +`zn->parent` has now been moved to `zp2`, *but* `zn->cparent` still +points to `zp1`. + +Now, consider a removal of all the nodes _except_ `zn`. Just when +`tnc_delete()` is about to delete `zroot` and `zp2`: + + zroot + \ + \ + zp2 + \ + \ + zn + +`zroot` and `zp2` get freed and the tree collapses: + + zn + +`zn` now becomes the new `zroot`. + +`get_znodes_to_commit()` will now only find `zn`, the new `zroot`, and +`write_index()` will check its `znode->cparent` that wrongly points to +the already freed `zp1`. `ubifs_copy_hash()` thus gets wrongly called +with `znode->cparent->zbranch[znode->iip].hash` that triggers the +use-after-free! + +Fix this by explicitly setting `znode->cparent` to `NULL` in +`get_znodes_to_commit()` for the root node. 
The search for the dirty
+nodes is bottom-up in the tree. Thus, when `find_next_dirty(znode)`
+returns NULL, the current `znode` _is_ the root node. Add an assert for
+this.
+
+Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes")
+Tested-by: Waqar Hameed
+Co-developed-by: Zhihao Cheng
+Signed-off-by: Zhihao Cheng
+Signed-off-by: Waqar Hameed
+Reviewed-by: Zhihao Cheng
+Signed-off-by: Richard Weinberger
+Signed-off-by: Sasha Levin
+---
+ fs/ubifs/tnc_commit.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/fs/ubifs/tnc_commit.c b/fs/ubifs/tnc_commit.c
+index a55e04822d16e..7c43e0ccf6d47 100644
+--- a/fs/ubifs/tnc_commit.c
++++ b/fs/ubifs/tnc_commit.c
+@@ -657,6 +657,8 @@ static int get_znodes_to_commit(struct ubifs_info *c)
+ znode->alt = 0;
+ cnext = find_next_dirty(znode);
+ if (!cnext) {
++ ubifs_assert(c, !znode->parent);
++ znode->cparent = NULL;
+ znode->cnext = c->cnext;
+ break;
+ }
+--
+2.43.0
+
diff --git a/queue-6.11/ubifs-correct-the-total-block-count-by-deducting-jou.patch b/queue-6.11/ubifs-correct-the-total-block-count-by-deducting-jou.patch
new file mode 100644
index 00000000000..e6a5370ace8
--- /dev/null
+++ b/queue-6.11/ubifs-correct-the-total-block-count-by-deducting-jou.patch
@@ -0,0 +1,46 @@
+From 4e647ead6f4d7a3510670accacf23846f11d34be Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 5 Sep 2024 09:09:09 +0800
+Subject: ubifs: Correct the total block count by deducting journal reservation
+
+From: Zhihao Cheng
+
+[ Upstream commit 84a2bee9c49769310efa19601157ef50a1df1267 ]
+
+Since commit e874dcde1cbf ("ubifs: Reserve one leb for each journal
+head while doing budget"), available space is calculated by deducting
+reservation for all journal heads. However, the total block count
+(which is only used by statfs) is not updated yet, which will cause
+the wrong display of used space (total - available).
+Fix it by deducting reservation for all journal heads from total
+block count.
+
+Fixes: e874dcde1cbf ("ubifs: Reserve one leb for each journal head while doing budget")
+Signed-off-by: Zhihao Cheng
+Signed-off-by: Richard Weinberger
+Signed-off-by: Sasha Levin
+---
+ fs/ubifs/super.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
+index 291583005dd12..245a10cc1eeb4 100644
+--- a/fs/ubifs/super.c
++++ b/fs/ubifs/super.c
+@@ -773,10 +773,10 @@ static void init_constants_master(struct ubifs_info *c)
+ * necessary to report something for the 'statfs()' call.
+ *
+ * Subtract the LEB reserved for GC, the LEB which is reserved for
+- * deletions, minimum LEBs for the index, and assume only one journal
+- * head is available.
++ * deletions, minimum LEBs for the index, the LEBs which are reserved
++ * for each journal head.
+ */ +- tmp64 = c->main_lebs - 1 - 1 - MIN_INDEX_LEBS - c->jhead_cnt + 1; ++ tmp64 = c->main_lebs - 1 - 1 - MIN_INDEX_LEBS - c->jhead_cnt; + tmp64 *= (long long)c->leb_size - c->leb_overhead; + tmp64 = ubifs_reported_space(c, tmp64); + c->block_cnt = tmp64 >> UBIFS_BLOCK_SHIFT; +-- +2.43.0 + diff --git a/queue-6.11/um-always-dump-trace-for-specified-task-in-show_stac.patch b/queue-6.11/um-always-dump-trace-for-specified-task-in-show_stac.patch new file mode 100644 index 00000000000..c495816c84c --- /dev/null +++ b/queue-6.11/um-always-dump-trace-for-specified-task-in-show_stac.patch @@ -0,0 +1,37 @@ +From e95befe203723d0d25ab6777b5add4e6953b9d5b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 6 Nov 2024 18:39:33 +0800 +Subject: um: Always dump trace for specified task in show_stack + +From: Tiwei Bie + +[ Upstream commit 0f659ff362eac69777c4c191b7e5ccb19d76c67d ] + +Currently, show_stack() always dumps the trace of the current task. +However, it should dump the trace of the specified task if one is +provided. Otherwise, things like running "echo t > sysrq-trigger" +won't work as expected. + +Fixes: 970e51feaddb ("um: Add support for CONFIG_STACKTRACE") +Signed-off-by: Tiwei Bie +Link: https://patch.msgid.link/20241106103933.1132365-1-tiwei.btw@antgroup.com +Signed-off-by: Johannes Berg +Signed-off-by: Sasha Levin +--- + arch/um/kernel/sysrq.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/arch/um/kernel/sysrq.c b/arch/um/kernel/sysrq.c +index 746715379f12a..7e897e44a03da 100644 +--- a/arch/um/kernel/sysrq.c ++++ b/arch/um/kernel/sysrq.c +@@ -53,5 +53,5 @@ void show_stack(struct task_struct *task, unsigned long *stack, + } + + printk("%sCall Trace:\n", loglvl); +- dump_trace(current, &stackops, (void *)loglvl); ++ dump_trace(task ?: current, &stackops, (void *)loglvl); + } +-- +2.43.0 + diff --git a/queue-6.11/um-fix-potential-integer-overflow-during-physmem-set.patch b/queue-6.11/um-fix-potential-integer-overflow-during-physmem-set.patch new file mode 100644 index 00000000000..bfd3eabd8b6 --- /dev/null +++ b/queue-6.11/um-fix-potential-integer-overflow-during-physmem-set.patch @@ -0,0 +1,50 @@ +From 7459237f22f1debe21d07cebadcaec8c6096ded0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 16 Sep 2024 12:59:48 +0800 +Subject: um: Fix potential integer overflow during physmem setup + +From: Tiwei Bie + +[ Upstream commit a98b7761f697e590ed5d610d87fa12be66f23419 ] + +This issue happens when the real map size is greater than LONG_MAX, +which can be easily triggered on UML/i386. + +Fixes: fe205bdd1321 ("um: Print minimum physical memory requirement") +Signed-off-by: Tiwei Bie +Link: https://patch.msgid.link/20240916045950.508910-3-tiwei.btw@antgroup.com +Signed-off-by: Johannes Berg +Signed-off-by: Sasha Levin +--- + arch/um/kernel/physmem.c | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +diff --git a/arch/um/kernel/physmem.c b/arch/um/kernel/physmem.c +index fb2adfb499452..ee693e0b2b58b 100644 +--- a/arch/um/kernel/physmem.c ++++ b/arch/um/kernel/physmem.c +@@ -81,10 +81,10 @@ void __init setup_physmem(unsigned long start, unsigned long reserve_end, + unsigned long len, unsigned long long highmem) + { + unsigned long reserve = reserve_end - start; +- long map_size = len - reserve; ++ unsigned long map_size = len - reserve; + int err; + +- if(map_size <= 0) { ++ if (len <= reserve) { + os_warn("Too few physical memory! 
Needed=%lu, given=%lu\n", + reserve, len); + exit(1); +@@ -95,7 +95,7 @@ void __init setup_physmem(unsigned long start, unsigned long reserve_end, + err = os_map_memory((void *) reserve_end, physmem_fd, reserve, + map_size, 1, 1, 1); + if (err < 0) { +- os_warn("setup_physmem - mapping %ld bytes of memory at 0x%p " ++ os_warn("setup_physmem - mapping %lu bytes of memory at 0x%p " + "failed - errno = %d\n", map_size, + (void *) reserve_end, err); + exit(1); +-- +2.43.0 + diff --git a/queue-6.11/um-fix-the-return-value-of-elf_core_copy_task_fpregs.patch b/queue-6.11/um-fix-the-return-value-of-elf_core_copy_task_fpregs.patch new file mode 100644 index 00000000000..7268914f1fc --- /dev/null +++ b/queue-6.11/um-fix-the-return-value-of-elf_core_copy_task_fpregs.patch @@ -0,0 +1,36 @@ +From 58436c431ffded6942f4aaf3212254baba537dca Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 13 Sep 2024 10:33:02 +0800 +Subject: um: Fix the return value of elf_core_copy_task_fpregs + +From: Tiwei Bie + +[ Upstream commit 865e3845eeaa21e9a62abc1361644e67124f1ec0 ] + +This function is expected to return a boolean value, which should be +true on success and false on failure. + +Fixes: d1254b12c93e ("uml: fix x86_64 core dump crash") +Signed-off-by: Tiwei Bie +Link: https://patch.msgid.link/20240913023302.130300-1-tiwei.btw@antgroup.com +Signed-off-by: Johannes Berg +Signed-off-by: Sasha Levin +--- + arch/um/kernel/process.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c +index f36b63f53babd..d3786577c6fbf 100644 +--- a/arch/um/kernel/process.c ++++ b/arch/um/kernel/process.c +@@ -292,6 +292,6 @@ int elf_core_copy_task_fpregs(struct task_struct *t, elf_fpregset_t *fpu) + { + int cpu = current_thread_info()->cpu; + +- return save_i387_registers(userspace_pid[cpu], (unsigned long *) fpu); ++ return save_i387_registers(userspace_pid[cpu], (unsigned long *) fpu) == 0; + } + +-- +2.43.0 + diff --git a/queue-6.11/um-ubd-initialize-ubd-s-disk-pointer-in-ubd_add.patch b/queue-6.11/um-ubd-initialize-ubd-s-disk-pointer-in-ubd_add.patch new file mode 100644 index 00000000000..56e00e100d7 --- /dev/null +++ b/queue-6.11/um-ubd-initialize-ubd-s-disk-pointer-in-ubd_add.patch @@ -0,0 +1,39 @@ +From c0306aab8cc56b4ef282a7c33ac30f38c0c3a01c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 5 Nov 2024 00:32:00 +0800 +Subject: um: ubd: Initialize ubd's disk pointer in ubd_add + +From: Tiwei Bie + +[ Upstream commit df700802abcac3c7c4a4ced099aa42b9a144eea8 ] + +Currently, the initialization of the disk pointer in the ubd structure +is missing. It should be initialized with the allocated gendisk pointer +in ubd_add(). 
+ +Fixes: 32621ad7a7ea ("ubd: remove the ubd_gendisk array") +Signed-off-by: Tiwei Bie +Acked-By: Anton Ivanov +Link: https://patch.msgid.link/20241104163203.435515-2-tiwei.btw@antgroup.com +Signed-off-by: Johannes Berg +Signed-off-by: Sasha Levin +--- + arch/um/drivers/ubd_kern.c | 2 ++ + 1 file changed, 2 insertions(+) + +diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c +index 119df76627002..2bfb17373244b 100644 +--- a/arch/um/drivers/ubd_kern.c ++++ b/arch/um/drivers/ubd_kern.c +@@ -898,6 +898,8 @@ static int ubd_add(int n, char **error_out) + if (err) + goto out_cleanup_disk; + ++ ubd_dev->disk = disk; ++ + return 0; + + out_cleanup_disk: +-- +2.43.0 + diff --git a/queue-6.11/x86-documentation-update-algo-in-init_size-descripti.patch b/queue-6.11/x86-documentation-update-algo-in-init_size-descripti.patch new file mode 100644 index 00000000000..98f431dc798 --- /dev/null +++ b/queue-6.11/x86-documentation-update-algo-in-init_size-descripti.patch @@ -0,0 +1,77 @@ +From bfc88ce3b77296309dd677432fdf7e5a0dc26c84 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 25 Nov 2024 12:49:14 +0200 +Subject: x86/Documentation: Update algo in init_size description of boot + protocol + +From: Andy Shevchenko + +[ Upstream commit be4ca6c53e66cb275cf0d71f32dac0c4606b9dc0 ] + +The init_size description of boot protocol has an example of the runtime +start address for the compressed bzImage. For non-relocatable kernel +it relies on the pref_address value (if not 0), but for relocatable case +only pays respect to the load_addres and kernel_alignment, and it is +inaccurate for the latter. Boot loader must consider the pref_address +as the Linux kernel relocates to it before being decompressed as nicely +described in this commit message a year ago: + + 43b1d3e68ee7 ("kexec: Allocate kernel above bzImage's pref_address") + +Due to this documentation inaccuracy some of the bootloaders (*) made a +mistake in the calculations and if kernel image is big enough, this may +lead to unbootable configurations. + +*) + In particular, kexec-tools missed that and resently got a couple of + changes which will be part of v2.0.30 release. For the record, + commit 43b1d3e68ee7 only fixed the kernel kexec implementation and + also missed to update the init_size description. + +While at it, make an example C-like looking as it's done elsewhere in +the document and fix indentation as presribed by the reStructuredText +specifications, so the syntax highliting will work properly. + +Fixes: 43b1d3e68ee7 ("kexec: Allocate kernel above bzImage's pref_address") +Fixes: d297366ba692 ("x86: document new bzImage fields") +Signed-off-by: Andy Shevchenko +Signed-off-by: Ingo Molnar +Acked-by: Randy Dunlap +Cc: "H. 
Peter Anvin" +Link: https://lore.kernel.org/r/20241125105005.1616154-1-andriy.shevchenko@linux.intel.com +Signed-off-by: Sasha Levin +--- + Documentation/arch/x86/boot.rst | 17 +++++++++++++---- + 1 file changed, 13 insertions(+), 4 deletions(-) + +diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst +index 4fd492cb49704..ad2d8ddad27fe 100644 +--- a/Documentation/arch/x86/boot.rst ++++ b/Documentation/arch/x86/boot.rst +@@ -896,10 +896,19 @@ Offset/size: 0x260/4 + + The kernel runtime start address is determined by the following algorithm:: + +- if (relocatable_kernel) +- runtime_start = align_up(load_address, kernel_alignment) +- else +- runtime_start = pref_address ++ if (relocatable_kernel) { ++ if (load_address < pref_address) ++ load_address = pref_address; ++ runtime_start = align_up(load_address, kernel_alignment); ++ } else { ++ runtime_start = pref_address; ++ } ++ ++Hence the necessary memory window location and size can be estimated by ++a boot loader as:: ++ ++ memory_window_start = runtime_start; ++ memory_window_size = init_size; + + ============ =============== + Field name: handover_offset +-- +2.43.0 + -- 2.47.3