From: Sasha Levin Date: Tue, 28 Apr 2026 10:52:51 +0000 (-0400) Subject: Fixes for all trees X-Git-Url: http://git.ipfire.org/gitweb/?a=commitdiff_plain;h=8518a857460a79df17849e6df41f1583ac7a959f;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for all trees Signed-off-by: Sasha Levin --- diff --git a/queue-5.10/driver-core-don-t-let-a-device-probe-until-it-s-read.patch b/queue-5.10/driver-core-don-t-let-a-device-probe-until-it-s-read.patch new file mode 100644 index 0000000000..661e0bb9b9 --- /dev/null +++ b/queue-5.10/driver-core-don-t-let-a-device-probe-until-it-s-read.patch @@ -0,0 +1,216 @@ +From 52720704b7dfc7214c896f7aac25ac30bbcb3916 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 27 Apr 2026 10:15:33 -0700 +Subject: driver core: Don't let a device probe until it's ready + +From: Douglas Anderson + +[ Upstream commit a2225b6e834a838ae3c93709760edc0a169eb2f2 ] + +The moment we link a "struct device" into the list of devices for the +bus, it's possible probe can happen. This is because another thread +can load the driver at any time and that can cause the device to +probe. This has been seen in practice with a stack crawl that looks +like this [1]: + + really_probe() + __driver_probe_device() + driver_probe_device() + __driver_attach() + bus_for_each_dev() + driver_attach() + bus_add_driver() + driver_register() + __platform_driver_register() + init_module() [some module] + do_one_initcall() + do_init_module() + load_module() + __arm64_sys_finit_module() + invoke_syscall() + +As a result of the above, it was seen that device_links_driver_bound() +could be called for the device before "dev->fwnode->dev" was +assigned. This prevented __fw_devlink_pickup_dangling_consumers() from +being called which meant that other devices waiting on our driver's +sub-nodes were stuck deferring forever. + +It's believed that this problem is showing up suddenly for two +reasons: +1. 
Android has recently (last ~1 year) implemented an optimization to
+ the order it loads modules [2]. When devices opt-in to this faster
+ loading, modules are loaded one-after-the-other very quickly. This
+ is unlike how other distributions do it. The reproduction of this
+ problem has only been seen on devices that opt-in to Android's
+ "parallel module loading".
+2. Android devices typically opt-in to fw_devlink, and the most
+ noticeable issue is the NULL "dev->fwnode->dev" in
+ device_links_driver_bound(). fw_devlink is somewhat new code and
+ also not in use by all Linux devices.
+
+Even though the specific symptom where "dev->fwnode->dev" wasn't
+assigned could be fixed by moving that assignment higher in
+device_add(), other parts of device_add() (like the call to
+device_pm_add()) are also important to run before probe. Only moving
+the "dev->fwnode->dev" assignment would likely fix the current
+symptoms but lead to difficult-to-debug problems in the future.
+
+Fix the problem by preventing probe until device_add() has run far
+enough that the device is ready to probe. If somehow we end up trying
+to probe before we're allowed, __driver_probe_device() will return
+-EPROBE_DEFER which will make certain the device is noticed.
+
+In the race condition that was seen with Android's faster module
+loading, we will temporarily add the device to the deferred list and
+then take it off immediately when device_add() probes the device.
+
+Instead of adding another flag to the bitfields already in "struct
+device", add a new "flags" field and use that. This allows us
+to freely change the bit from a different thread without worrying
+about corrupting nearby bits (and means threads changing other bits
+won't corrupt us).
+ +[1] Captured on a machine running a downstream 6.6 kernel +[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel + +Cc: stable@vger.kernel.org +Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") +Reviewed-by: Alan Stern +Reviewed-by: Rafael J. Wysocki (Intel) +Reviewed-by: Danilo Krummrich +Acked-by: Greg Kroah-Hartman +Acked-by: Marek Szyprowski +Signed-off-by: Douglas Anderson +Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid +Signed-off-by: Danilo Krummrich +Signed-off-by: Douglas Anderson +Signed-off-by: Sasha Levin +--- + drivers/base/core.c | 15 ++++++++++++++ + drivers/base/dd.c | 12 ++++++++++++ + include/linux/device.h | 44 ++++++++++++++++++++++++++++++++++++++++++ + 3 files changed, 71 insertions(+) + +diff --git a/drivers/base/core.c b/drivers/base/core.c +index 3521d4c00c2e9..a900bde641491 100644 +--- a/drivers/base/core.c ++++ b/drivers/base/core.c +@@ -3008,6 +3008,21 @@ int device_add(struct device *dev) + fw_devlink_link_device(dev); + } + ++ /* ++ * The moment the device was linked into the bus's "klist_devices" in ++ * bus_add_device() then it's possible that probe could have been ++ * attempted in a different thread via userspace loading a driver ++ * matching the device. "ready_to_probe" being unset would have ++ * blocked those attempts. Now that all of the above initialization has ++ * happened, unblock probe. If probe happens through another thread ++ * after this point but before bus_probe_device() runs then it's fine. ++ * bus_probe_device() -> device_initial_probe() -> __device_attach() ++ * will notice (under device_lock) that the device is already bound. 
++ */ ++ device_lock(dev); ++ dev_set_ready_to_probe(dev); ++ device_unlock(dev); ++ + bus_probe_device(dev); + if (parent) + klist_add_tail(&dev->p->knode_parent, +diff --git a/drivers/base/dd.c b/drivers/base/dd.c +index 1e8318acf6218..0398f2c985b38 100644 +--- a/drivers/base/dd.c ++++ b/drivers/base/dd.c +@@ -738,6 +738,18 @@ int driver_probe_device(struct device_driver *drv, struct device *dev) + if (!device_is_registered(dev)) + return -ENODEV; + ++ /* ++ * In device_add(), the "struct device" gets linked into the subsystem's ++ * list of devices and broadcast to userspace (via uevent) before we're ++ * quite ready to probe. Those open pathways to driver probe before ++ * we've finished enough of device_add() to reliably support probe. ++ * Detect this and tell other pathways to try again later. device_add() ++ * itself will also try to probe immediately after setting ++ * "ready_to_probe". ++ */ ++ if (!dev_ready_to_probe(dev)) ++ return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready to probe\n"); ++ + pr_debug("bus: '%s': %s: matched device %s with driver %s\n", + drv->bus->name, __func__, dev_name(dev), drv->name); + +diff --git a/include/linux/device.h b/include/linux/device.h +index 047a8f1ef8f28..ff7cae0431abb 100644 +--- a/include/linux/device.h ++++ b/include/linux/device.h +@@ -385,6 +385,21 @@ struct dev_links_info { + enum dl_dev_state status; + }; + ++/** ++ * enum struct_device_flags - Flags in struct device ++ * ++ * Each flag should have a set of accessor functions created via ++ * __create_dev_flag_accessors() for each access. ++ * ++ * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough ++ * initialization that probe could be called. ++ */ ++enum struct_device_flags { ++ DEV_FLAG_READY_TO_PROBE = 0, ++ ++ DEV_FLAG_COUNT ++}; ++ + /** + * struct device - The basic device structure + * @parent: The device's "parent" device, the device to which it is attached. 
+@@ -470,6 +485,7 @@ struct dev_links_info { + * and optionall (if the coherent mask is large enough) also + * for dma allocations. This flag is managed by the dma ops + * instance from ->dma_supported. ++ * @flags: DEV_FLAG_XXX flags. Use atomic bitfield operations to modify. + * + * At the lowest level, every device in a Linux system is represented by an + * instance of struct device. The device structure contains the information +@@ -580,8 +596,36 @@ struct device { + #ifdef CONFIG_DMA_OPS_BYPASS + bool dma_ops_bypass : 1; + #endif ++ ++ DECLARE_BITMAP(flags, DEV_FLAG_COUNT); + }; + ++#define __create_dev_flag_accessors(accessor_name, flag_name) \ ++static inline bool dev_##accessor_name(const struct device *dev) \ ++{ \ ++ return test_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_set_##accessor_name(struct device *dev) \ ++{ \ ++ set_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_clear_##accessor_name(struct device *dev) \ ++{ \ ++ clear_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_assign_##accessor_name(struct device *dev, bool value) \ ++{ \ ++ assign_bit(flag_name, dev->flags, value); \ ++} \ ++static inline bool dev_test_and_set_##accessor_name(struct device *dev) \ ++{ \ ++ return test_and_set_bit(flag_name, dev->flags); \ ++} ++ ++__create_dev_flag_accessors(ready_to_probe, DEV_FLAG_READY_TO_PROBE); ++ ++#undef __create_dev_flag_accessors ++ + /** + * struct device_link - Device link representation. + * @supplier: The device on the supplier end of the link. 
+-- +2.53.0 + diff --git a/queue-5.10/padata-fix-pd-uaf-once-and-for-all.patch b/queue-5.10/padata-fix-pd-uaf-once-and-for-all.patch new file mode 100644 index 0000000000..f7f7771d58 --- /dev/null +++ b/queue-5.10/padata-fix-pd-uaf-once-and-for-all.patch @@ -0,0 +1,280 @@ +From 958051bc7ff1387cd28f21f38737ab3a336789fd Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 28 Apr 2026 16:57:00 +0800 +Subject: padata: Fix pd UAF once and for all + +From: Herbert Xu + +[ Upstream commit 71203f68c7749609d7fc8ae6ad054bdedeb24f91 ] + +There is a race condition/UAF in padata_reorder that goes back +to the initial commit. A reference count is taken at the start +of the process in padata_do_parallel, and released at the end in +padata_serial_worker. + +This reference count is (and only is) required for padata_replace +to function correctly. If padata_replace is never called then +there is no issue. + +In the function padata_reorder which serves as the core of padata, +as soon as padata is added to queue->serial.list, and the associated +spin lock released, that padata may be processed and the reference +count on pd would go away. + +Fix this by getting the next padata before the squeue->serial lock +is released. + +In order to make this possible, simplify padata_reorder by only +calling it once the next padata arrives. + +Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface") +Signed-off-by: Herbert Xu +[ Adjust context of padata_find_next(). Replace +cpumask_next_wrap(cpu, pd->cpumask.pcpu) with +cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false) in padata_reorder() in +v5.10 according to dc5bb9b769c9 ("cpumask: deprecate cpumask_next_wrap()") and +f954a2d37637 ("padata: switch padata_find_next() to using cpumask_next_wrap()") +. 
] +Signed-off-by: Bin Lan +Signed-off-by: Sasha Levin +--- + include/linux/padata.h | 3 - + kernel/padata.c | 136 +++++++++++------------------------------ + 2 files changed, 37 insertions(+), 102 deletions(-) + +diff --git a/include/linux/padata.h b/include/linux/padata.h +index 495b16b6b4d72..9ca779d7e310e 100644 +--- a/include/linux/padata.h ++++ b/include/linux/padata.h +@@ -91,7 +91,6 @@ struct padata_cpumask { + * @cpu: Next CPU to be processed. + * @cpumask: The cpumasks in use for parallel and serial workers. + * @reorder_work: work struct for reordering. +- * @lock: Reorder lock. + */ + struct parallel_data { + struct padata_shell *ps; +@@ -102,8 +101,6 @@ struct parallel_data { + unsigned int processed; + int cpu; + struct padata_cpumask cpumask; +- struct work_struct reorder_work; +- spinlock_t ____cacheline_aligned lock; + }; + + /** +diff --git a/kernel/padata.c b/kernel/padata.c +index 6c8a141b5c4b2..6d8af344498b7 100644 +--- a/kernel/padata.c ++++ b/kernel/padata.c +@@ -266,20 +266,17 @@ EXPORT_SYMBOL(padata_do_parallel); + * be parallel processed by another cpu and is not yet present in + * the cpu's reorder queue. + */ +-static struct padata_priv *padata_find_next(struct parallel_data *pd, +- bool remove_object) ++static struct padata_priv *padata_find_next(struct parallel_data *pd, int cpu, ++ unsigned int processed) + { + struct padata_priv *padata; + struct padata_list *reorder; +- int cpu = pd->cpu; + + reorder = per_cpu_ptr(pd->reorder_list, cpu); + + spin_lock(&reorder->lock); +- if (list_empty(&reorder->list)) { +- spin_unlock(&reorder->lock); +- return NULL; +- } ++ if (list_empty(&reorder->list)) ++ goto notfound; + + padata = list_entry(reorder->list.next, struct padata_priv, list); + +@@ -287,101 +284,52 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd, + * Checks the rare case where two or more parallel jobs have hashed to + * the same CPU and one of the later ones finishes first. 
+ */ +- if (padata->seq_nr != pd->processed) { +- spin_unlock(&reorder->lock); +- return NULL; +- } +- +- if (remove_object) { +- list_del_init(&padata->list); +- ++pd->processed; +- /* When sequence wraps around, reset to the first CPU. */ +- if (unlikely(pd->processed == 0)) +- pd->cpu = cpumask_first(pd->cpumask.pcpu); +- else +- pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false); +- } ++ if (padata->seq_nr != processed) ++ goto notfound; + ++ list_del_init(&padata->list); + spin_unlock(&reorder->lock); + return padata; ++ ++notfound: ++ pd->processed = processed; ++ pd->cpu = cpu; ++ spin_unlock(&reorder->lock); ++ return NULL; + } + +-static void padata_reorder(struct parallel_data *pd) ++static void padata_reorder(struct padata_priv *padata) + { ++ struct parallel_data *pd = padata->pd; + struct padata_instance *pinst = pd->ps->pinst; +- int cb_cpu; +- struct padata_priv *padata; +- struct padata_serial_queue *squeue; +- struct padata_list *reorder; ++ unsigned int processed; ++ int cpu; + +- /* +- * We need to ensure that only one cpu can work on dequeueing of +- * the reorder queue the time. Calculating in which percpu reorder +- * queue the next object will arrive takes some time. A spinlock +- * would be highly contended. Also it is not clear in which order +- * the objects arrive to the reorder queues. So a cpu could wait to +- * get the lock just to notice that there is nothing to do at the +- * moment. Therefore we use a trylock and let the holder of the lock +- * care for all the objects enqueued during the holdtime of the lock. +- */ +- if (!spin_trylock_bh(&pd->lock)) +- return; ++ processed = pd->processed; ++ cpu = pd->cpu; + +- while (1) { +- padata = padata_find_next(pd, true); ++ do { ++ struct padata_serial_queue *squeue; ++ int cb_cpu; + +- /* +- * If the next object that needs serialization is parallel +- * processed by another cpu and is still on it's way to the +- * cpu's reorder queue, nothing to do for now. 
+- */ +- if (!padata) +- break; ++ cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false); ++ processed++; + + cb_cpu = padata->cb_cpu; + squeue = per_cpu_ptr(pd->squeue, cb_cpu); + + spin_lock(&squeue->serial.lock); + list_add_tail(&padata->list, &squeue->serial.list); +- spin_unlock(&squeue->serial.lock); +- + queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work); +- } + +- spin_unlock_bh(&pd->lock); +- +- /* +- * The next object that needs serialization might have arrived to +- * the reorder queues in the meantime. +- * +- * Ensure reorder queue is read after pd->lock is dropped so we see +- * new objects from another task in padata_do_serial. Pairs with +- * smp_mb in padata_do_serial. +- */ +- smp_mb(); +- +- reorder = per_cpu_ptr(pd->reorder_list, pd->cpu); +- if (!list_empty(&reorder->list) && padata_find_next(pd, false)) { + /* +- * Other context(eg. the padata_serial_worker) can finish the request. +- * To avoid UAF issue, add pd ref here, and put pd ref after reorder_work finish. ++ * If the next object that needs serialization is parallel ++ * processed by another cpu and is still on it's way to the ++ * cpu's reorder queue, end the loop. 
+ */ +- padata_get_pd(pd); +- if (!queue_work(pinst->serial_wq, &pd->reorder_work)) +- padata_put_pd(pd); +- } +-} +- +-static void invoke_padata_reorder(struct work_struct *work) +-{ +- struct parallel_data *pd; +- +- local_bh_disable(); +- pd = container_of(work, struct parallel_data, reorder_work); +- padata_reorder(pd); +- local_bh_enable(); +- /* Pairs with putting the reorder_work in the serial_wq */ +- padata_put_pd(pd); ++ padata = padata_find_next(pd, cpu, processed); ++ spin_unlock(&squeue->serial.lock); ++ } while (padata); + } + + static void padata_serial_worker(struct work_struct *serial_work) +@@ -432,6 +380,7 @@ void padata_do_serial(struct padata_priv *padata) + struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu); + struct padata_priv *cur; + struct list_head *pos; ++ bool gotit = true; + + spin_lock(&reorder->lock); + /* Sort in ascending order of sequence number. */ +@@ -441,17 +390,14 @@ void padata_do_serial(struct padata_priv *padata) + if ((signed int)(cur->seq_nr - padata->seq_nr) < 0) + break; + } +- list_add(&padata->list, pos); ++ if (padata->seq_nr != pd->processed) { ++ gotit = false; ++ list_add(&padata->list, pos); ++ } + spin_unlock(&reorder->lock); + +- /* +- * Ensure the addition to the reorder list is ordered correctly +- * with the trylock of pd->lock in padata_reorder. Pairs with smp_mb +- * in padata_reorder. 
+- */ +- smp_mb(); +- +- padata_reorder(pd); ++ if (gotit) ++ padata_reorder(padata); + } + EXPORT_SYMBOL(padata_do_serial); + +@@ -638,9 +584,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_shell *ps) + padata_init_squeues(pd); + pd->seq_nr = -1; + refcount_set(&pd->refcnt, 1); +- spin_lock_init(&pd->lock); + pd->cpu = cpumask_first(pd->cpumask.pcpu); +- INIT_WORK(&pd->reorder_work, invoke_padata_reorder); + + return pd; + +@@ -1150,12 +1094,6 @@ void padata_free_shell(struct padata_shell *ps) + if (!ps) + return; + +- /* +- * Wait for all _do_serial calls to finish to avoid touching +- * freed pd's and ps's. +- */ +- synchronize_rcu(); +- + mutex_lock(&ps->pinst->lock); + list_del(&ps->list); + pd = rcu_dereference_protected(ps->pd, 1); +-- +2.53.0 + diff --git a/queue-5.10/padata-remove-comment-for-reorder_work.patch b/queue-5.10/padata-remove-comment-for-reorder_work.patch new file mode 100644 index 0000000000..a286ccd5c1 --- /dev/null +++ b/queue-5.10/padata-remove-comment-for-reorder_work.patch @@ -0,0 +1,35 @@ +From b1e8b361e4213f6a0176490ed3e4ca2314bef5ad Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 28 Apr 2026 16:57:01 +0800 +Subject: padata: Remove comment for reorder_work + +From: Herbert Xu + +[ Upstream commit 82a0302e7167d0b7c6cde56613db3748f8dd806d ] + +Remove comment for reorder_work which no longer exists. + +Reported-by: Stephen Rothwell +Fixes: 71203f68c774 ("padata: Fix pd UAF once and for all") +Signed-off-by: Herbert Xu +Signed-off-by: Bin Lan +Signed-off-by: Sasha Levin +--- + include/linux/padata.h | 1 - + 1 file changed, 1 deletion(-) + +diff --git a/include/linux/padata.h b/include/linux/padata.h +index 9ca779d7e310e..6f07e12a43819 100644 +--- a/include/linux/padata.h ++++ b/include/linux/padata.h +@@ -90,7 +90,6 @@ struct padata_cpumask { + * @processed: Number of already processed objects. + * @cpu: Next CPU to be processed. + * @cpumask: The cpumasks in use for parallel and serial workers. 
+- * @reorder_work: work struct for reordering. + */ + struct parallel_data { + struct padata_shell *ps; +-- +2.53.0 + diff --git a/queue-5.10/series b/queue-5.10/series index 26f8812184..bb8fc862f6 100644 --- a/queue-5.10/series +++ b/queue-5.10/series @@ -146,3 +146,6 @@ firmware-google-framebuffer-do-not-mark-framebuffer-as-busy.patch io_uring-poll-fix-epoll_uring_wake-sometimes-not-bei.patch io_uring-poll-fix-backport-of-io_poll_add-changes.patch revert-riscv-sparse-memory-vmemmap-out-of-bounds-fix.patch +padata-fix-pd-uaf-once-and-for-all.patch +padata-remove-comment-for-reorder_work.patch +driver-core-don-t-let-a-device-probe-until-it-s-read.patch diff --git a/queue-5.15/driver-core-don-t-let-a-device-probe-until-it-s-read.patch b/queue-5.15/driver-core-don-t-let-a-device-probe-until-it-s-read.patch new file mode 100644 index 0000000000..dff870a3a9 --- /dev/null +++ b/queue-5.15/driver-core-don-t-let-a-device-probe-until-it-s-read.patch @@ -0,0 +1,224 @@ +From 2b4671bddb03828dfb5d5bae95544ac992ce133e Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 27 Apr 2026 10:01:34 -0700 +Subject: driver core: Don't let a device probe until it's ready + +From: Douglas Anderson + +[ Upstream commit a2225b6e834a838ae3c93709760edc0a169eb2f2 ] + +The moment we link a "struct device" into the list of devices for the +bus, it's possible probe can happen. This is because another thread +can load the driver at any time and that can cause the device to +probe. 
This has been seen in practice with a stack crawl that looks +like this [1]: + + really_probe() + __driver_probe_device() + driver_probe_device() + __driver_attach() + bus_for_each_dev() + driver_attach() + bus_add_driver() + driver_register() + __platform_driver_register() + init_module() [some module] + do_one_initcall() + do_init_module() + load_module() + __arm64_sys_finit_module() + invoke_syscall() + +As a result of the above, it was seen that device_links_driver_bound() +could be called for the device before "dev->fwnode->dev" was +assigned. This prevented __fw_devlink_pickup_dangling_consumers() from +being called which meant that other devices waiting on our driver's +sub-nodes were stuck deferring forever. + +It's believed that this problem is showing up suddenly for two +reasons: +1. Android has recently (last ~1 year) implemented an optimization to + the order it loads modules [2]. When devices opt-in to this faster + loading, modules are loaded one-after-the-other very quickly. This + is unlike how other distributions do it. The reproduction of this + problem has only been seen on devices that opt-in to Android's + "parallel module loading". +2. Android devices typically opt-in to fw_devlink, and the most + noticeable issue is the NULL "dev->fwnode->dev" in + device_links_driver_bound(). fw_devlink is somewhat new code and + also not in use by all Linux devices. + +Even though the specific symptom where "dev->fwnode->dev" wasn't +assigned could be fixed by moving that assignment higher in +device_add(), other parts of device_add() (like the call to +device_pm_add()) are also important to run before probe. Only moving +the "dev->fwnode->dev" assignment would likely fix the current +symptoms but lead to difficult-to-debug problems in the future. + +Fix the problem by preventing probe until device_add() has run far +enough that the device is ready to probe. 
If somehow we end up trying +to probe before we're allowed, __driver_probe_device() will return +-EPROBE_DEFER which will make certain the device is noticed. + +In the race condition that was seen with Android's faster module +loading, we will temporarily add the device to the deferred list and +then take it off immediately when device_add() probes the device. + +Instead of adding another flag to the bitfields already in "struct +device", instead add a new "flags" field and use that. This allows us +to freely change the bit from different thread without worrying about +corrupting nearby bits (and means threads changing other bit won't +corrupt us). + +[1] Captured on a machine running a downstream 6.6 kernel +[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel + +Cc: stable@vger.kernel.org +Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") +Reviewed-by: Alan Stern +Reviewed-by: Rafael J. 
Wysocki (Intel) +Reviewed-by: Danilo Krummrich +Acked-by: Greg Kroah-Hartman +Acked-by: Marek Szyprowski +Signed-off-by: Douglas Anderson +Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid +Signed-off-by: Danilo Krummrich +Signed-off-by: Douglas Anderson +Signed-off-by: Sasha Levin +--- + drivers/base/core.c | 15 ++++++++++++++ + drivers/base/dd.c | 20 +++++++++++++++++++ + include/linux/device.h | 44 ++++++++++++++++++++++++++++++++++++++++++ + 3 files changed, 79 insertions(+) + +diff --git a/drivers/base/core.c b/drivers/base/core.c +index 9ec8a9eced42f..d11cf07e1441c 100644 +--- a/drivers/base/core.c ++++ b/drivers/base/core.c +@@ -3409,6 +3409,21 @@ int device_add(struct device *dev) + fw_devlink_link_device(dev); + } + ++ /* ++ * The moment the device was linked into the bus's "klist_devices" in ++ * bus_add_device() then it's possible that probe could have been ++ * attempted in a different thread via userspace loading a driver ++ * matching the device. "ready_to_probe" being unset would have ++ * blocked those attempts. Now that all of the above initialization has ++ * happened, unblock probe. If probe happens through another thread ++ * after this point but before bus_probe_device() runs then it's fine. ++ * bus_probe_device() -> device_initial_probe() -> __device_attach() ++ * will notice (under device_lock) that the device is already bound. 
++ */ ++ device_lock(dev); ++ dev_set_ready_to_probe(dev); ++ device_unlock(dev); ++ + bus_probe_device(dev); + + /* +diff --git a/drivers/base/dd.c b/drivers/base/dd.c +index 0bd166ad6f130..daa5ef3f38e92 100644 +--- a/drivers/base/dd.c ++++ b/drivers/base/dd.c +@@ -740,6 +740,26 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev) + if (dev->driver) + return -EBUSY; + ++ /* ++ * In device_add(), the "struct device" gets linked into the subsystem's ++ * list of devices and broadcast to userspace (via uevent) before we're ++ * quite ready to probe. Those open pathways to driver probe before ++ * we've finished enough of device_add() to reliably support probe. ++ * Detect this and tell other pathways to try again later. device_add() ++ * itself will also try to probe immediately after setting ++ * "ready_to_probe". ++ */ ++ if (!dev_ready_to_probe(dev)) ++ return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready to probe\n"); ++ ++ /* ++ * Set can_match = true after calling dev_ready_to_probe(), so ++ * driver_deferred_probe_add() won't actually add the device to the ++ * deferred probe list when dev_ready_to_probe() returns false. ++ * ++ * When dev_ready_to_probe() returns false, it means that device_add() ++ * will do another probe() attempt for us. ++ */ + dev->can_match = true; + pr_debug("bus: '%s': %s: matched device %s with driver %s\n", + drv->bus->name, __func__, dev_name(dev), drv->name); +diff --git a/include/linux/device.h b/include/linux/device.h +index 89864b9185462..58211946b1325 100644 +--- a/include/linux/device.h ++++ b/include/linux/device.h +@@ -372,6 +372,21 @@ struct dev_links_info { + enum dl_dev_state status; + }; + ++/** ++ * enum struct_device_flags - Flags in struct device ++ * ++ * Each flag should have a set of accessor functions created via ++ * __create_dev_flag_accessors() for each access. 
++ * ++ * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough ++ * initialization that probe could be called. ++ */ ++enum struct_device_flags { ++ DEV_FLAG_READY_TO_PROBE = 0, ++ ++ DEV_FLAG_COUNT ++}; ++ + /** + * struct device - The basic device structure + * @parent: The device's "parent" device, the device to which it is attached. +@@ -462,6 +477,7 @@ struct dev_links_info { + * and optionall (if the coherent mask is large enough) also + * for dma allocations. This flag is managed by the dma ops + * instance from ->dma_supported. ++ * @flags: DEV_FLAG_XXX flags. Use atomic bitfield operations to modify. + * + * At the lowest level, every device in a Linux system is represented by an + * instance of struct device. The device structure contains the information +@@ -576,8 +592,36 @@ struct device { + #ifdef CONFIG_DMA_OPS_BYPASS + bool dma_ops_bypass : 1; + #endif ++ ++ DECLARE_BITMAP(flags, DEV_FLAG_COUNT); + }; + ++#define __create_dev_flag_accessors(accessor_name, flag_name) \ ++static inline bool dev_##accessor_name(const struct device *dev) \ ++{ \ ++ return test_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_set_##accessor_name(struct device *dev) \ ++{ \ ++ set_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_clear_##accessor_name(struct device *dev) \ ++{ \ ++ clear_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_assign_##accessor_name(struct device *dev, bool value) \ ++{ \ ++ assign_bit(flag_name, dev->flags, value); \ ++} \ ++static inline bool dev_test_and_set_##accessor_name(struct device *dev) \ ++{ \ ++ return test_and_set_bit(flag_name, dev->flags); \ ++} ++ ++__create_dev_flag_accessors(ready_to_probe, DEV_FLAG_READY_TO_PROBE); ++ ++#undef __create_dev_flag_accessors ++ + /** + * struct device_link - Device link representation. + * @supplier: The device on the supplier end of the link. 
+-- +2.53.0 + diff --git a/queue-5.15/padata-fix-pd-uaf-once-and-for-all.patch b/queue-5.15/padata-fix-pd-uaf-once-and-for-all.patch new file mode 100644 index 0000000000..182d359b90 --- /dev/null +++ b/queue-5.15/padata-fix-pd-uaf-once-and-for-all.patch @@ -0,0 +1,280 @@ +From c38801f909cd4cd4693bbee1941ea3aed89fed27 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 28 Apr 2026 13:07:58 +0800 +Subject: padata: Fix pd UAF once and for all + +From: Herbert Xu + +[ Upstream commit 71203f68c7749609d7fc8ae6ad054bdedeb24f91 ] + +There is a race condition/UAF in padata_reorder that goes back +to the initial commit. A reference count is taken at the start +of the process in padata_do_parallel, and released at the end in +padata_serial_worker. + +This reference count is (and only is) required for padata_replace +to function correctly. If padata_replace is never called then +there is no issue. + +In the function padata_reorder which serves as the core of padata, +as soon as padata is added to queue->serial.list, and the associated +spin lock released, that padata may be processed and the reference +count on pd would go away. + +Fix this by getting the next padata before the squeue->serial lock +is released. + +In order to make this possible, simplify padata_reorder by only +calling it once the next padata arrives. + +Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface") +Signed-off-by: Herbert Xu +[ Adjust context of padata_find_next(). Replace +cpumask_next_wrap(cpu, pd->cpumask.pcpu) with +cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false) in padata_reorder() in +v5.15 according to dc5bb9b769c9 ("cpumask: deprecate cpumask_next_wrap()") and +f954a2d37637 ("padata: switch padata_find_next() to using cpumask_next_wrap()") +. 
] +Signed-off-by: Bin Lan +Signed-off-by: Sasha Levin +--- + include/linux/padata.h | 3 - + kernel/padata.c | 136 +++++++++++------------------------------ + 2 files changed, 37 insertions(+), 102 deletions(-) + +diff --git a/include/linux/padata.h b/include/linux/padata.h +index 495b16b6b4d72..9ca779d7e310e 100644 +--- a/include/linux/padata.h ++++ b/include/linux/padata.h +@@ -91,7 +91,6 @@ struct padata_cpumask { + * @cpu: Next CPU to be processed. + * @cpumask: The cpumasks in use for parallel and serial workers. + * @reorder_work: work struct for reordering. +- * @lock: Reorder lock. + */ + struct parallel_data { + struct padata_shell *ps; +@@ -102,8 +101,6 @@ struct parallel_data { + unsigned int processed; + int cpu; + struct padata_cpumask cpumask; +- struct work_struct reorder_work; +- spinlock_t ____cacheline_aligned lock; + }; + + /** +diff --git a/kernel/padata.c b/kernel/padata.c +index 5453f57509067..93af1e9bb3aeb 100644 +--- a/kernel/padata.c ++++ b/kernel/padata.c +@@ -253,20 +253,17 @@ EXPORT_SYMBOL(padata_do_parallel); + * be parallel processed by another cpu and is not yet present in + * the cpu's reorder queue. + */ +-static struct padata_priv *padata_find_next(struct parallel_data *pd, +- bool remove_object) ++static struct padata_priv *padata_find_next(struct parallel_data *pd, int cpu, ++ unsigned int processed) + { + struct padata_priv *padata; + struct padata_list *reorder; +- int cpu = pd->cpu; + + reorder = per_cpu_ptr(pd->reorder_list, cpu); + + spin_lock(&reorder->lock); +- if (list_empty(&reorder->list)) { +- spin_unlock(&reorder->lock); +- return NULL; +- } ++ if (list_empty(&reorder->list)) ++ goto notfound; + + padata = list_entry(reorder->list.next, struct padata_priv, list); + +@@ -274,101 +271,52 @@ static struct padata_priv *padata_find_next(struct parallel_data *pd, + * Checks the rare case where two or more parallel jobs have hashed to + * the same CPU and one of the later ones finishes first. 
+ */ +- if (padata->seq_nr != pd->processed) { +- spin_unlock(&reorder->lock); +- return NULL; +- } +- +- if (remove_object) { +- list_del_init(&padata->list); +- ++pd->processed; +- /* When sequence wraps around, reset to the first CPU. */ +- if (unlikely(pd->processed == 0)) +- pd->cpu = cpumask_first(pd->cpumask.pcpu); +- else +- pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false); +- } ++ if (padata->seq_nr != processed) ++ goto notfound; + ++ list_del_init(&padata->list); + spin_unlock(&reorder->lock); + return padata; ++ ++notfound: ++ pd->processed = processed; ++ pd->cpu = cpu; ++ spin_unlock(&reorder->lock); ++ return NULL; + } + +-static void padata_reorder(struct parallel_data *pd) ++static void padata_reorder(struct padata_priv *padata) + { ++ struct parallel_data *pd = padata->pd; + struct padata_instance *pinst = pd->ps->pinst; +- int cb_cpu; +- struct padata_priv *padata; +- struct padata_serial_queue *squeue; +- struct padata_list *reorder; ++ unsigned int processed; ++ int cpu; + +- /* +- * We need to ensure that only one cpu can work on dequeueing of +- * the reorder queue the time. Calculating in which percpu reorder +- * queue the next object will arrive takes some time. A spinlock +- * would be highly contended. Also it is not clear in which order +- * the objects arrive to the reorder queues. So a cpu could wait to +- * get the lock just to notice that there is nothing to do at the +- * moment. Therefore we use a trylock and let the holder of the lock +- * care for all the objects enqueued during the holdtime of the lock. +- */ +- if (!spin_trylock_bh(&pd->lock)) +- return; ++ processed = pd->processed; ++ cpu = pd->cpu; + +- while (1) { +- padata = padata_find_next(pd, true); ++ do { ++ struct padata_serial_queue *squeue; ++ int cb_cpu; + +- /* +- * If the next object that needs serialization is parallel +- * processed by another cpu and is still on it's way to the +- * cpu's reorder queue, nothing to do for now. 
+- */ +- if (!padata) +- break; ++ cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false); ++ processed++; + + cb_cpu = padata->cb_cpu; + squeue = per_cpu_ptr(pd->squeue, cb_cpu); + + spin_lock(&squeue->serial.lock); + list_add_tail(&padata->list, &squeue->serial.list); +- spin_unlock(&squeue->serial.lock); +- + queue_work_on(cb_cpu, pinst->serial_wq, &squeue->work); +- } + +- spin_unlock_bh(&pd->lock); +- +- /* +- * The next object that needs serialization might have arrived to +- * the reorder queues in the meantime. +- * +- * Ensure reorder queue is read after pd->lock is dropped so we see +- * new objects from another task in padata_do_serial. Pairs with +- * smp_mb in padata_do_serial. +- */ +- smp_mb(); +- +- reorder = per_cpu_ptr(pd->reorder_list, pd->cpu); +- if (!list_empty(&reorder->list) && padata_find_next(pd, false)) { + /* +- * Other context(eg. the padata_serial_worker) can finish the request. +- * To avoid UAF issue, add pd ref here, and put pd ref after reorder_work finish. ++ * If the next object that needs serialization is parallel ++ * processed by another cpu and is still on it's way to the ++ * cpu's reorder queue, end the loop. 
+ */ +- padata_get_pd(pd); +- if (!queue_work(pinst->serial_wq, &pd->reorder_work)) +- padata_put_pd(pd); +- } +-} +- +-static void invoke_padata_reorder(struct work_struct *work) +-{ +- struct parallel_data *pd; +- +- local_bh_disable(); +- pd = container_of(work, struct parallel_data, reorder_work); +- padata_reorder(pd); +- local_bh_enable(); +- /* Pairs with putting the reorder_work in the serial_wq */ +- padata_put_pd(pd); ++ padata = padata_find_next(pd, cpu, processed); ++ spin_unlock(&squeue->serial.lock); ++ } while (padata); + } + + static void padata_serial_worker(struct work_struct *serial_work) +@@ -419,6 +367,7 @@ void padata_do_serial(struct padata_priv *padata) + struct padata_list *reorder = per_cpu_ptr(pd->reorder_list, hashed_cpu); + struct padata_priv *cur; + struct list_head *pos; ++ bool gotit = true; + + spin_lock(&reorder->lock); + /* Sort in ascending order of sequence number. */ +@@ -428,17 +377,14 @@ void padata_do_serial(struct padata_priv *padata) + if ((signed int)(cur->seq_nr - padata->seq_nr) < 0) + break; + } +- list_add(&padata->list, pos); ++ if (padata->seq_nr != pd->processed) { ++ gotit = false; ++ list_add(&padata->list, pos); ++ } + spin_unlock(&reorder->lock); + +- /* +- * Ensure the addition to the reorder list is ordered correctly +- * with the trylock of pd->lock in padata_reorder. Pairs with smp_mb +- * in padata_reorder. 
+- */ +- smp_mb(); +- +- padata_reorder(pd); ++ if (gotit) ++ padata_reorder(padata); + } + EXPORT_SYMBOL(padata_do_serial); + +@@ -625,9 +571,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_shell *ps) + padata_init_squeues(pd); + pd->seq_nr = -1; + refcount_set(&pd->refcnt, 1); +- spin_lock_init(&pd->lock); + pd->cpu = cpumask_first(pd->cpumask.pcpu); +- INIT_WORK(&pd->reorder_work, invoke_padata_reorder); + + return pd; + +@@ -1137,12 +1081,6 @@ void padata_free_shell(struct padata_shell *ps) + if (!ps) + return; + +- /* +- * Wait for all _do_serial calls to finish to avoid touching +- * freed pd's and ps's. +- */ +- synchronize_rcu(); +- + mutex_lock(&ps->pinst->lock); + list_del(&ps->list); + pd = rcu_dereference_protected(ps->pd, 1); +-- +2.53.0 + diff --git a/queue-5.15/padata-remove-comment-for-reorder_work.patch b/queue-5.15/padata-remove-comment-for-reorder_work.patch new file mode 100644 index 0000000000..96ab2252b4 --- /dev/null +++ b/queue-5.15/padata-remove-comment-for-reorder_work.patch @@ -0,0 +1,35 @@ +From dea2675be6762451916d65a8904e099c5aba8e99 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 28 Apr 2026 13:07:59 +0800 +Subject: padata: Remove comment for reorder_work + +From: Herbert Xu + +[ Upstream commit 82a0302e7167d0b7c6cde56613db3748f8dd806d ] + +Remove comment for reorder_work which no longer exists. + +Reported-by: Stephen Rothwell +Fixes: 71203f68c774 ("padata: Fix pd UAF once and for all") +Signed-off-by: Herbert Xu +Signed-off-by: Bin Lan +Signed-off-by: Sasha Levin +--- + include/linux/padata.h | 1 - + 1 file changed, 1 deletion(-) + +diff --git a/include/linux/padata.h b/include/linux/padata.h +index 9ca779d7e310e..6f07e12a43819 100644 +--- a/include/linux/padata.h ++++ b/include/linux/padata.h +@@ -90,7 +90,6 @@ struct padata_cpumask { + * @processed: Number of already processed objects. + * @cpu: Next CPU to be processed. + * @cpumask: The cpumasks in use for parallel and serial workers. 
+- * @reorder_work: work struct for reordering. + */ + struct parallel_data { + struct padata_shell *ps; +-- +2.53.0 + diff --git a/queue-5.15/series b/queue-5.15/series index 04f18fb528..4661678fad 100644 --- a/queue-5.15/series +++ b/queue-5.15/series @@ -193,3 +193,6 @@ ibmasm-fix-heap-over-read-in-ibmasm_send_i2o_message.patch firmware-google-framebuffer-do-not-mark-framebuffer-as-busy.patch scsi-ufs-core-fix-use-after-free-in-init-error-and-r.patch device-property-make-modifications-of-fwnode-flags-thread-safe.patch +padata-fix-pd-uaf-once-and-for-all.patch +padata-remove-comment-for-reorder_work.patch +driver-core-don-t-let-a-device-probe-until-it-s-read.patch diff --git a/queue-6.1/driver-core-don-t-let-a-device-probe-until-it-s-read.patch b/queue-6.1/driver-core-don-t-let-a-device-probe-until-it-s-read.patch new file mode 100644 index 0000000000..f30c877862 --- /dev/null +++ b/queue-6.1/driver-core-don-t-let-a-device-probe-until-it-s-read.patch @@ -0,0 +1,224 @@ +From f7ec631fcf71f6b6829792f4b2a4dc31e323416a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 27 Apr 2026 10:17:02 -0700 +Subject: driver core: Don't let a device probe until it's ready + +From: Douglas Anderson + +[ Upstream commit a2225b6e834a838ae3c93709760edc0a169eb2f2 ] + +The moment we link a "struct device" into the list of devices for the +bus, it's possible probe can happen. This is because another thread +can load the driver at any time and that can cause the device to +probe. 
This has been seen in practice with a stack crawl that looks +like this [1]: + + really_probe() + __driver_probe_device() + driver_probe_device() + __driver_attach() + bus_for_each_dev() + driver_attach() + bus_add_driver() + driver_register() + __platform_driver_register() + init_module() [some module] + do_one_initcall() + do_init_module() + load_module() + __arm64_sys_finit_module() + invoke_syscall() + +As a result of the above, it was seen that device_links_driver_bound() +could be called for the device before "dev->fwnode->dev" was +assigned. This prevented __fw_devlink_pickup_dangling_consumers() from +being called which meant that other devices waiting on our driver's +sub-nodes were stuck deferring forever. + +It's believed that this problem is showing up suddenly for two +reasons: +1. Android has recently (last ~1 year) implemented an optimization to + the order it loads modules [2]. When devices opt-in to this faster + loading, modules are loaded one-after-the-other very quickly. This + is unlike how other distributions do it. The reproduction of this + problem has only been seen on devices that opt-in to Android's + "parallel module loading". +2. Android devices typically opt-in to fw_devlink, and the most + noticeable issue is the NULL "dev->fwnode->dev" in + device_links_driver_bound(). fw_devlink is somewhat new code and + also not in use by all Linux devices. + +Even though the specific symptom where "dev->fwnode->dev" wasn't +assigned could be fixed by moving that assignment higher in +device_add(), other parts of device_add() (like the call to +device_pm_add()) are also important to run before probe. Only moving +the "dev->fwnode->dev" assignment would likely fix the current +symptoms but lead to difficult-to-debug problems in the future. + +Fix the problem by preventing probe until device_add() has run far +enough that the device is ready to probe. 
If somehow we end up trying +to probe before we're allowed, __driver_probe_device() will return +-EPROBE_DEFER which will make certain the device is noticed. + +In the race condition that was seen with Android's faster module +loading, we will temporarily add the device to the deferred list and +then take it off immediately when device_add() probes the device. + +Instead of adding another flag to the bitfields already in "struct +device", instead add a new "flags" field and use that. This allows us +to freely change the bit from different thread without worrying about +corrupting nearby bits (and means threads changing other bit won't +corrupt us). + +[1] Captured on a machine running a downstream 6.6 kernel +[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel + +Cc: stable@vger.kernel.org +Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") +Reviewed-by: Alan Stern +Reviewed-by: Rafael J. 
Wysocki (Intel) +Reviewed-by: Danilo Krummrich +Acked-by: Greg Kroah-Hartman +Acked-by: Marek Szyprowski +Signed-off-by: Douglas Anderson +Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid +Signed-off-by: Danilo Krummrich +Signed-off-by: Douglas Anderson +Signed-off-by: Sasha Levin +--- + drivers/base/core.c | 15 ++++++++++++++ + drivers/base/dd.c | 20 +++++++++++++++++++ + include/linux/device.h | 44 ++++++++++++++++++++++++++++++++++++++++++ + 3 files changed, 79 insertions(+) + +diff --git a/drivers/base/core.c b/drivers/base/core.c +index 157775dc401b2..81a8fe313f6a4 100644 +--- a/drivers/base/core.c ++++ b/drivers/base/core.c +@@ -3694,6 +3694,21 @@ int device_add(struct device *dev) + fw_devlink_link_device(dev); + } + ++ /* ++ * The moment the device was linked into the bus's "klist_devices" in ++ * bus_add_device() then it's possible that probe could have been ++ * attempted in a different thread via userspace loading a driver ++ * matching the device. "ready_to_probe" being unset would have ++ * blocked those attempts. Now that all of the above initialization has ++ * happened, unblock probe. If probe happens through another thread ++ * after this point but before bus_probe_device() runs then it's fine. ++ * bus_probe_device() -> device_initial_probe() -> __device_attach() ++ * will notice (under device_lock) that the device is already bound. 
++ */ ++ device_lock(dev); ++ dev_set_ready_to_probe(dev); ++ device_unlock(dev); ++ + bus_probe_device(dev); + + /* +diff --git a/drivers/base/dd.c b/drivers/base/dd.c +index dbbe2cebb8917..1c6f266f9367f 100644 +--- a/drivers/base/dd.c ++++ b/drivers/base/dd.c +@@ -770,6 +770,26 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev) + if (dev->driver) + return -EBUSY; + ++ /* ++ * In device_add(), the "struct device" gets linked into the subsystem's ++ * list of devices and broadcast to userspace (via uevent) before we're ++ * quite ready to probe. Those open pathways to driver probe before ++ * we've finished enough of device_add() to reliably support probe. ++ * Detect this and tell other pathways to try again later. device_add() ++ * itself will also try to probe immediately after setting ++ * "ready_to_probe". ++ */ ++ if (!dev_ready_to_probe(dev)) ++ return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready to probe\n"); ++ ++ /* ++ * Set can_match = true after calling dev_ready_to_probe(), so ++ * driver_deferred_probe_add() won't actually add the device to the ++ * deferred probe list when dev_ready_to_probe() returns false. ++ * ++ * When dev_ready_to_probe() returns false, it means that device_add() ++ * will do another probe() attempt for us. ++ */ + dev->can_match = true; + pr_debug("bus: '%s': %s: matched device %s with driver %s\n", + drv->bus->name, __func__, dev_name(dev), drv->name); +diff --git a/include/linux/device.h b/include/linux/device.h +index cc84521795b14..528e0dad742e1 100644 +--- a/include/linux/device.h ++++ b/include/linux/device.h +@@ -457,6 +457,21 @@ struct device_physical_location { + bool lid; + }; + ++/** ++ * enum struct_device_flags - Flags in struct device ++ * ++ * Each flag should have a set of accessor functions created via ++ * __create_dev_flag_accessors() for each access. 
++ * ++ * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough ++ * initialization that probe could be called. ++ */ ++enum struct_device_flags { ++ DEV_FLAG_READY_TO_PROBE = 0, ++ ++ DEV_FLAG_COUNT ++}; ++ + /** + * struct device - The basic device structure + * @parent: The device's "parent" device, the device to which it is attached. +@@ -545,6 +560,7 @@ struct device_physical_location { + * and optionall (if the coherent mask is large enough) also + * for dma allocations. This flag is managed by the dma ops + * instance from ->dma_supported. ++ * @flags: DEV_FLAG_XXX flags. Use atomic bitfield operations to modify. + * + * At the lowest level, every device in a Linux system is represented by an + * instance of struct device. The device structure contains the information +@@ -652,8 +668,36 @@ struct device { + #ifdef CONFIG_DMA_OPS_BYPASS + bool dma_ops_bypass : 1; + #endif ++ ++ DECLARE_BITMAP(flags, DEV_FLAG_COUNT); + }; + ++#define __create_dev_flag_accessors(accessor_name, flag_name) \ ++static inline bool dev_##accessor_name(const struct device *dev) \ ++{ \ ++ return test_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_set_##accessor_name(struct device *dev) \ ++{ \ ++ set_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_clear_##accessor_name(struct device *dev) \ ++{ \ ++ clear_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_assign_##accessor_name(struct device *dev, bool value) \ ++{ \ ++ assign_bit(flag_name, dev->flags, value); \ ++} \ ++static inline bool dev_test_and_set_##accessor_name(struct device *dev) \ ++{ \ ++ return test_and_set_bit(flag_name, dev->flags); \ ++} ++ ++__create_dev_flag_accessors(ready_to_probe, DEV_FLAG_READY_TO_PROBE); ++ ++#undef __create_dev_flag_accessors ++ + /** + * struct device_link - Device link representation. + * @supplier: The device on the supplier end of the link. 
+-- +2.53.0 + diff --git a/queue-6.1/series b/queue-6.1/series index 6286755b28..53f67508a8 100644 --- a/queue-6.1/series +++ b/queue-6.1/series @@ -176,3 +176,4 @@ blk-mq-fix-null-dereference-on-q-elevator-in-blk_mq_.patch arm64-set-__exception_irq_entry-with-__irq_entry-as-.patch regset-use-kvzalloc-for-regset_get_alloc.patch device-property-make-modifications-of-fwnode-flags-thread-safe.patch +driver-core-don-t-let-a-device-probe-until-it-s-read.patch diff --git a/queue-6.6/driver-core-don-t-let-a-device-probe-until-it-s-read.patch b/queue-6.6/driver-core-don-t-let-a-device-probe-until-it-s-read.patch new file mode 100644 index 0000000000..fd6167a9d6 --- /dev/null +++ b/queue-6.6/driver-core-don-t-let-a-device-probe-until-it-s-read.patch @@ -0,0 +1,224 @@ +From 4ecb43fc3e2323b5a1caedd92c806454adc896e0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 27 Apr 2026 09:52:41 -0700 +Subject: driver core: Don't let a device probe until it's ready + +From: Douglas Anderson + +[ Upstream commit a2225b6e834a838ae3c93709760edc0a169eb2f2 ] + +The moment we link a "struct device" into the list of devices for the +bus, it's possible probe can happen. This is because another thread +can load the driver at any time and that can cause the device to +probe. This has been seen in practice with a stack crawl that looks +like this [1]: + + really_probe() + __driver_probe_device() + driver_probe_device() + __driver_attach() + bus_for_each_dev() + driver_attach() + bus_add_driver() + driver_register() + __platform_driver_register() + init_module() [some module] + do_one_initcall() + do_init_module() + load_module() + __arm64_sys_finit_module() + invoke_syscall() + +As a result of the above, it was seen that device_links_driver_bound() +could be called for the device before "dev->fwnode->dev" was +assigned. This prevented __fw_devlink_pickup_dangling_consumers() from +being called which meant that other devices waiting on our driver's +sub-nodes were stuck deferring forever. 
+ +It's believed that this problem is showing up suddenly for two +reasons: +1. Android has recently (last ~1 year) implemented an optimization to + the order it loads modules [2]. When devices opt-in to this faster + loading, modules are loaded one-after-the-other very quickly. This + is unlike how other distributions do it. The reproduction of this + problem has only been seen on devices that opt-in to Android's + "parallel module loading". +2. Android devices typically opt-in to fw_devlink, and the most + noticeable issue is the NULL "dev->fwnode->dev" in + device_links_driver_bound(). fw_devlink is somewhat new code and + also not in use by all Linux devices. + +Even though the specific symptom where "dev->fwnode->dev" wasn't +assigned could be fixed by moving that assignment higher in +device_add(), other parts of device_add() (like the call to +device_pm_add()) are also important to run before probe. Only moving +the "dev->fwnode->dev" assignment would likely fix the current +symptoms but lead to difficult-to-debug problems in the future. + +Fix the problem by preventing probe until device_add() has run far +enough that the device is ready to probe. If somehow we end up trying +to probe before we're allowed, __driver_probe_device() will return +-EPROBE_DEFER which will make certain the device is noticed. + +In the race condition that was seen with Android's faster module +loading, we will temporarily add the device to the deferred list and +then take it off immediately when device_add() probes the device. + +Instead of adding another flag to the bitfields already in "struct +device", instead add a new "flags" field and use that. This allows us +to freely change the bit from different thread without worrying about +corrupting nearby bits (and means threads changing other bit won't +corrupt us). 
+ +[1] Captured on a machine running a downstream 6.6 kernel +[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel + +Cc: stable@vger.kernel.org +Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") +Reviewed-by: Alan Stern +Reviewed-by: Rafael J. Wysocki (Intel) +Reviewed-by: Danilo Krummrich +Acked-by: Greg Kroah-Hartman +Acked-by: Marek Szyprowski +Signed-off-by: Douglas Anderson +Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid +Signed-off-by: Danilo Krummrich +Signed-off-by: Douglas Anderson +Signed-off-by: Sasha Levin +--- + drivers/base/core.c | 15 ++++++++++++++ + drivers/base/dd.c | 20 +++++++++++++++++++ + include/linux/device.h | 44 ++++++++++++++++++++++++++++++++++++++++++ + 3 files changed, 79 insertions(+) + +diff --git a/drivers/base/core.c b/drivers/base/core.c +index a7033e11e38f3..3c172e6d3fe0d 100644 +--- a/drivers/base/core.c ++++ b/drivers/base/core.c +@@ -3680,6 +3680,21 @@ int device_add(struct device *dev) + fw_devlink_link_device(dev); + } + ++ /* ++ * The moment the device was linked into the bus's "klist_devices" in ++ * bus_add_device() then it's possible that probe could have been ++ * attempted in a different thread via userspace loading a driver ++ * matching the device. "ready_to_probe" being unset would have ++ * blocked those attempts. Now that all of the above initialization has ++ * happened, unblock probe. If probe happens through another thread ++ * after this point but before bus_probe_device() runs then it's fine. ++ * bus_probe_device() -> device_initial_probe() -> __device_attach() ++ * will notice (under device_lock) that the device is already bound. 
++ */ ++ device_lock(dev); ++ dev_set_ready_to_probe(dev); ++ device_unlock(dev); ++ + bus_probe_device(dev); + + /* +diff --git a/drivers/base/dd.c b/drivers/base/dd.c +index 7e2fb159bb895..d371c3437dc6b 100644 +--- a/drivers/base/dd.c ++++ b/drivers/base/dd.c +@@ -785,6 +785,26 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev) + if (dev->driver) + return -EBUSY; + ++ /* ++ * In device_add(), the "struct device" gets linked into the subsystem's ++ * list of devices and broadcast to userspace (via uevent) before we're ++ * quite ready to probe. Those open pathways to driver probe before ++ * we've finished enough of device_add() to reliably support probe. ++ * Detect this and tell other pathways to try again later. device_add() ++ * itself will also try to probe immediately after setting ++ * "ready_to_probe". ++ */ ++ if (!dev_ready_to_probe(dev)) ++ return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready to probe\n"); ++ ++ /* ++ * Set can_match = true after calling dev_ready_to_probe(), so ++ * driver_deferred_probe_add() won't actually add the device to the ++ * deferred probe list when dev_ready_to_probe() returns false. ++ * ++ * When dev_ready_to_probe() returns false, it means that device_add() ++ * will do another probe() attempt for us. ++ */ + dev->can_match = true; + pr_debug("bus: '%s': %s: matched device %s with driver %s\n", + drv->bus->name, __func__, dev_name(dev), drv->name); +diff --git a/include/linux/device.h b/include/linux/device.h +index e5f1a773dc547..34a327f5797c7 100644 +--- a/include/linux/device.h ++++ b/include/linux/device.h +@@ -602,6 +602,21 @@ struct device_physical_location { + bool lid; + }; + ++/** ++ * enum struct_device_flags - Flags in struct device ++ * ++ * Each flag should have a set of accessor functions created via ++ * __create_dev_flag_accessors() for each access. 
++ * ++ * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough ++ * initialization that probe could be called. ++ */ ++enum struct_device_flags { ++ DEV_FLAG_READY_TO_PROBE = 0, ++ ++ DEV_FLAG_COUNT ++}; ++ + /** + * struct device - The basic device structure + * @parent: The device's "parent" device, the device to which it is attached. +@@ -693,6 +708,7 @@ struct device_physical_location { + * and optionall (if the coherent mask is large enough) also + * for dma allocations. This flag is managed by the dma ops + * instance from ->dma_supported. ++ * @flags: DEV_FLAG_XXX flags. Use atomic bitfield operations to modify. + * + * At the lowest level, every device in a Linux system is represented by an + * instance of struct device. The device structure contains the information +@@ -805,8 +821,36 @@ struct device { + #ifdef CONFIG_DMA_OPS_BYPASS + bool dma_ops_bypass : 1; + #endif ++ ++ DECLARE_BITMAP(flags, DEV_FLAG_COUNT); + }; + ++#define __create_dev_flag_accessors(accessor_name, flag_name) \ ++static inline bool dev_##accessor_name(const struct device *dev) \ ++{ \ ++ return test_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_set_##accessor_name(struct device *dev) \ ++{ \ ++ set_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_clear_##accessor_name(struct device *dev) \ ++{ \ ++ clear_bit(flag_name, dev->flags); \ ++} \ ++static inline void dev_assign_##accessor_name(struct device *dev, bool value) \ ++{ \ ++ assign_bit(flag_name, dev->flags, value); \ ++} \ ++static inline bool dev_test_and_set_##accessor_name(struct device *dev) \ ++{ \ ++ return test_and_set_bit(flag_name, dev->flags); \ ++} ++ ++__create_dev_flag_accessors(ready_to_probe, DEV_FLAG_READY_TO_PROBE); ++ ++#undef __create_dev_flag_accessors ++ + /** + * struct device_link - Device link representation. + * @supplier: The device on the supplier end of the link. 
+-- 
+2.53.0
+
diff --git a/queue-6.6/loongarch-add-spectre-boundry-for-syscall-dispatch-t.patch b/queue-6.6/loongarch-add-spectre-boundry-for-syscall-dispatch-t.patch
new file mode 100644
index 0000000000..7b0216c096
--- /dev/null
+++ b/queue-6.6/loongarch-add-spectre-boundry-for-syscall-dispatch-t.patch
@@ -0,0 +1,46 @@
+From 5e2f6d2510fd6e592db358becbbde8db7ef1bdb9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 28 Apr 2026 05:02:27 -0400
+Subject: LoongArch: Add spectre boundary for syscall dispatch table
+
+From: Greg Kroah-Hartman
+
+[ Upstream commit 0c965d2784fbbd7f8e3b96d875c9cfdf7c00da3d ]
+
+The LoongArch syscall number is directly controlled by userspace, but
+does not have an array_index_nospec() boundary to prevent access past the
+syscall function pointer tables.
+
+Cc: stable@vger.kernel.org
+Assisted-by: gkh_clanker_2000
+Signed-off-by: Greg Kroah-Hartman
+Signed-off-by: Huacai Chen
+Signed-off-by: Sasha Levin
+---
+ arch/loongarch/kernel/syscall.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/arch/loongarch/kernel/syscall.c b/arch/loongarch/kernel/syscall.c
+index b4c5acd7aa3b3..f4e3bd219b1d7 100644
+--- a/arch/loongarch/kernel/syscall.c
++++ b/arch/loongarch/kernel/syscall.c
+@@ -9,6 +9,7 @@
+ #include
+ #include
+ #include
++#include
+ #include
+ #include
+ 
+@@ -55,7 +56,7 @@ void noinstr do_syscall(struct pt_regs *regs)
+ 	nr = syscall_enter_from_user_mode(regs, nr);
+ 
+ 	if (nr < NR_syscalls) {
+-		syscall_fn = sys_call_table[nr];
++		syscall_fn = sys_call_table[array_index_nospec(nr, NR_syscalls)];
+ 		regs->regs[4] = syscall_fn(regs->orig_a0, regs->regs[5], regs->regs[6],
+ 					   regs->regs[7], regs->regs[8], regs->regs[9]);
+ 	}
+-- 
+2.53.0
+
diff --git a/queue-6.6/series b/queue-6.6/series
index 234d28b944..a563993909 100644
--- a/queue-6.6/series
+++ b/queue-6.6/series
@@ -18,3 +18,5 @@ drm-amdgpu-use-vmemdup_array_user-in-amdgpu_bo_creat.patch
 drm-amdgpu-limit-bo-list-entry-count-to-prevent-reso.patch
regset-use-kvzalloc-for-regset_get_alloc.patch device-property-make-modifications-of-fwnode-flags-thread-safe.patch +driver-core-don-t-let-a-device-probe-until-it-s-read.patch +loongarch-add-spectre-boundry-for-syscall-dispatch-t.patch