From: Greg Kroah-Hartman
Date: Mon, 24 Feb 2025 13:20:31 +0000 (+0100)
Subject: 6.12-stable patches
X-Git-Tag: v6.6.80~14
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=8fc3e353d3eaaacafc470cb6e4cc1a813fa2b0a2;p=thirdparty%2Fkernel%2Fstable-queue.git

6.12-stable patches

added patches:
net-pse-pd-fix-deadlock-in-current-limit-functions.patch
sched_ext-fix-incorrect-assumption-about-migration-disabled-tasks-in-task_can_run_on_remote_rq.patch
tracing-fix-using-ret-variable-in-tracing_set_tracer.patch
---

diff --git a/queue-6.12/net-pse-pd-fix-deadlock-in-current-limit-functions.patch b/queue-6.12/net-pse-pd-fix-deadlock-in-current-limit-functions.patch
new file mode 100644
index 0000000000..f307721135
--- /dev/null
+++ b/queue-6.12/net-pse-pd-fix-deadlock-in-current-limit-functions.patch
@@ -0,0 +1,45 @@
+From 488fb6effe03e20f38d34da7425de77bbd3e2665 Mon Sep 17 00:00:00 2001
+From: Kory Maincent
+Date: Wed, 12 Feb 2025 16:17:51 +0100
+Subject: net: pse-pd: Fix deadlock in current limit functions
+
+From: Kory Maincent
+
+commit 488fb6effe03e20f38d34da7425de77bbd3e2665 upstream.
+
+Fix a deadlock in pse_pi_get_current_limit and pse_pi_set_current_limit
+caused by consecutive mutex_lock calls: one in the function itself and
+another in pse_pi_get_voltage.
+
+Resolve the issue by using the unlocked version of pse_pi_get_voltage
+instead. 
+ +Fixes: e0a5e2bba38a ("net: pse-pd: Use power limit at driver side instead of current limit") +Signed-off-by: Kory Maincent +Link: https://patch.msgid.link/20250212151751.1515008-1-kory.maincent@bootlin.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Greg Kroah-Hartman +--- + drivers/net/pse-pd/pse_core.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/drivers/net/pse-pd/pse_core.c ++++ b/drivers/net/pse-pd/pse_core.c +@@ -309,7 +309,7 @@ static int pse_pi_get_current_limit(stru + goto out; + mW = ret; + +- ret = pse_pi_get_voltage(rdev); ++ ret = _pse_pi_get_voltage(rdev); + if (!ret) { + dev_err(pcdev->dev, "Voltage null\n"); + ret = -ERANGE; +@@ -346,7 +346,7 @@ static int pse_pi_set_current_limit(stru + + id = rdev_get_id(rdev); + mutex_lock(&pcdev->lock); +- ret = pse_pi_get_voltage(rdev); ++ ret = _pse_pi_get_voltage(rdev); + if (!ret) { + dev_err(pcdev->dev, "Voltage null\n"); + ret = -ERANGE; diff --git a/queue-6.12/sched_ext-fix-incorrect-assumption-about-migration-disabled-tasks-in-task_can_run_on_remote_rq.patch b/queue-6.12/sched_ext-fix-incorrect-assumption-about-migration-disabled-tasks-in-task_can_run_on_remote_rq.patch new file mode 100644 index 0000000000..829f2974d1 --- /dev/null +++ b/queue-6.12/sched_ext-fix-incorrect-assumption-about-migration-disabled-tasks-in-task_can_run_on_remote_rq.patch @@ -0,0 +1,106 @@ +From f3f08c3acfb8860e07a22814a344e83c99ad7398 Mon Sep 17 00:00:00 2001 +From: Tejun Heo +Date: Mon, 10 Feb 2025 09:27:09 -1000 +Subject: sched_ext: Fix incorrect assumption about migration disabled tasks in task_can_run_on_remote_rq() + +From: Tejun Heo + +commit f3f08c3acfb8860e07a22814a344e83c99ad7398 upstream. + +While fixing migration disabled task handling, 32966821574c ("sched_ext: Fix +migration disabled handling in targeted dispatches") assumed that a +migration disabled task's ->cpus_ptr would only have the pinned CPU. 
While
+this is eventually true for migration disabled tasks that are switched out,
+->cpus_ptr update is performed by migrate_disable_switch() which is called
+right before context_switch() in __schedule(). However, the task is
+enqueued earlier during pick_next_task() via put_prev_task_scx(), so there
+is a race window where another CPU can see the task on a DSQ.
+
+If the CPU tries to dispatch the migration disabled task while in that
+window, task_allowed_on_cpu() will succeed and task_can_run_on_remote_rq()
+will subsequently trigger SCHED_WARN(is_migration_disabled()).
+
+  WARNING: CPU: 8 PID: 1837 at kernel/sched/ext.c:2466 task_can_run_on_remote_rq+0x12e/0x140
+  Sched_ext: layered (enabled+all), task: runnable_at=-10ms
+  RIP: 0010:task_can_run_on_remote_rq+0x12e/0x140
+  ...
+
+  consume_dispatch_q+0xab/0x220
+  scx_bpf_dsq_move_to_local+0x58/0xd0
+  bpf_prog_84dd17b0654b6cf0_layered_dispatch+0x290/0x1cfa
+  bpf__sched_ext_ops_dispatch+0x4b/0xab
+  balance_one+0x1fe/0x3b0
+  balance_scx+0x61/0x1d0
+  prev_balance+0x46/0xc0
+  __pick_next_task+0x73/0x1c0
+  __schedule+0x206/0x1730
+  schedule+0x3a/0x160
+  __do_sys_sched_yield+0xe/0x20
+  do_syscall_64+0xbb/0x1e0
+  entry_SYSCALL_64_after_hwframe+0x77/0x7f
+
+Fix it by converting the SCHED_WARN() back to a regular failure path. Also,
+perform the migration disabled test before the task_allowed_on_cpu() test so
+that BPF schedulers which fail to handle migration disabled tasks can be
+noticed easily.
+
+While at it, adjust the scx_ops_error() message for the !task_allowed_on_cpu()
+case for brevity and consistency. 
+ +Signed-off-by: Tejun Heo +Fixes: 32966821574c ("sched_ext: Fix migration disabled handling in targeted dispatches") +Acked-by: Andrea Righi +Reported-by: Jake Hillion +Signed-off-by: Greg Kroah-Hartman +--- + kernel/sched/ext.c | 29 +++++++++++++++++++++-------- + 1 file changed, 21 insertions(+), 8 deletions(-) + +--- a/kernel/sched/ext.c ++++ b/kernel/sched/ext.c +@@ -2311,6 +2311,25 @@ static bool task_can_run_on_remote_rq(st + SCHED_WARN_ON(task_cpu(p) == cpu); + + /* ++ * If @p has migration disabled, @p->cpus_ptr is updated to contain only ++ * the pinned CPU in migrate_disable_switch() while @p is being switched ++ * out. However, put_prev_task_scx() is called before @p->cpus_ptr is ++ * updated and thus another CPU may see @p on a DSQ inbetween leading to ++ * @p passing the below task_allowed_on_cpu() check while migration is ++ * disabled. ++ * ++ * Test the migration disabled state first as the race window is narrow ++ * and the BPF scheduler failing to check migration disabled state can ++ * easily be masked if task_allowed_on_cpu() is done first. ++ */ ++ if (unlikely(is_migration_disabled(p))) { ++ if (trigger_error) ++ scx_ops_error("SCX_DSQ_LOCAL[_ON] cannot move migration disabled %s[%d] from CPU %d to %d", ++ p->comm, p->pid, task_cpu(p), cpu); ++ return false; ++ } ++ ++ /* + * We don't require the BPF scheduler to avoid dispatching to offline + * CPUs mostly for convenience but also because CPUs can go offline + * between scx_bpf_dispatch() calls and here. 
Trigger error iff the +@@ -2318,17 +2337,11 @@ static bool task_can_run_on_remote_rq(st + */ + if (!task_allowed_on_cpu(p, cpu)) { + if (trigger_error) +- scx_ops_error("SCX_DSQ_LOCAL[_ON] verdict target cpu %d not allowed for %s[%d]", +- cpu_of(rq), p->comm, p->pid); ++ scx_ops_error("SCX_DSQ_LOCAL[_ON] target CPU %d not allowed for %s[%d]", ++ cpu, p->comm, p->pid); + return false; + } + +- /* +- * If @p has migration disabled, @p->cpus_ptr only contains its current +- * CPU and the above task_allowed_on_cpu() test should have failed. +- */ +- SCHED_WARN_ON(is_migration_disabled(p)); +- + if (!scx_rq_online(rq)) + return false; + diff --git a/queue-6.12/series b/queue-6.12/series index 0a681ff7ab..e09436ddaf 100644 --- a/queue-6.12/series +++ b/queue-6.12/series @@ -145,3 +145,6 @@ edac-qcom-correct-interrupt-enable-register-configuration.patch ftrace-correct-preemption-accounting-for-function-tracing.patch ftrace-fix-accounting-of-adding-subops-to-a-manager-ops.patch ftrace-do-not-add-duplicate-entries-in-subops-manager-ops.patch +tracing-fix-using-ret-variable-in-tracing_set_tracer.patch +net-pse-pd-fix-deadlock-in-current-limit-functions.patch +sched_ext-fix-incorrect-assumption-about-migration-disabled-tasks-in-task_can_run_on_remote_rq.patch diff --git a/queue-6.12/tracing-fix-using-ret-variable-in-tracing_set_tracer.patch b/queue-6.12/tracing-fix-using-ret-variable-in-tracing_set_tracer.patch new file mode 100644 index 0000000000..7097ee1a5b --- /dev/null +++ b/queue-6.12/tracing-fix-using-ret-variable-in-tracing_set_tracer.patch @@ -0,0 +1,47 @@ +From 22bec11a569983f39c6061cb82279e7de9e3bdfc Mon Sep 17 00:00:00 2001 +From: Steven Rostedt +Date: Mon, 6 Jan 2025 11:11:43 -0500 +Subject: tracing: Fix using ret variable in tracing_set_tracer() + +From: Steven Rostedt + +commit 22bec11a569983f39c6061cb82279e7de9e3bdfc upstream. 
+
+When the function tracing_set_tracer() switched over to using the guard()
+infrastructure, it did not need to save the 'ret' variable and would just
+return the value when an error arose, instead of setting ret and jumping
+to an out label.
+
+When CONFIG_TRACER_SNAPSHOT is enabled, it had code that expected the
+"ret" variable to be initialized to zero and had set 'ret' while holding
+an arch_spin_lock() (not used by guard), and then upon releasing the lock
+it would check 'ret' and exit if set. But because ret was only set when an
+error occurred while holding the locks, 'ret' would be used uninitialized
+if there was no error. The code in the CONFIG_TRACER_SNAPSHOT block should
+be self contained. Make sure 'ret' is also set when no error occurred.
+
+Cc: Mathieu Desnoyers
+Link: https://lore.kernel.org/20250106111143.2f90ff65@gandalf.local.home
+Reported-by: kernel test robot
+Reported-by: Dan Carpenter
+Closes: https://lore.kernel.org/r/202412271654.nJVBuwmF-lkp@intel.com/
+Fixes: d33b10c0c73ad ("tracing: Switch trace.c code over to use guard()")
+Signed-off-by: Steven Rostedt (Google)
+Acked-by: Masami Hiramatsu (Google)
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/trace/trace.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+--- a/kernel/trace/trace.c
++++ b/kernel/trace/trace.c
+@@ -6125,8 +6125,7 @@ int tracing_set_tracer(struct trace_arra
+ 	if (t->use_max_tr) {
+ 		local_irq_disable();
+ 		arch_spin_lock(&tr->max_lock);
+-		if (tr->cond_snapshot)
+-			ret = -EBUSY;
++		ret = tr->cond_snapshot ? -EBUSY : 0;
+ 		arch_spin_unlock(&tr->max_lock);
+ 		local_irq_enable();
+ 		if (ret)