Fixes for 6.1

author Sasha Levin <sashal@kernel.org>

Thu, 12 Dec 2024 01:10:53 +0000 (20:10 -0500)

committer Sasha Levin <sashal@kernel.org>

Thu, 12 Dec 2024 01:10:53 +0000 (20:10 -0500)
diff --git a/queue-6.1/btrfs-fix-missing-snapshot-drew-unlock-when-root-is-.patch b/queue-6.1/btrfs-fix-missing-snapshot-drew-unlock-when-root-is-.patch
new file mode 100644
index 0000000..d5835a1
--- /dev/null
+++ b/queue-6.1/btrfs-fix-missing-snapshot-drew-unlock-when-root-is-.patch
@@ -0,0 +1,41 @@
+From 1455e08765b1f879f8a1944c720ae0b9f9b7f099 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 29 Nov 2024 13:33:03 +0000
+Subject: btrfs: fix missing snapshot drew unlock when root is dead during swap
+ activation
+
+From: Filipe Manana <fdmanana@suse.com>
+
+[ Upstream commit 9c803c474c6c002d8ade68ebe99026cc39c37f85 ]
+
+When activating a swap file we acquire the root's snapshot drew lock and
+then check if the root is dead, failing and returning with -EPERM if it's
+dead but without unlocking the root's snapshot lock. Fix this by adding
+the missing unlock.
+
+Fixes: 60021bd754c6 ("btrfs: prevent subvol with swapfile from being deleted")
+Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
+Reviewed-by: David Sterba <dsterba@suse.com>
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/inode.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index 8fc8a24a1afe8..eb5f03c3336cf 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -11228,6 +11228,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+       if (btrfs_root_dead(root)) {
+               spin_unlock(&root->root_item_lock);
+ 
++              btrfs_drew_write_unlock(&root->snapshot_lock);
+               btrfs_exclop_finish(fs_info);
+               btrfs_warn(fs_info,
+               "cannot activate swapfile because subvolume %llu is being deleted",
+-- 
+2.43.0
+
diff --git a/queue-6.1/sched-core-prevent-wakeup-of-ksoftirqd-during-idle-l.patch b/queue-6.1/sched-core-prevent-wakeup-of-ksoftirqd-during-idle-l.patch
new file mode 100644
index 0000000..00d8315
--- /dev/null
+++ b/queue-6.1/sched-core-prevent-wakeup-of-ksoftirqd-during-idle-l.patch
@@ -0,0 +1,71 @@
+From abab207dcec1e85258b79b9db15ceef30ba2d88d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 19 Nov 2024 05:44:32 +0000
+Subject: sched/core: Prevent wakeup of ksoftirqd during idle load balance
+
+From: K Prateek Nayak <kprateek.nayak@amd.com>
+
+[ Upstream commit e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ]
+
+The scheduler raises a SCHED_SOFTIRQ to trigger a load balancing event
+from the IPI handler on the idle CPU. If the SMP function is invoked
+from an idle CPU via flush_smp_call_function_queue() then the HARD-IRQ
+flag is not set and raise_softirq_irqoff() needlessly wakes ksoftirqd
+because soft interrupts are handled before ksoftirqd gets on the CPU.
+
+Adding a trace_printk() in nohz_csd_func() at the spot of raising
+SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
+and softirq_entry (for SCHED_SOFTIRQ vector alone) helps observing the
+current behavior:
+
+       <idle>-0   [000] dN.1.:  nohz_csd_func: Raising SCHED_SOFTIRQ from nohz_csd_func
+       <idle>-0   [000] dN.4.:  sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
+       <idle>-0   [000] .Ns1.:  softirq_entry: vec=7 [action=SCHED]
+       <idle>-0   [000] .Ns1.:  softirq_exit: vec=7  [action=SCHED]
+       <idle>-0   [000] d..2.:  sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=16 next_prio=120
+  ksoftirqd/0-16  [000] d..2.:  sched_switch: prev_comm=ksoftirqd/0 prev_pid=16 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
+       ...
+
+Use __raise_softirq_irqoff() to raise the softirq. The SMP function call
+is always invoked on the requested CPU in an interrupt handler. It is
+guaranteed that soft interrupts are handled at the end.
+
+Following are the observations with the changes when enabling the same
+set of events:
+
+       <idle>-0       [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ for nohz_idle_balance
+       <idle>-0       [000] dN.1.: softirq_raise: vec=7 [action=SCHED]
+       <idle>-0       [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]
+
+No unnecessary ksoftirqd wakeups are seen from idle task's context to
+service the softirq.
+
+Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
+Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
+Reported-by: Julia Lawall <julia.lawall@inria.fr>
+Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Link: https://lore.kernel.org/r/20241119054432.6405-5-kprateek.nayak@amd.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/sched/core.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 860391d057802..54af671e8d510 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1164,7 +1164,7 @@ static void nohz_csd_func(void *info)
+       rq->idle_balance = idle_cpu(cpu);
+       if (rq->idle_balance) {
+               rq->nohz_idle_balance = flags;
+-              raise_softirq_irqoff(SCHED_SOFTIRQ);
++              __raise_softirq_irqoff(SCHED_SOFTIRQ);
+       }
+ }
+ 
+-- 
+2.43.0
+
diff --git a/queue-6.1/sched-core-remove-the-unnecessary-need_resched-check.patch b/queue-6.1/sched-core-remove-the-unnecessary-need_resched-check.patch
new file mode 100644
index 0000000..8cb619b
--- /dev/null
+++ b/queue-6.1/sched-core-remove-the-unnecessary-need_resched-check.patch
@@ -0,0 +1,122 @@
+From 7b6a64ab08e9496f67cc835b9132fb18a3719979 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 19 Nov 2024 05:44:30 +0000
+Subject: sched/core: Remove the unnecessary need_resched() check in
+ nohz_csd_func()
+
+From: K Prateek Nayak <kprateek.nayak@amd.com>
+
+[ Upstream commit ea9cffc0a154124821531991d5afdd7e8b20d7aa ]
+
+The need_resched() check currently in nohz_csd_func() can be tracked
+to have been added in scheduler_ipi() back in 2011 via commit
+ca38062e57e9 ("sched: Use resched IPI to kick off the nohz idle balance")
+
+Since then, it has travelled quite a bit but it seems like an idle_cpu()
+check currently is sufficient to detect the need to bail out from an
+idle load balancing. To justify this removal, consider all the following
+cases where an idle load balancing could race with a task wakeup:
+
+o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU
+  on wakelist if wakee cpu is idle") a target perceived to be idle
+  (target_rq->nr_running == 0) will return true for
+  ttwu_queue_cond(target) which will offload the task wakeup to the idle
+  target via an IPI.
+
+  In all such cases target_rq->ttwu_pending will be set to 1 before
+  queuing the wake function.
+
+  If an idle load balance races here, following scenarios are possible:
+
+  - The CPU is not in TIF_POLLING_NRFLAG mode in which case an actual
+    IPI is sent to the CPU to wake it out of idle. If the
+    nohz_csd_func() queues before sched_ttwu_pending(), the idle load
+    balance will bail out since idle_cpu(target) returns 0 since
+    target_rq->ttwu_pending is 1. If the nohz_csd_func() is queued after
+    sched_ttwu_pending() it should see rq->nr_running to be non-zero and
+    bail out of idle load balancing.
+
+  - The CPU is in TIF_POLLING_NRFLAG mode and instead of an actual IPI,
+    the sender will simply set TIF_NEED_RESCHED for the target to put it
+    out of idle and flush_smp_call_function_queue() in do_idle() will
+    execute the call function. Depending on the ordering of the queuing
+    of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in
+    nohz_csd_func() should either see target_rq->ttwu_pending = 1 or
+    target_rq->nr_running to be non-zero if there is a genuine task
+    wakeup racing with the idle load balance kick.
+
+o The waker CPU perceives the target CPU to be busy
+  (target_rq->nr_running != 0) but the CPU is in fact going idle and due
+  to a series of unfortunate events, the system reaches a case where the
+  waker CPU decides to perform the wakeup by itself in ttwu_queue() on
+  the target CPU but target is concurrently selected for idle load
+  balance (XXX: Can this happen? I'm not sure, but we'll consider the
+  mother of all coincidences to estimate the worst case scenario).
+
+  ttwu_do_activate() calls enqueue_task() which would increment
+  "rq->nr_running" post which it calls wakeup_preempt() which is
+  responsible for setting TIF_NEED_RESCHED (via a resched IPI or by
+  setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU). The key
+  thing to note in this case is that rq->nr_running is already non-zero
+  in case of a wakeup before TIF_NEED_RESCHED is set which would
+  lead to idle_cpu() check returning false.
+
+In all cases, it seems that need_resched() check is unnecessary when
+checking for idle_cpu() first since an impending wakeup racing with idle
+load balancer will either set the "rq->ttwu_pending" or indicate a newly
+woken task via "rq->nr_running".
+
+Chasing the reason why this check might have existed in the first place,
+I came across Peter's suggestion on the first iteration of Suresh's
+patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was:
+
+       sched_ttwu_do_pending(list);
+
+       if (unlikely((rq->idle == current) &&
+           rq->nohz_balance_kick &&
+           !need_resched()))
+               raise_softirq_irqoff(SCHED_SOFTIRQ);
+
+Since the condition to raise the SCHED_SOFTIRQ was preceded by
+sched_ttwu_do_pending() (the equivalent of sched_ttwu_pending() in the
+current upstream kernel), the need_resched() check was necessary to
+catch a newly queued task. Peter suggested modifying it to:
+
+       if (idle_cpu() && rq->nohz_balance_kick && !need_resched())
+               raise_softirq_irqoff(SCHED_SOFTIRQ);
+
+where idle_cpu() seems to have replaced "rq->idle == current" check.
+
+Even back then, the idle_cpu() check would have been sufficient to catch
+a new task being enqueued. Since commit b2a02fc43a1f ("smp: Optimize
+send_call_function_single_ipi()") overloads the interpretation of
+TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove the
+need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based
+on Peter's suggestion.
+
+Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
+Suggested-by: Peter Zijlstra <peterz@infradead.org>
+Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Link: https://lore.kernel.org/r/20241119054432.6405-3-kprateek.nayak@amd.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/sched/core.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 8388575759378..860391d057802 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1162,7 +1162,7 @@ static void nohz_csd_func(void *info)
+       WARN_ON(!(flags & NOHZ_KICK_MASK));
+ 
+       rq->idle_balance = idle_cpu(cpu);
+-      if (rq->idle_balance && !need_resched()) {
++      if (rq->idle_balance) {
+               rq->nohz_idle_balance = flags;
+               raise_softirq_irqoff(SCHED_SOFTIRQ);
+       }
+-- 
+2.43.0
+
diff --git a/queue-6.1/sched-fair-check-idle_cpu-before-need_resched-to-det.patch b/queue-6.1/sched-fair-check-idle_cpu-before-need_resched-to-det.patch
new file mode 100644
index 0000000..c949a4e
--- /dev/null
+++ b/queue-6.1/sched-fair-check-idle_cpu-before-need_resched-to-det.patch
@@ -0,0 +1,60 @@
+From 7ffad9679312d3ebc1d49a62cd88f1742cff1553 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 19 Nov 2024 05:44:31 +0000
+Subject: sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU
+ turning busy
+
+From: K Prateek Nayak <kprateek.nayak@amd.com>
+
+[ Upstream commit ff47a0acfcce309cf9e175149c75614491953c8f ]
+
+Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
+optimizes IPIs to idle CPUs in TIF_POLLING_NRFLAG mode by setting the
+TIF_NEED_RESCHED flag in idle task's thread info and relying on
+flush_smp_call_function_queue() in idle exit path to run the
+call-function. A softirq raised by the call-function is handled shortly
+after in do_softirq_post_smp_call_flush() but the TIF_NEED_RESCHED flag
+remains set and is only cleared later when schedule_idle() calls
+__schedule().
+
+need_resched() check in _nohz_idle_balance() exists to bail out of load
+balancing if another task has woken up on the CPU currently in-charge of
+idle load balancing which is being processed in SCHED_SOFTIRQ context.
+Since the optimization mentioned above overloads the interpretation of
+TIF_NEED_RESCHED, check for idle_cpu() before going with the existing
+need_resched() check which can catch a genuine task wakeup on an idle
+CPU processing SCHED_SOFTIRQ from do_softirq_post_smp_call_flush(), as
+well as the case where ksoftirqd needs to be preempted as a result of
+new task wakeup or slice expiry.
+
+In case of PREEMPT_RT or threadirqs, although the idle load balancing
+may be inhibited in some cases on the ilb CPU, the fact that ksoftirqd
+is the only fair task going back to sleep will trigger a newidle balance
+on the CPU which will alleviate some imbalance if it exists if idle
+balance fails to do so.
+
+Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
+Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
+Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
+Link: https://lore.kernel.org/r/20241119054432.6405-4-kprateek.nayak@amd.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/sched/fair.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
+index 1e12f731a0337..cf3bbddd4b7fc 100644
+--- a/kernel/sched/fair.c
++++ b/kernel/sched/fair.c
+@@ -11401,7 +11401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
+                * work being done for other CPUs. Next load
+                * balancing owner will pick it up.
+                */
+-              if (need_resched()) {
++              if (!idle_cpu(this_cpu) && need_resched()) {
+                       if (flags & NOHZ_STATS_KICK)
+                               has_blocked_load = true;
+                       if (flags & NOHZ_NEXT_KICK)
+-- 
+2.43.0
+
diff --git a/queue-6.1/series b/queue-6.1/series
index 3414c1d217e39a8105226849fb9cd8932961147c..1fe96f98fea25a5bf10857f1c097ac3b87d4199f 100644
--- a/queue-6.1/series
+++ b/queue-6.1/series
@@ -733,3 +733,8 @@ serial-8250_dw-add-sophgo-sg2044-quirk.patch
  io_uring-tctx-work-around-xa_store-allocation-error-.patch
  kasan-suppress-recursive-reports-for-hw_tags.patch
  kasan-make-report_lock-a-raw-spinlock.patch
+sched-core-remove-the-unnecessary-need_resched-check.patch
+sched-fair-check-idle_cpu-before-need_resched-to-det.patch
+sched-core-prevent-wakeup-of-ksoftirqd-during-idle-l.patch
+btrfs-fix-missing-snapshot-drew-unlock-when-root-is-.patch
+tracing-eprobe-fix-to-release-eprobe-when-failed-to-.patch
diff --git a/queue-6.1/tracing-eprobe-fix-to-release-eprobe-when-failed-to-.patch b/queue-6.1/tracing-eprobe-fix-to-release-eprobe-when-failed-to-.patch
new file mode 100644
index 0000000..8fb60ed
--- /dev/null
+++ b/queue-6.1/tracing-eprobe-fix-to-release-eprobe-when-failed-to-.patch
@@ -0,0 +1,40 @@
+From 8781122757f793c16e696b136f66b9044797f87b Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sat, 30 Nov 2024 01:47:47 +0900
+Subject: tracing/eprobe: Fix to release eprobe when failed to add dyn_event
+
+From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
+
+[ Upstream commit 494b332064c0ce2f7392fa92632bc50191c1b517 ]
+
+Fix eprobe event to unregister event call and release eprobe when it fails
+to add dynamic event correctly.
+
+Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/
+
+Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
+Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/trace/trace_eprobe.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
+index d2370cdb4c1d6..2a75cf3aa7bf8 100644
+--- a/kernel/trace/trace_eprobe.c
++++ b/kernel/trace/trace_eprobe.c
+@@ -1069,6 +1069,11 @@ static int __trace_eprobe_create(int argc, const char *argv[])
+               goto error;
+       }
+       ret = dyn_event_add(&ep->devent, &ep->tp.event->call);
++      if (ret < 0) {
++              trace_probe_unregister_event_call(&ep->tp);
++              mutex_unlock(&event_mutex);
++              goto error;
++      }
+       mutex_unlock(&event_mutex);
+       return ret;
+ parse_error:
+-- 
+2.43.0
+
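
Illustrative note (not part of the queued patches): the btrfs and eprobe fixes above both repair an error path that returned without undoing an earlier step. The sketch below is a userspace toy of the eprobe case, assuming a register step followed by an add step under a mutex. Every name in it (demo_probe, demo_create, demo_mutex_depth, and so on) is hypothetical and stands in for, but is not, the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state; these are not kernel types or APIs. */
struct demo_probe {
	bool registered;	/* event call has been registered */
	bool added;		/* probe is on the dynamic event list */
};

int demo_mutex_depth;	/* > 0 while the toy mutex is held */

static void demo_mutex_lock(void)
{
	demo_mutex_depth++;
}

static void demo_mutex_unlock(void)
{
	assert(demo_mutex_depth > 0);	/* unlocking an unheld mutex is a bug */
	demo_mutex_depth--;
}

static int demo_register(struct demo_probe *p)
{
	p->registered = true;
	return 0;
}

static void demo_unregister(struct demo_probe *p)
{
	p->registered = false;
}

/* Fails with -ENOMEM when fail_add is set, to exercise the error path. */
static int demo_add(struct demo_probe *p, bool fail_add)
{
	if (fail_add)
		return -ENOMEM;
	p->added = true;
	return 0;
}

/* Register, then add; roll back the registration if the add fails. */
int demo_create(struct demo_probe *p, bool fail_add)
{
	int ret;

	demo_mutex_lock();
	ret = demo_register(p);
	if (ret < 0) {
		demo_mutex_unlock();
		return ret;
	}
	ret = demo_add(p, fail_add);
	if (ret < 0) {
		demo_unregister(p);	/* undo the earlier registration */
		demo_mutex_unlock();	/* never return with the mutex held */
		return ret;
	}
	demo_mutex_unlock();
	return 0;
}
```

Calling demo_create(&p, true) on a zero-initialized probe returns -ENOMEM and leaves the probe unregistered with the toy mutex released, which is the invariant both fixes restore on their respective error paths.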
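
Illustrative note (not part of the queued patches): the two scheduler patches that touch need_resched() rest on the argument that an idle check alone detects a genuine wakeup, because a wakeup either sets rq->ttwu_pending or raises rq->nr_running. The toy model below sketches that argument in userspace C. The struct and the demo_* functions are hypothetical, and the toy idle test is an assumption that only models the two fields named in the commit message, not the full kernel idle_cpu().

```c
#include <stdbool.h>

/* Hypothetical toy runqueue; only loosely mirrors the fields named above. */
struct demo_rq {
	unsigned int nr_running;	/* tasks queued on this CPU */
	bool ttwu_pending;		/* a remote wakeup has been queued */
	bool need_resched;		/* toy stand-in for TIF_NEED_RESCHED */
};

/* Toy idle test: busy as soon as a task is queued or a wakeup is pending. */
bool demo_idle_cpu(const struct demo_rq *rq)
{
	return rq->nr_running == 0 && !rq->ttwu_pending;
}

/* Old gate in the kick handler: idle and no resched flag set. */
bool demo_kick_old(const struct demo_rq *rq)
{
	return demo_idle_cpu(rq) && !rq->need_resched;
}

/* New gate in the kick handler: the idle test alone. */
bool demo_kick_new(const struct demo_rq *rq)
{
	return demo_idle_cpu(rq);
}

/* Bail-out test in the balance loop: only abort if the CPU is really busy. */
bool demo_should_abort(const struct demo_rq *rq)
{
	return !demo_idle_cpu(rq) && rq->need_resched;
}
```

In this model, a polling idle CPU that was only flagged to run a call function (nr_running 0, no pending wakeup, need_resched set) is skipped by the old gate but accepted by the new one, and the balance loop does not abort. A CPU with a queued task fails the idle test, so the new gate rejects it without consulting the flag.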