From: Greg Kroah-Hartman Date: Sun, 16 Oct 2022 15:39:48 +0000 (+0200) Subject: 6.0-stable patches X-Git-Tag: v5.4.219~88 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=20d0511e2d90c5d0d554cc00ed6f7225cefe5a2d;p=thirdparty%2Fkernel%2Fstable-queue.git 6.0-stable patches added patches: ftrace-properly-unset-ftrace_hash_fl_mod.patch ftrace-still-disable-enabled-records-marked-as-disabled.patch livepatch-fix-race-between-fork-and-klp-transition.patch ring-buffer-add-ring_buffer_wake_waiters.patch ring-buffer-allow-splice-to-read-previous-partially-read-pages.patch ring-buffer-check-pending-waiters-when-doing-wake-ups-as-well.patch ring-buffer-fix-race-between-reset-page-and-reading-page.patch ring-buffer-have-the-shortest_full-queue-be-the-shortest-not-longest.patch rpmsg-char-avoid-double-destroy-of-default-endpoint.patch thunderbolt-explicitly-enable-lane-adapter-hotplug-events-at-startup.patch tracing-add-fault-name-injection-to-kernel-probes.patch tracing-add-ioctl-to-force-ring-buffer-waiters-to-wake-up.patch tracing-disable-interrupt-or-preemption-before-acquiring-arch_spinlock_t.patch tracing-do-not-free-snapshot-if-tracer-is-on-cmdline.patch tracing-eprobe-fix-alloc-event-dir-failed-when-event-name-no-set.patch tracing-fix-reading-strings-from-synthetic-events.patch tracing-move-duplicate-code-of-trace_kprobe-eprobe.c-into-header.patch tracing-wake-up-ring-buffer-waiters-on-closing-of-the-file.patch tracing-wake-up-waiters-when-tracing-is-disabled.patch --- diff --git a/queue-6.0/ftrace-properly-unset-ftrace_hash_fl_mod.patch b/queue-6.0/ftrace-properly-unset-ftrace_hash_fl_mod.patch new file mode 100644 index 00000000000..a66da40eaa4 --- /dev/null +++ b/queue-6.0/ftrace-properly-unset-ftrace_hash_fl_mod.patch @@ -0,0 +1,51 @@ +From 0ce0638edf5ec83343302b884fa208179580700a Mon Sep 17 00:00:00 2001 +From: Zheng Yejian +Date: Mon, 26 Sep 2022 15:20:08 +0000 +Subject: ftrace: Properly unset FTRACE_HASH_FL_MOD + +From: Zheng Yejian + +commit 0ce0638edf5ec83343302b884fa208179580700a upstream. + +When executing following commands like what document said, but the log +"#### all functions enabled ####" was not shown as expect: + 1. Set a 'mod' filter: + $ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter + 2. Invert above filter: + $ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter + 3. Read the file: + $ cat /sys/kernel/tracing/set_ftrace_filter + +By some debugging, I found that flag FTRACE_HASH_FL_MOD was not unset +after inversion like above step 2 and then result of ftrace_hash_empty() +is incorrect. 
+ +Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com + +Cc: +Cc: stable@vger.kernel.org +Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter") +Signed-off-by: Zheng Yejian +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ftrace.c | 8 ++++++-- + 1 file changed, 6 insertions(+), 2 deletions(-) + +--- a/kernel/trace/ftrace.c ++++ b/kernel/trace/ftrace.c +@@ -6081,8 +6081,12 @@ int ftrace_regex_release(struct inode *i + + if (filter_hash) { + orig_hash = &iter->ops->func_hash->filter_hash; +- if (iter->tr && !list_empty(&iter->tr->mod_trace)) +- iter->hash->flags |= FTRACE_HASH_FL_MOD; ++ if (iter->tr) { ++ if (list_empty(&iter->tr->mod_trace)) ++ iter->hash->flags &= ~FTRACE_HASH_FL_MOD; ++ else ++ iter->hash->flags |= FTRACE_HASH_FL_MOD; ++ } + } else + orig_hash = &iter->ops->func_hash->notrace_hash; + diff --git a/queue-6.0/ftrace-still-disable-enabled-records-marked-as-disabled.patch b/queue-6.0/ftrace-still-disable-enabled-records-marked-as-disabled.patch new file mode 100644 index 00000000000..e560689a9f4 --- /dev/null +++ b/queue-6.0/ftrace-still-disable-enabled-records-marked-as-disabled.patch @@ -0,0 +1,126 @@ +From cf04f2d5df0037741207382ac8fe289e8bf84ced Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 5 Oct 2022 00:38:09 -0400 +Subject: ftrace: Still disable enabled records marked as disabled + +From: Steven Rostedt (Google) + +commit cf04f2d5df0037741207382ac8fe289e8bf84ced upstream. + +Weak functions started causing havoc as they showed up in the +"available_filter_functions" and this confused people as to why some +functions marked as "notrace" were listed, but when enabled they did +nothing. This was because weak functions can still have fentry calls, and +these addresses get added to the "available_filter_functions" file. +kallsyms is what converts those addresses to names, and since the weak +functions are not listed in kallsyms, it would just pick the function +before that. + +To solve this, there was a trick to detect weak functions listed, and +these records would be marked as DISABLED so that they do not get enabled +and are mostly ignored. As the processing of the list of all functions to +figure out what is weak or not can take a long time, this process is put +off into a kernel thread and run in parallel with the rest of start up. + +Now the issue happens whet function tracing is enabled via the kernel +command line. As it starts very early in boot up, it can be enabled before +the records that are weak are marked to be disabled. This causes an issue +in the accounting, as the weak records are enabled by the command line +function tracing, but after boot up, they are not disabled. + +The ftrace records have several accounting flags and a ref count. The +DISABLED flag is just one. If the record is enabled before it is marked +DISABLED it will get an ENABLED flag and also have its ref counter +incremented. After it is marked for DISABLED, neither the ENABLED flag nor +the ref counter is cleared. There's sanity checks on the records that are +performed after an ftrace function is registered or unregistered, and this +detected that there were records marked as ENABLED with ref counter that +should not have been. + +Note, the module loading code uses the DISABLED flag as well to keep its +functions from being modified while its being loaded and some of these +flags may get set in this process. 
So changing the verification code to +ignore DISABLED records is a no go, as it still needs to verify that the +module records are working too. + +Also, the weak functions still are calling a trampoline. Even though they +should never be called, it is dangerous to leave these weak functions +calling a trampoline that is freed, so they should still be set back to +nops. + +There's two places that need to not skip records that have the ENABLED +and the DISABLED flags set. That is where the ftrace_ops is processed and +sets the records ref counts, and then later when the function itself is to +be updated, and the ENABLED flag gets removed. Add a helper function +"skip_record()" that returns true if the record has the DISABLED flag set +but not the ENABLED flag. + +Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home + +Cc: Masami Hiramatsu +Cc: Andrew Morton +Cc: stable@vger.kernel.org +Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ftrace.c | 20 ++++++++++++++++---- + 1 file changed, 16 insertions(+), 4 deletions(-) + +--- a/kernel/trace/ftrace.c ++++ b/kernel/trace/ftrace.c +@@ -1644,6 +1644,18 @@ ftrace_find_tramp_ops_any_other(struct d + static struct ftrace_ops * + ftrace_find_tramp_ops_next(struct dyn_ftrace *rec, struct ftrace_ops *ops); + ++static bool skip_record(struct dyn_ftrace *rec) ++{ ++ /* ++ * At boot up, weak functions are set to disable. Function tracing ++ * can be enabled before they are, and they still need to be disabled now. ++ * If the record is disabled, still continue if it is marked as already ++ * enabled (this is needed to keep the accounting working). ++ */ ++ return rec->flags & FTRACE_FL_DISABLED && ++ !(rec->flags & FTRACE_FL_ENABLED); ++} ++ + static bool __ftrace_hash_rec_update(struct ftrace_ops *ops, + int filter_hash, + bool inc) +@@ -1693,7 +1705,7 @@ static bool __ftrace_hash_rec_update(str + int in_hash = 0; + int match = 0; + +- if (rec->flags & FTRACE_FL_DISABLED) ++ if (skip_record(rec)) + continue; + + if (all) { +@@ -2126,7 +2138,7 @@ static int ftrace_check_record(struct dy + + ftrace_bug_type = FTRACE_BUG_UNKNOWN; + +- if (rec->flags & FTRACE_FL_DISABLED) ++ if (skip_record(rec)) + return FTRACE_UPDATE_IGNORE; + + /* +@@ -2241,7 +2253,7 @@ static int ftrace_check_record(struct dy + if (update) { + /* If there's no more users, clear all flags */ + if (!ftrace_rec_count(rec)) +- rec->flags = 0; ++ rec->flags &= FTRACE_FL_DISABLED; + else + /* + * Just disable the record, but keep the ops TRAMP +@@ -2634,7 +2646,7 @@ void __weak ftrace_replace_code(int mod_ + + do_for_each_ftrace_rec(pg, rec) { + +- if (rec->flags & FTRACE_FL_DISABLED) ++ if (skip_record(rec)) + continue; + + failed = __ftrace_replace_code(rec, enable); diff --git a/queue-6.0/livepatch-fix-race-between-fork-and-klp-transition.patch b/queue-6.0/livepatch-fix-race-between-fork-and-klp-transition.patch new file mode 100644 index 00000000000..23466496acd --- /dev/null +++ b/queue-6.0/livepatch-fix-race-between-fork-and-klp-transition.patch @@ -0,0 +1,91 @@ +From 747f7a2901174c9afa805dddfb7b24db6f65e985 Mon Sep 17 00:00:00 2001 +From: Rik van Riel +Date: Mon, 8 Aug 2022 15:00:19 -0400 +Subject: livepatch: fix race between fork and KLP transition + +From: Rik van Riel + +commit 747f7a2901174c9afa805dddfb7b24db6f65e985 upstream. 
+ +The KLP transition code depends on the TIF_PATCH_PENDING and +the task->patch_state to stay in sync. On a normal (forward) +transition, TIF_PATCH_PENDING will be set on every task in +the system, while on a reverse transition (after a failed +forward one) first TIF_PATCH_PENDING will be cleared from +every task, followed by it being set on tasks that need to +be transitioned back to the original code. + +However, the fork code copies over the TIF_PATCH_PENDING flag +from the parent to the child early on, in dup_task_struct and +setup_thread_stack. Much later, klp_copy_process will set +child->patch_state to match that of the parent. + +However, the parent's patch_state may have been changed by KLP loading +or unloading since it was initially copied over into the child. + +This results in the KLP code occasionally hitting this warning in +klp_complete_transition: + + for_each_process_thread(g, task) { + WARN_ON_ONCE(test_tsk_thread_flag(task, TIF_PATCH_PENDING)); + task->patch_state = KLP_UNDEFINED; + } + +Set, or clear, the TIF_PATCH_PENDING flag in the child task +depending on whether or not it is needed at the time +klp_copy_process is called, at a point in copy_process where the +tasklist_lock is held exclusively, preventing races with the KLP +code. + +The KLP code does have a few places where the state is changed +without the tasklist_lock held, but those should not cause +problems because klp_update_patch_state(current) cannot be +called while the current task is in the middle of fork, +klp_check_and_switch_task() which is called under the pi_lock, +which prevents rescheduling, and manipulation of the patch +state of idle tasks, which do not fork. + +This should prevent this warning from triggering again in the +future, and close the race for both normal and reverse transitions. + +Signed-off-by: Rik van Riel +Reported-by: Breno Leitao +Reviewed-by: Petr Mladek +Acked-by: Josh Poimboeuf +Fixes: d83a7cb375ee ("livepatch: change to a per-task consistency model") +Cc: stable@kernel.org +Signed-off-by: Petr Mladek +Link: https://lore.kernel.org/r/20220808150019.03d6a67b@imladris.surriel.com +Signed-off-by: Greg Kroah-Hartman +--- + kernel/livepatch/transition.c | 18 ++++++++++++++++-- + 1 file changed, 16 insertions(+), 2 deletions(-) + +--- a/kernel/livepatch/transition.c ++++ b/kernel/livepatch/transition.c +@@ -610,9 +610,23 @@ void klp_reverse_transition(void) + /* Called from copy_process() during fork */ + void klp_copy_process(struct task_struct *child) + { +- child->patch_state = current->patch_state; + +- /* TIF_PATCH_PENDING gets copied in setup_thread_stack() */ ++ /* ++ * The parent process may have gone through a KLP transition since ++ * the thread flag was copied in setup_thread_stack earlier. Bring ++ * the task flag up to date with the parent here. ++ * ++ * The operation is serialized against all klp_*_transition() ++ * operations by the tasklist_lock. The only exception is ++ * klp_update_patch_state(current), but we cannot race with ++ * that because we are current. 
++ */ ++ if (test_tsk_thread_flag(current, TIF_PATCH_PENDING)) ++ set_tsk_thread_flag(child, TIF_PATCH_PENDING); ++ else ++ clear_tsk_thread_flag(child, TIF_PATCH_PENDING); ++ ++ child->patch_state = current->patch_state; + } + + /* diff --git a/queue-6.0/ring-buffer-add-ring_buffer_wake_waiters.patch b/queue-6.0/ring-buffer-add-ring_buffer_wake_waiters.patch new file mode 100644 index 00000000000..3b063d56b9a --- /dev/null +++ b/queue-6.0/ring-buffer-add-ring_buffer_wake_waiters.patch @@ -0,0 +1,116 @@ +From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 28 Sep 2022 13:39:38 -0400 +Subject: ring-buffer: Add ring_buffer_wake_waiters() + +From: Steven Rostedt (Google) + +commit 7e9fbbb1b776d8d7969551565bc246f74ec53b27 upstream. + +On closing of a file that represents a ring buffer or flushing the file, +there may be waiters on the ring buffer that needs to be woken up and exit +the ring_buffer_wait() function. + +Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer +and allow them to exit the wait loop. + +Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home + +Cc: stable@vger.kernel.org +Cc: Ingo Molnar +Cc: Andrew Morton +Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/ring_buffer.h | 2 +- + kernel/trace/ring_buffer.c | 39 +++++++++++++++++++++++++++++++++++++++ + 2 files changed, 40 insertions(+), 1 deletion(-) + +--- a/include/linux/ring_buffer.h ++++ b/include/linux/ring_buffer.h +@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, + int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full); + __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu, + struct file *filp, poll_table *poll_table); +- ++void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu); + + #define RING_BUFFER_ALL_CPUS -1 + +--- a/kernel/trace/ring_buffer.c ++++ b/kernel/trace/ring_buffer.c +@@ -413,6 +413,7 @@ struct rb_irq_work { + struct irq_work work; + wait_queue_head_t waiters; + wait_queue_head_t full_waiters; ++ long wait_index; + bool waiters_pending; + bool full_waiters_pending; + bool wakeup_full; +@@ -925,6 +926,37 @@ static void rb_wake_up_waiters(struct ir + } + + /** ++ * ring_buffer_wake_waiters - wake up any waiters on this ring buffer ++ * @buffer: The ring buffer to wake waiters on ++ * ++ * In the case of a file that represents a ring buffer is closing, ++ * it is prudent to wake up any waiters that are on this. ++ */ ++void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu) ++{ ++ struct ring_buffer_per_cpu *cpu_buffer; ++ struct rb_irq_work *rbwork; ++ ++ if (cpu == RING_BUFFER_ALL_CPUS) { ++ ++ /* Wake up individual ones too. 
One level recursion */ ++ for_each_buffer_cpu(buffer, cpu) ++ ring_buffer_wake_waiters(buffer, cpu); ++ ++ rbwork = &buffer->irq_work; ++ } else { ++ cpu_buffer = buffer->buffers[cpu]; ++ rbwork = &cpu_buffer->irq_work; ++ } ++ ++ rbwork->wait_index++; ++ /* make sure the waiters see the new index */ ++ smp_wmb(); ++ ++ rb_wake_up_waiters(&rbwork->work); ++} ++ ++/** + * ring_buffer_wait - wait for input to the ring buffer + * @buffer: buffer to wait on + * @cpu: the cpu buffer to wait on +@@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer + struct ring_buffer_per_cpu *cpu_buffer; + DEFINE_WAIT(wait); + struct rb_irq_work *work; ++ long wait_index; + int ret = 0; + + /* +@@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer + work = &cpu_buffer->irq_work; + } + ++ wait_index = READ_ONCE(work->wait_index); + + while (true) { + if (full) +@@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer + } + + schedule(); ++ ++ /* Make sure to see the new wait index */ ++ smp_rmb(); ++ if (wait_index != work->wait_index) ++ break; + } + + if (full) diff --git a/queue-6.0/ring-buffer-allow-splice-to-read-previous-partially-read-pages.patch b/queue-6.0/ring-buffer-allow-splice-to-read-previous-partially-read-pages.patch new file mode 100644 index 00000000000..e6afc98ba5a --- /dev/null +++ b/queue-6.0/ring-buffer-allow-splice-to-read-previous-partially-read-pages.patch @@ -0,0 +1,52 @@ +From fa8f4a89736b654125fb254b0db753ac68a5fced Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Tue, 27 Sep 2022 14:43:17 -0400 +Subject: ring-buffer: Allow splice to read previous partially read pages + +From: Steven Rostedt (Google) + +commit fa8f4a89736b654125fb254b0db753ac68a5fced upstream. + +If a page is partially read, and then the splice system call is run +against the ring buffer, it will always fail to read, no matter how much +is in the ring buffer. That's because the code path for a partial read of +the page does will fail if the "full" flag is set. + +The splice system call wants full pages, so if the read of the ring buffer +is not yet full, it should return zero, and the splice will block. But if +a previous read was done, where the beginning has been consumed, it should +still be given to the splice caller if the rest of the page has been +written to. + +This caused the splice command to never consume data in this scenario, and +let the ring buffer just fill up and lose events. + +Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home + +Cc: stable@vger.kernel.org +Fixes: 8789a9e7df6bf ("ring-buffer: read page interface") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ring_buffer.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +--- a/kernel/trace/ring_buffer.c ++++ b/kernel/trace/ring_buffer.c +@@ -5616,7 +5616,15 @@ int ring_buffer_read_page(struct trace_b + unsigned int pos = 0; + unsigned int size; + +- if (full) ++ /* ++ * If a full page is expected, this can still be returned ++ * if there's been a previous partial read and the ++ * rest of the page can be read and the commit page is off ++ * the reader page. 
++ */ ++ if (full && ++ (!read || (len < (commit - read)) || ++ cpu_buffer->reader_page == cpu_buffer->commit_page)) + goto out_unlock; + + if (len > (commit - read)) diff --git a/queue-6.0/ring-buffer-check-pending-waiters-when-doing-wake-ups-as-well.patch b/queue-6.0/ring-buffer-check-pending-waiters-when-doing-wake-ups-as-well.patch new file mode 100644 index 00000000000..24cba07543e --- /dev/null +++ b/queue-6.0/ring-buffer-check-pending-waiters-when-doing-wake-ups-as-well.patch @@ -0,0 +1,45 @@ +From ec0bbc5ec5664dcee344f79373852117dc672c86 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Tue, 27 Sep 2022 19:15:25 -0400 +Subject: ring-buffer: Check pending waiters when doing wake ups as well + +From: Steven Rostedt (Google) + +commit ec0bbc5ec5664dcee344f79373852117dc672c86 upstream. + +The wake up waiters only checks the "wakeup_full" variable and not the +"full_waiters_pending". The full_waiters_pending is set when a waiter is +added to the wait queue. The wakeup_full is only set when an event is +triggered, and it clears the full_waiters_pending to avoid multiple calls +to irq_work_queue(). + +The irq_work callback really needs to check both wakeup_full as well as +full_waiters_pending such that this code can be used to wake up waiters +when a file is closed that represents the ring buffer and the waiters need +to be woken up. + +Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Ingo Molnar +Cc: Andrew Morton +Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ring_buffer.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/kernel/trace/ring_buffer.c ++++ b/kernel/trace/ring_buffer.c +@@ -917,8 +917,9 @@ static void rb_wake_up_waiters(struct ir + struct rb_irq_work *rbwork = container_of(work, struct rb_irq_work, work); + + wake_up_all(&rbwork->waiters); +- if (rbwork->wakeup_full) { ++ if (rbwork->full_waiters_pending || rbwork->wakeup_full) { + rbwork->wakeup_full = false; ++ rbwork->full_waiters_pending = false; + wake_up_all(&rbwork->full_waiters); + } + } diff --git a/queue-6.0/ring-buffer-fix-race-between-reset-page-and-reading-page.patch b/queue-6.0/ring-buffer-fix-race-between-reset-page-and-reading-page.patch new file mode 100644 index 00000000000..c47f95ded68 --- /dev/null +++ b/queue-6.0/ring-buffer-fix-race-between-reset-page-and-reading-page.patch @@ -0,0 +1,115 @@ +From a0fcaaed0c46cf9399d3a2d6e0c87ddb3df0e044 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Thu, 29 Sep 2022 10:49:09 -0400 +Subject: ring-buffer: Fix race between reset page and reading page + +From: Steven Rostedt (Google) + +commit a0fcaaed0c46cf9399d3a2d6e0c87ddb3df0e044 upstream. + +The ring buffer is broken up into sub buffers (currently of page size). +Each sub buffer has a pointer to its "tail" (the last event written to the +sub buffer). When a new event is requested, the tail is locally +incremented to cover the size of the new event. This is done in a way that +there is no need for locking. + +If the tail goes past the end of the sub buffer, the process of moving to +the next sub buffer takes place. After setting the current sub buffer to +the next one, the previous one that had the tail go passed the end of the +sub buffer needs to be reset back to the original tail location (before +the new event was requested) and the rest of the sub buffer needs to be +"padded". 
+ +The race happens when a reader takes control of the sub buffer. As readers +do a "swap" of sub buffers from the ring buffer to get exclusive access to +the sub buffer, it replaces the "head" sub buffer with an empty sub buffer +that goes back into the writable portion of the ring buffer. This swap can +happen as soon as the writer moves to the next sub buffer and before it +updates the last sub buffer with padding. + +Because the sub buffer can be released to the reader while the writer is +still updating the padding, it is possible for the reader to see the event +that goes past the end of the sub buffer. This can cause obvious issues. + +To fix this, add a few memory barriers so that the reader definitely sees +the updates to the sub buffer, and also waits until the writer has put +back the "tail" of the sub buffer back to the last event that was written +on it. + +To be paranoid, it will only spin for 1 second, otherwise it will +warn and shutdown the ring buffer code. 1 second should be enough as +the writer does have preemption disabled. If the writer doesn't move +within 1 second (with preemption disabled) something is horribly +wrong. No interrupt should last 1 second! + +Link: https://lore.kernel.org/all/20220830120854.7545-1-jiazi.li@transsion.com/ +Link: https://bugzilla.kernel.org/show_bug.cgi?id=216369 +Link: https://lkml.kernel.org/r/20220929104909.0650a36c@gandalf.local.home + +Cc: Ingo Molnar +Cc: Andrew Morton +Cc: stable@vger.kernel.org +Fixes: c7b0930857e22 ("ring-buffer: prevent adding write in discarded area") +Reported-by: Jiazi.Li +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ring_buffer.c | 33 +++++++++++++++++++++++++++++++++ + 1 file changed, 33 insertions(+) + +--- a/kernel/trace/ring_buffer.c ++++ b/kernel/trace/ring_buffer.c +@@ -2648,6 +2648,9 @@ rb_reset_tail(struct ring_buffer_per_cpu + /* Mark the rest of the page with padding */ + rb_event_set_padding(event); + ++ /* Make sure the padding is visible before the write update */ ++ smp_wmb(); ++ + /* Set the write back to the previous setting */ + local_sub(length, &tail_page->write); + return; +@@ -2659,6 +2662,9 @@ rb_reset_tail(struct ring_buffer_per_cpu + /* time delta must be non zero */ + event->time_delta = 1; + ++ /* Make sure the padding is visible before the tail_page->write update */ ++ smp_wmb(); ++ + /* Set write to end of buffer */ + length = (tail + length) - BUF_PAGE_SIZE; + local_sub(length, &tail_page->write); +@@ -4627,6 +4633,33 @@ rb_get_reader_page(struct ring_buffer_pe + arch_spin_unlock(&cpu_buffer->lock); + local_irq_restore(flags); + ++ /* ++ * The writer has preempt disable, wait for it. But not forever ++ * Although, 1 second is pretty much "forever" ++ */ ++#define USECS_WAIT 1000000 ++ for (nr_loops = 0; nr_loops < USECS_WAIT; nr_loops++) { ++ /* If the write is past the end of page, a writer is still updating it */ ++ if (likely(!reader || rb_page_write(reader) <= BUF_PAGE_SIZE)) ++ break; ++ ++ udelay(1); ++ ++ /* Get the latest version of the reader write value */ ++ smp_rmb(); ++ } ++ ++ /* The writer is not moving forward? 
Something is wrong */ ++ if (RB_WARN_ON(cpu_buffer, nr_loops == USECS_WAIT)) ++ reader = NULL; ++ ++ /* ++ * Make sure we see any padding after the write update ++ * (see rb_reset_tail()) ++ */ ++ smp_rmb(); ++ ++ + return reader; + } + diff --git a/queue-6.0/ring-buffer-have-the-shortest_full-queue-be-the-shortest-not-longest.patch b/queue-6.0/ring-buffer-have-the-shortest_full-queue-be-the-shortest-not-longest.patch new file mode 100644 index 00000000000..016579379ed --- /dev/null +++ b/queue-6.0/ring-buffer-have-the-shortest_full-queue-be-the-shortest-not-longest.patch @@ -0,0 +1,36 @@ +From 3b19d614b61b93a131f463817e08219c9ce1fee3 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Tue, 27 Sep 2022 19:15:24 -0400 +Subject: ring-buffer: Have the shortest_full queue be the shortest not longest + +From: Steven Rostedt (Google) + +commit 3b19d614b61b93a131f463817e08219c9ce1fee3 upstream. + +The logic to know when the shortest waiters on the ring buffer should be +woken up or not has uses a less than instead of a greater than compare, +which causes the shortest_full to actually be the longest. + +Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Ingo Molnar +Cc: Andrew Morton +Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/ring_buffer.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/kernel/trace/ring_buffer.c ++++ b/kernel/trace/ring_buffer.c +@@ -1011,7 +1011,7 @@ int ring_buffer_wait(struct trace_buffer + nr_pages = cpu_buffer->nr_pages; + dirty = ring_buffer_nr_dirty_pages(buffer, cpu); + if (!cpu_buffer->shortest_full || +- cpu_buffer->shortest_full < full) ++ cpu_buffer->shortest_full > full) + cpu_buffer->shortest_full = full; + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); + if (!pagebusy && diff --git a/queue-6.0/rpmsg-char-avoid-double-destroy-of-default-endpoint.patch b/queue-6.0/rpmsg-char-avoid-double-destroy-of-default-endpoint.patch new file mode 100644 index 00000000000..6ebe116f94d --- /dev/null +++ b/queue-6.0/rpmsg-char-avoid-double-destroy-of-default-endpoint.patch @@ -0,0 +1,52 @@ +From 467233a4ac29b215d492843d067a9f091e6bf0c5 Mon Sep 17 00:00:00 2001 +From: Shengjiu Wang +Date: Wed, 21 Sep 2022 09:58:43 +0800 +Subject: rpmsg: char: Avoid double destroy of default endpoint + +From: Shengjiu Wang + +commit 467233a4ac29b215d492843d067a9f091e6bf0c5 upstream. + +The rpmsg_dev_remove() in rpmsg_core is the place for releasing +this default endpoint. + +So need to avoid destroying the default endpoint in +rpmsg_chrdev_eptdev_destroy(), this should be the same as +rpmsg_eptdev_release(). Otherwise there will be double destroy +issue that ept->refcount report warning: + +refcount_t: underflow; use-after-free. + +Call trace: + refcount_warn_saturate+0xf8/0x150 + virtio_rpmsg_destroy_ept+0xd4/0xec + rpmsg_dev_remove+0x60/0x70 + +The issue can be reproduced by stopping remoteproc before +closing the /dev/rpmsgX. 
+ +Fixes: bea9b79c2d10 ("rpmsg: char: Add possibility to use default endpoint of the rpmsg device") +Signed-off-by: Shengjiu Wang +Reviewed-by: Arnaud Pouliquen +Reviewed-by: Peng Fan +Cc: stable +Link: https://lore.kernel.org/r/1663725523-6514-1-git-send-email-shengjiu.wang@nxp.com +Signed-off-by: Mathieu Poirier +Signed-off-by: Greg Kroah-Hartman +--- + drivers/rpmsg/rpmsg_char.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/drivers/rpmsg/rpmsg_char.c ++++ b/drivers/rpmsg/rpmsg_char.c +@@ -76,7 +76,9 @@ int rpmsg_chrdev_eptdev_destroy(struct d + + mutex_lock(&eptdev->ept_lock); + if (eptdev->ept) { +- rpmsg_destroy_ept(eptdev->ept); ++ /* The default endpoint is released by the rpmsg core */ ++ if (!eptdev->default_ept) ++ rpmsg_destroy_ept(eptdev->ept); + eptdev->ept = NULL; + } + mutex_unlock(&eptdev->ept_lock); diff --git a/queue-6.0/series b/queue-6.0/series index 3539e7a1a73..c4225d99f3e 100644 --- a/queue-6.0/series +++ b/queue-6.0/series @@ -146,3 +146,22 @@ ext4-fix-miss-release-buffer-head-in-ext4_fc_write_inode.patch ext4-fix-potential-memory-leak-in-ext4_fc_record_modified_inode.patch ext4-fix-potential-memory-leak-in-ext4_fc_record_regions.patch ext4-update-state-fc_regions_size-after-successful-memory-allocation.patch +livepatch-fix-race-between-fork-and-klp-transition.patch +ftrace-properly-unset-ftrace_hash_fl_mod.patch +ftrace-still-disable-enabled-records-marked-as-disabled.patch +ring-buffer-allow-splice-to-read-previous-partially-read-pages.patch +ring-buffer-have-the-shortest_full-queue-be-the-shortest-not-longest.patch +ring-buffer-check-pending-waiters-when-doing-wake-ups-as-well.patch +ring-buffer-add-ring_buffer_wake_waiters.patch +ring-buffer-fix-race-between-reset-page-and-reading-page.patch +tracing-eprobe-fix-alloc-event-dir-failed-when-event-name-no-set.patch +tracing-disable-interrupt-or-preemption-before-acquiring-arch_spinlock_t.patch +tracing-wake-up-ring-buffer-waiters-on-closing-of-the-file.patch +tracing-wake-up-waiters-when-tracing-is-disabled.patch +tracing-add-ioctl-to-force-ring-buffer-waiters-to-wake-up.patch +tracing-do-not-free-snapshot-if-tracer-is-on-cmdline.patch +tracing-move-duplicate-code-of-trace_kprobe-eprobe.c-into-header.patch +tracing-add-fault-name-injection-to-kernel-probes.patch +tracing-fix-reading-strings-from-synthetic-events.patch +rpmsg-char-avoid-double-destroy-of-default-endpoint.patch +thunderbolt-explicitly-enable-lane-adapter-hotplug-events-at-startup.patch diff --git a/queue-6.0/thunderbolt-explicitly-enable-lane-adapter-hotplug-events-at-startup.patch b/queue-6.0/thunderbolt-explicitly-enable-lane-adapter-hotplug-events-at-startup.patch new file mode 100644 index 00000000000..3672590b776 --- /dev/null +++ b/queue-6.0/thunderbolt-explicitly-enable-lane-adapter-hotplug-events-at-startup.patch @@ -0,0 +1,116 @@ +From 5d2569cb4a65c373896ec0217febdf88739ed295 Mon Sep 17 00:00:00 2001 +From: Mario Limonciello +Date: Mon, 26 Sep 2022 09:33:50 -0500 +Subject: thunderbolt: Explicitly enable lane adapter hotplug events at startup + +From: Mario Limonciello + +commit 5d2569cb4a65c373896ec0217febdf88739ed295 upstream. + +Software that has run before the USB4 CM in Linux runs may have disabled +hotplug events for a given lane adapter. + +Other CMs such as that one distributed with Windows 11 will enable hotplug +events. Do the same thing in the Linux CM which fixes hotplug events on +"AMD Pink Sardine". 
+ +Cc: stable@vger.kernel.org +Signed-off-by: Mario Limonciello +Signed-off-by: Mika Westerberg +Signed-off-by: Greg Kroah-Hartman +--- + drivers/thunderbolt/switch.c | 24 ++++++++++++++++++++++++ + drivers/thunderbolt/tb.h | 1 + + drivers/thunderbolt/tb_regs.h | 1 + + drivers/thunderbolt/usb4.c | 20 ++++++++++++++++++++ + 4 files changed, 46 insertions(+) + +--- a/drivers/thunderbolt/switch.c ++++ b/drivers/thunderbolt/switch.c +@@ -2822,6 +2822,26 @@ static void tb_switch_credits_init(struc + tb_sw_info(sw, "failed to determine preferred buffer allocation, using defaults\n"); + } + ++static int tb_switch_port_hotplug_enable(struct tb_switch *sw) ++{ ++ struct tb_port *port; ++ ++ if (tb_switch_is_icm(sw)) ++ return 0; ++ ++ tb_switch_for_each_port(sw, port) { ++ int res; ++ ++ if (!port->cap_usb4) ++ continue; ++ ++ res = usb4_port_hotplug_enable(port); ++ if (res) ++ return res; ++ } ++ return 0; ++} ++ + /** + * tb_switch_add() - Add a switch to the domain + * @sw: Switch to add +@@ -2891,6 +2911,10 @@ int tb_switch_add(struct tb_switch *sw) + return ret; + } + ++ ret = tb_switch_port_hotplug_enable(sw); ++ if (ret) ++ return ret; ++ + ret = device_add(&sw->dev); + if (ret) { + dev_err(&sw->dev, "failed to add device: %d\n", ret); +--- a/drivers/thunderbolt/tb.h ++++ b/drivers/thunderbolt/tb.h +@@ -1174,6 +1174,7 @@ int usb4_switch_add_ports(struct tb_swit + void usb4_switch_remove_ports(struct tb_switch *sw); + + int usb4_port_unlock(struct tb_port *port); ++int usb4_port_hotplug_enable(struct tb_port *port); + int usb4_port_configure(struct tb_port *port); + void usb4_port_unconfigure(struct tb_port *port); + int usb4_port_configure_xdomain(struct tb_port *port); +--- a/drivers/thunderbolt/tb_regs.h ++++ b/drivers/thunderbolt/tb_regs.h +@@ -308,6 +308,7 @@ struct tb_regs_port_header { + #define ADP_CS_5 0x05 + #define ADP_CS_5_LCA_MASK GENMASK(28, 22) + #define ADP_CS_5_LCA_SHIFT 22 ++#define ADP_CS_5_DHP BIT(31) + + /* TMU adapter registers */ + #define TMU_ADP_CS_3 0x03 +--- a/drivers/thunderbolt/usb4.c ++++ b/drivers/thunderbolt/usb4.c +@@ -1046,6 +1046,26 @@ int usb4_port_unlock(struct tb_port *por + return tb_port_write(port, &val, TB_CFG_PORT, ADP_CS_4, 1); + } + ++/** ++ * usb4_port_hotplug_enable() - Enables hotplug for a port ++ * @port: USB4 port to operate on ++ * ++ * Enables hot plug events on a given port. This is only intended ++ * to be used on lane, DP-IN, and DP-OUT adapters. ++ */ ++int usb4_port_hotplug_enable(struct tb_port *port) ++{ ++ int ret; ++ u32 val; ++ ++ ret = tb_port_read(port, &val, TB_CFG_PORT, ADP_CS_5, 1); ++ if (ret) ++ return ret; ++ ++ val &= ~ADP_CS_5_DHP; ++ return tb_port_write(port, &val, TB_CFG_PORT, ADP_CS_5, 1); ++} ++ + static int usb4_port_set_configured(struct tb_port *port, bool configured) + { + int ret; diff --git a/queue-6.0/tracing-add-fault-name-injection-to-kernel-probes.patch b/queue-6.0/tracing-add-fault-name-injection-to-kernel-probes.patch new file mode 100644 index 00000000000..9c24162d1cd --- /dev/null +++ b/queue-6.0/tracing-add-fault-name-injection-to-kernel-probes.patch @@ -0,0 +1,98 @@ +From 2e9906f84fc7c99388bb7123ade167250d50f1c0 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 12 Oct 2022 06:40:57 -0400 +Subject: tracing: Add "(fault)" name injection to kernel probes + +From: Steven Rostedt (Google) + +commit 2e9906f84fc7c99388bb7123ade167250d50f1c0 upstream. + +Have the specific functions for kernel probes that read strings to inject +the "(fault)" name directly. 
trace_probes.c does this too (for uprobes) +but as the code to read strings are going to be used by synthetic events +(and perhaps other utilities), it simplifies the code by making sure those +other uses do not need to implement the "(fault)" name injection as well. + +Link: https://lkml.kernel.org/r/20221012104534.644803645@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Andrew Morton +Cc: Tom Zanussi +Acked-by: Masami Hiramatsu (Google) +Reviewed-by: Tom Zanussi +Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace_probe_kernel.h | 31 +++++++++++++++++++++++++------ + 1 file changed, 25 insertions(+), 6 deletions(-) + +--- a/kernel/trace/trace_probe_kernel.h ++++ b/kernel/trace/trace_probe_kernel.h +@@ -2,6 +2,8 @@ + #ifndef __TRACE_PROBE_KERNEL_H_ + #define __TRACE_PROBE_KERNEL_H_ + ++#define FAULT_STRING "(fault)" ++ + /* + * This depends on trace_probe.h, but can not include it due to + * the way trace_probe_tmpl.h is used by trace_kprobe.c and trace_eprobe.c. +@@ -13,8 +15,16 @@ static nokprobe_inline int + kern_fetch_store_strlen_user(unsigned long addr) + { + const void __user *uaddr = (__force const void __user *)addr; ++ int ret; + +- return strnlen_user_nofault(uaddr, MAX_STRING_SIZE); ++ ret = strnlen_user_nofault(uaddr, MAX_STRING_SIZE); ++ /* ++ * strnlen_user_nofault returns zero on fault, insert the ++ * FAULT_STRING when that occurs. ++ */ ++ if (ret <= 0) ++ return strlen(FAULT_STRING) + 1; ++ return ret; + } + + /* Return the length of string -- including null terminal byte */ +@@ -34,7 +44,18 @@ kern_fetch_store_strlen(unsigned long ad + len++; + } while (c && ret == 0 && len < MAX_STRING_SIZE); + +- return (ret < 0) ? ret : len; ++ /* For faults, return enough to hold the FAULT_STRING */ ++ return (ret < 0) ? strlen(FAULT_STRING) + 1 : len; ++} ++ ++static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base, int len) ++{ ++ if (ret >= 0) { ++ *(u32 *)dest = make_data_loc(ret, __dest - base); ++ } else { ++ strscpy(__dest, FAULT_STRING, len); ++ ret = strlen(__dest) + 1; ++ } + } + + /* +@@ -55,8 +76,7 @@ kern_fetch_store_string_user(unsigned lo + __dest = get_loc_data(dest, base); + + ret = strncpy_from_user_nofault(__dest, uaddr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); ++ set_data_loc(ret, dest, __dest, base, maxlen); + + return ret; + } +@@ -87,8 +107,7 @@ kern_fetch_store_string(unsigned long ad + * probing. + */ + ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); ++ set_data_loc(ret, dest, __dest, base, maxlen); + + return ret; + } diff --git a/queue-6.0/tracing-add-ioctl-to-force-ring-buffer-waiters-to-wake-up.patch b/queue-6.0/tracing-add-ioctl-to-force-ring-buffer-waiters-to-wake-up.patch new file mode 100644 index 00000000000..4ba67d98b19 --- /dev/null +++ b/queue-6.0/tracing-add-ioctl-to-force-ring-buffer-waiters-to-wake-up.patch @@ -0,0 +1,62 @@ +From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Thu, 29 Sep 2022 09:50:29 -0400 +Subject: tracing: Add ioctl() to force ring buffer waiters to wake up + +From: Steven Rostedt (Google) + +commit 01b2a52171735c6eea80ee2f355f32bea6c41418 upstream. + +If a process is waiting on the ring buffer for data, there currently isn't +a clean way to force it to wake up. 
Add an ioctl call that will force any +tasks that are waiting on the trace_pipe_raw file to wake up. + +Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home + +Cc: stable@vger.kernel.org +Cc: Ingo Molnar +Cc: Andrew Morton +Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace.c | 22 ++++++++++++++++++++++ + 1 file changed, 22 insertions(+) + +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -8353,12 +8353,34 @@ out: + return ret; + } + ++/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */ ++static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg) ++{ ++ struct ftrace_buffer_info *info = file->private_data; ++ struct trace_iterator *iter = &info->iter; ++ ++ if (cmd) ++ return -ENOIOCTLCMD; ++ ++ mutex_lock(&trace_types_lock); ++ ++ iter->wait_index++; ++ /* Make sure the waiters see the new wait_index */ ++ smp_wmb(); ++ ++ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); ++ ++ mutex_unlock(&trace_types_lock); ++ return 0; ++} ++ + static const struct file_operations tracing_buffers_fops = { + .open = tracing_buffers_open, + .read = tracing_buffers_read, + .poll = tracing_buffers_poll, + .release = tracing_buffers_release, + .splice_read = tracing_buffers_splice_read, ++ .unlocked_ioctl = tracing_buffers_ioctl, + .llseek = no_llseek, + }; + diff --git a/queue-6.0/tracing-disable-interrupt-or-preemption-before-acquiring-arch_spinlock_t.patch b/queue-6.0/tracing-disable-interrupt-or-preemption-before-acquiring-arch_spinlock_t.patch new file mode 100644 index 00000000000..df4b6b59ffc --- /dev/null +++ b/queue-6.0/tracing-disable-interrupt-or-preemption-before-acquiring-arch_spinlock_t.patch @@ -0,0 +1,159 @@ +From c0a581d7126c0bbc96163276f585fd7b4e4d8d0e Mon Sep 17 00:00:00 2001 +From: Waiman Long +Date: Thu, 22 Sep 2022 10:56:22 -0400 +Subject: tracing: Disable interrupt or preemption before acquiring arch_spinlock_t + +From: Waiman Long + +commit c0a581d7126c0bbc96163276f585fd7b4e4d8d0e upstream. + +It was found that some tracing functions in kernel/trace/trace.c acquire +an arch_spinlock_t with preemption and irqs enabled. An example is the +tracing_saved_cmdlines_size_read() function which intermittently causes +a "BUG: using smp_processor_id() in preemptible" warning when the LTP +read_all_proc test is run. + +That can be problematic in case preemption happens after acquiring the +lock. Add the necessary preemption or interrupt disabling code in the +appropriate places before acquiring an arch_spinlock_t. + +The convention here is to disable preemption for trace_cmdline_lock and +interupt for max_lock. 
+ +Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com + +Cc: Peter Zijlstra +Cc: Ingo Molnar +Cc: Will Deacon +Cc: Boqun Feng +Cc: stable@vger.kernel.org +Fixes: a35873a0993b ("tracing: Add conditional snapshot") +Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file") +Suggested-by: Steven Rostedt +Signed-off-by: Waiman Long +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace.c | 23 +++++++++++++++++++++++ + 1 file changed, 23 insertions(+) + +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -1193,12 +1193,14 @@ void *tracing_cond_snapshot_data(struct + { + void *cond_data = NULL; + ++ local_irq_disable(); + arch_spin_lock(&tr->max_lock); + + if (tr->cond_snapshot) + cond_data = tr->cond_snapshot->cond_data; + + arch_spin_unlock(&tr->max_lock); ++ local_irq_enable(); + + return cond_data; + } +@@ -1334,9 +1336,11 @@ int tracing_snapshot_cond_enable(struct + goto fail_unlock; + } + ++ local_irq_disable(); + arch_spin_lock(&tr->max_lock); + tr->cond_snapshot = cond_snapshot; + arch_spin_unlock(&tr->max_lock); ++ local_irq_enable(); + + mutex_unlock(&trace_types_lock); + +@@ -1363,6 +1367,7 @@ int tracing_snapshot_cond_disable(struct + { + int ret = 0; + ++ local_irq_disable(); + arch_spin_lock(&tr->max_lock); + + if (!tr->cond_snapshot) +@@ -1373,6 +1378,7 @@ int tracing_snapshot_cond_disable(struct + } + + arch_spin_unlock(&tr->max_lock); ++ local_irq_enable(); + + return ret; + } +@@ -2200,6 +2206,11 @@ static size_t tgid_map_max; + + #define SAVED_CMDLINES_DEFAULT 128 + #define NO_CMDLINE_MAP UINT_MAX ++/* ++ * Preemption must be disabled before acquiring trace_cmdline_lock. ++ * The various trace_arrays' max_lock must be acquired in a context ++ * where interrupt is disabled. ++ */ + static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED; + struct saved_cmdlines_buffer { + unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1]; +@@ -2412,7 +2423,11 @@ static int trace_save_cmdline(struct tas + * the lock, but we also don't want to spin + * nor do we want to disable interrupts, + * so if we miss here, then better luck next time. ++ * ++ * This is called within the scheduler and wake up, so interrupts ++ * had better been disabled and run queue lock been held. 
+ */ ++ lockdep_assert_preemption_disabled(); + if (!arch_spin_trylock(&trace_cmdline_lock)) + return 0; + +@@ -5890,9 +5905,11 @@ tracing_saved_cmdlines_size_read(struct + char buf[64]; + int r; + ++ preempt_disable(); + arch_spin_lock(&trace_cmdline_lock); + r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num); + arch_spin_unlock(&trace_cmdline_lock); ++ preempt_enable(); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); + } +@@ -5917,10 +5934,12 @@ static int tracing_resize_saved_cmdlines + return -ENOMEM; + } + ++ preempt_disable(); + arch_spin_lock(&trace_cmdline_lock); + savedcmd_temp = savedcmd; + savedcmd = s; + arch_spin_unlock(&trace_cmdline_lock); ++ preempt_enable(); + free_saved_cmdlines_buffer(savedcmd_temp); + + return 0; +@@ -6373,10 +6392,12 @@ int tracing_set_tracer(struct trace_arra + + #ifdef CONFIG_TRACER_SNAPSHOT + if (t->use_max_tr) { ++ local_irq_disable(); + arch_spin_lock(&tr->max_lock); + if (tr->cond_snapshot) + ret = -EBUSY; + arch_spin_unlock(&tr->max_lock); ++ local_irq_enable(); + if (ret) + goto out; + } +@@ -7436,10 +7457,12 @@ tracing_snapshot_write(struct file *filp + goto out; + } + ++ local_irq_disable(); + arch_spin_lock(&tr->max_lock); + if (tr->cond_snapshot) + ret = -EBUSY; + arch_spin_unlock(&tr->max_lock); ++ local_irq_enable(); + if (ret) + goto out; + diff --git a/queue-6.0/tracing-do-not-free-snapshot-if-tracer-is-on-cmdline.patch b/queue-6.0/tracing-do-not-free-snapshot-if-tracer-is-on-cmdline.patch new file mode 100644 index 00000000000..cc2b9a8eff8 --- /dev/null +++ b/queue-6.0/tracing-do-not-free-snapshot-if-tracer-is-on-cmdline.patch @@ -0,0 +1,82 @@ +From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 5 Oct 2022 11:37:57 -0400 +Subject: tracing: Do not free snapshot if tracer is on cmdline + +From: Steven Rostedt (Google) + +commit a541a9559bb0a8ecc434de01d3e4826c32e8bb53 upstream. + +The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the +snapshot buffer at boot up for use later. The ftrace_boot_snapshot in +particular requires the snapshot to be allocated because it will take a +snapshot at the end of boot up allowing to see the traces that happened +during boot so that it's not lost when user space takes over. + +When a tracer is registered (started) there's a path that checks if it +requires the snapshot buffer or not, and if it does not and it was +allocated it will do a synchronization and free the snapshot buffer. + +This is only required if the previous tracer was using it for "max +latency" snapshots, as it needs to make sure all max snapshots are +complete before freeing. But this is only needed if the previous tracer +was using the snapshot buffer for latency (like irqoff tracer and +friends). But it does not make sense to free it, if the previous tracer +was not using it, and the snapshot was allocated by the cmdline +parameters. This basically takes away the point of allocating it in the +first place! + +Note, the allocated snapshot worked fine for just trace events, but fails +when a tracer is enabled on the cmdline. + +Further investigation, this goes back even further and it does not require +a tracer on the cmdline to fail. Simply enable snapshots and then enable a +tracer, and it will remove the snapshot. 
+ +Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home + +Cc: Masami Hiramatsu +Cc: Andrew Morton +Cc: stable@vger.kernel.org +Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated") +Reported-by: Ross Zwisler +Tested-by: Ross Zwisler +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace.c | 10 ++++++---- + 1 file changed, 6 insertions(+), 4 deletions(-) + +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_arra + if (tr->current_trace->reset) + tr->current_trace->reset(tr); + ++#ifdef CONFIG_TRACER_MAX_TRACE ++ had_max_tr = tr->current_trace->use_max_tr; ++ + /* Current trace needs to be nop_trace before synchronize_rcu */ + tr->current_trace = &nop_trace; + +-#ifdef CONFIG_TRACER_MAX_TRACE +- had_max_tr = tr->allocated_snapshot; +- + if (had_max_tr && !t->use_max_tr) { + /* + * We need to make sure that the update_max_tr sees that +@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_arra + free_snapshot(tr); + } + +- if (t->use_max_tr && !had_max_tr) { ++ if (t->use_max_tr && !tr->allocated_snapshot) { + ret = tracing_alloc_snapshot_instance(tr); + if (ret < 0) + goto out; + } ++#else ++ tr->current_trace = &nop_trace; + #endif + + if (t->init) { diff --git a/queue-6.0/tracing-eprobe-fix-alloc-event-dir-failed-when-event-name-no-set.patch b/queue-6.0/tracing-eprobe-fix-alloc-event-dir-failed-when-event-name-no-set.patch new file mode 100644 index 00000000000..432d2e3fb38 --- /dev/null +++ b/queue-6.0/tracing-eprobe-fix-alloc-event-dir-failed-when-event-name-no-set.patch @@ -0,0 +1,44 @@ +From dc399adecd4e2826868e5d116a58e33071b18346 Mon Sep 17 00:00:00 2001 +From: Tao Chen +Date: Sat, 24 Sep 2022 22:13:34 +0800 +Subject: tracing/eprobe: Fix alloc event dir failed when event name no set + +From: Tao Chen + +commit dc399adecd4e2826868e5d116a58e33071b18346 upstream. + +The event dir will alloc failed when event name no set, using the +command: +"echo "e:esys/ syscalls/sys_enter_openat file=\$filename:string" +>> dynamic_events" +It seems that dir name="syscalls/sys_enter_openat" is not allowed +in debugfs. So just use the "sys_enter_openat" as the event name. + +Link: https://lkml.kernel.org/r/1664028814-45923-1-git-send-email-chentao.kernel@linux.alibaba.com + +Cc: Ingo Molnar +Cc: Tom Zanussi +Cc: Linyu Yuan +Cc: Tao Chen +Signed-off-by: Tao Chen +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace_eprobe.c | 3 +-- + 1 file changed, 1 insertion(+), 2 deletions(-) + +--- a/kernel/trace/trace_eprobe.c ++++ b/kernel/trace/trace_eprobe.c +@@ -968,8 +968,7 @@ static int __trace_eprobe_create(int arg + } + + if (!event) { +- strscpy(buf1, argv[1], MAX_EVENT_NAME_LEN); +- sanitize_event_name(buf1); ++ strscpy(buf1, sys_event, MAX_EVENT_NAME_LEN); + event = buf1; + } + diff --git a/queue-6.0/tracing-fix-reading-strings-from-synthetic-events.patch b/queue-6.0/tracing-fix-reading-strings-from-synthetic-events.patch new file mode 100644 index 00000000000..a0622ea1247 --- /dev/null +++ b/queue-6.0/tracing-fix-reading-strings-from-synthetic-events.patch @@ -0,0 +1,110 @@ +From 0934ae9977c27133449b6dd8c6213970e7eece38 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 12 Oct 2022 06:40:58 -0400 +Subject: tracing: Fix reading strings from synthetic events + +From: Steven Rostedt (Google) + +commit 0934ae9977c27133449b6dd8c6213970e7eece38 upstream. 
+ +The follow commands caused a crash: + + # cd /sys/kernel/tracing + # echo 's:open char file[]' > dynamic_events + # echo 'hist:keys=common_pid:file=filename:onchange($file).trace(open,$file)' > events/syscalls/sys_enter_openat/trigger' + # echo 1 > events/synthetic/open/enable + +BOOM! + +The problem is that the synthetic event field "char file[]" will read +the value given to it as a string without any memory checks to make sure +the address is valid. The above example will pass in the user space +address and the sythetic event code will happily call strlen() on it +and then strscpy() where either one will cause an oops when accessing +user space addresses. + +Use the helper functions from trace_kprobe and trace_eprobe that can +read strings safely (and actually succeed when the address is from user +space and the memory is mapped in). + +Now the above can show: + + packagekitd-1721 [000] ...2. 104.597170: open: file=/usr/lib/rpm/fileattrs/cmake.attr + in:imjournal-978 [006] ...2. 104.599642: open: file=/var/lib/rsyslog/imjournal.state.tmp + packagekitd-1721 [000] ...2. 104.626308: open: file=/usr/lib/rpm/fileattrs/debuginfo.attr + +Link: https://lkml.kernel.org/r/20221012104534.826549315@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Andrew Morton +Cc: Tom Zanussi +Acked-by: Masami Hiramatsu (Google) +Reviewed-by: Tom Zanussi +Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace_events_synth.c | 23 +++++++++++++++++------ + 1 file changed, 17 insertions(+), 6 deletions(-) + +--- a/kernel/trace/trace_events_synth.c ++++ b/kernel/trace/trace_events_synth.c +@@ -17,6 +17,8 @@ + /* for gfp flag names */ + #include + #include ++#include "trace_probe.h" ++#include "trace_probe_kernel.h" + + #include "trace_synth.h" + +@@ -409,6 +411,7 @@ static unsigned int trace_string(struct + { + unsigned int len = 0; + char *str_field; ++ int ret; + + if (is_dynamic) { + u32 data_offset; +@@ -417,19 +420,27 @@ static unsigned int trace_string(struct + data_offset += event->n_u64 * sizeof(u64); + data_offset += data_size; + +- str_field = (char *)entry + data_offset; +- +- len = strlen(str_val) + 1; +- strscpy(str_field, str_val, len); ++ len = kern_fetch_store_strlen((unsigned long)str_val); + + data_offset |= len << 16; + *(u32 *)&entry->fields[*n_u64] = data_offset; + ++ ret = kern_fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry); ++ + (*n_u64)++; + } else { + str_field = (char *)&entry->fields[*n_u64]; + +- strscpy(str_field, str_val, STR_VAR_LEN_MAX); ++#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE ++ if ((unsigned long)str_val < TASK_SIZE) ++ ret = strncpy_from_user_nofault(str_field, str_val, STR_VAR_LEN_MAX); ++ else ++#endif ++ ret = strncpy_from_kernel_nofault(str_field, str_val, STR_VAR_LEN_MAX); ++ ++ if (ret < 0) ++ strcpy(str_field, FAULT_STRING); ++ + (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64); + } + +@@ -462,7 +473,7 @@ static notrace void trace_event_raw_even + val_idx = var_ref_idx[field_pos]; + str_val = (char *)(long)var_ref_vals[val_idx]; + +- len = strlen(str_val) + 1; ++ len = kern_fetch_store_strlen((unsigned long)str_val); + + fields_size += len; + } diff --git a/queue-6.0/tracing-move-duplicate-code-of-trace_kprobe-eprobe.c-into-header.patch b/queue-6.0/tracing-move-duplicate-code-of-trace_kprobe-eprobe.c-into-header.patch new file mode 100644 index 00000000000..9e49b87f4d8 --- /dev/null +++ 
b/queue-6.0/tracing-move-duplicate-code-of-trace_kprobe-eprobe.c-into-header.patch @@ -0,0 +1,330 @@ +From f1d3cbfaafc10464550c6d3a125f4fc802bbaed5 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Wed, 12 Oct 2022 06:40:56 -0400 +Subject: tracing: Move duplicate code of trace_kprobe/eprobe.c into header + +From: Steven Rostedt (Google) + +commit f1d3cbfaafc10464550c6d3a125f4fc802bbaed5 upstream. + +The functions: + + fetch_store_strlen_user() + fetch_store_strlen() + fetch_store_string_user() + fetch_store_string() + +are identical in both trace_kprobe.c and trace_eprobe.c. Move them into +a new header file trace_probe_kernel.h to share it. This code will later +be used by the synthetic events as well. + +Marked for stable as a fix for a crash in synthetic events requires it. + +Link: https://lkml.kernel.org/r/20221012104534.467668078@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Andrew Morton +Cc: Tom Zanussi +Acked-by: Masami Hiramatsu (Google) +Reviewed-by: Tom Zanussi +Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + kernel/trace/trace_eprobe.c | 60 +---------------------- + kernel/trace/trace_kprobe.c | 60 +---------------------- + kernel/trace/trace_probe_kernel.h | 96 ++++++++++++++++++++++++++++++++++++++ + 3 files changed, 106 insertions(+), 110 deletions(-) + create mode 100644 kernel/trace/trace_probe_kernel.h + +--- a/kernel/trace/trace_eprobe.c ++++ b/kernel/trace/trace_eprobe.c +@@ -16,6 +16,7 @@ + #include "trace_dynevent.h" + #include "trace_probe.h" + #include "trace_probe_tmpl.h" ++#include "trace_probe_kernel.h" + + #define EPROBE_EVENT_SYSTEM "eprobes" + +@@ -453,29 +454,14 @@ NOKPROBE_SYMBOL(process_fetch_insn) + static nokprobe_inline int + fetch_store_strlen_user(unsigned long addr) + { +- const void __user *uaddr = (__force const void __user *)addr; +- +- return strnlen_user_nofault(uaddr, MAX_STRING_SIZE); ++ return kern_fetch_store_strlen_user(addr); + } + + /* Return the length of string -- including null terminal byte */ + static nokprobe_inline int + fetch_store_strlen(unsigned long addr) + { +- int ret, len = 0; +- u8 c; +- +-#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE +- if (addr < TASK_SIZE) +- return fetch_store_strlen_user(addr); +-#endif +- +- do { +- ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1); +- len++; +- } while (c && ret == 0 && len < MAX_STRING_SIZE); +- +- return (ret < 0) ? 
ret : len; ++ return kern_fetch_store_strlen(addr); + } + + /* +@@ -485,21 +471,7 @@ fetch_store_strlen(unsigned long addr) + static nokprobe_inline int + fetch_store_string_user(unsigned long addr, void *dest, void *base) + { +- const void __user *uaddr = (__force const void __user *)addr; +- int maxlen = get_loc_len(*(u32 *)dest); +- void *__dest; +- long ret; +- +- if (unlikely(!maxlen)) +- return -ENOMEM; +- +- __dest = get_loc_data(dest, base); +- +- ret = strncpy_from_user_nofault(__dest, uaddr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); +- +- return ret; ++ return kern_fetch_store_string_user(addr, dest, base); + } + + /* +@@ -509,29 +481,7 @@ fetch_store_string_user(unsigned long ad + static nokprobe_inline int + fetch_store_string(unsigned long addr, void *dest, void *base) + { +- int maxlen = get_loc_len(*(u32 *)dest); +- void *__dest; +- long ret; +- +-#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE +- if ((unsigned long)addr < TASK_SIZE) +- return fetch_store_string_user(addr, dest, base); +-#endif +- +- if (unlikely(!maxlen)) +- return -ENOMEM; +- +- __dest = get_loc_data(dest, base); +- +- /* +- * Try to get string again, since the string can be changed while +- * probing. +- */ +- ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); +- +- return ret; ++ return kern_fetch_store_string(addr, dest, base); + } + + static nokprobe_inline int +--- a/kernel/trace/trace_kprobe.c ++++ b/kernel/trace/trace_kprobe.c +@@ -20,6 +20,7 @@ + #include "trace_kprobe_selftest.h" + #include "trace_probe.h" + #include "trace_probe_tmpl.h" ++#include "trace_probe_kernel.h" + + #define KPROBE_EVENT_SYSTEM "kprobes" + #define KRETPROBE_MAXACTIVE_MAX 4096 +@@ -1223,29 +1224,14 @@ static const struct file_operations kpro + static nokprobe_inline int + fetch_store_strlen_user(unsigned long addr) + { +- const void __user *uaddr = (__force const void __user *)addr; +- +- return strnlen_user_nofault(uaddr, MAX_STRING_SIZE); ++ return kern_fetch_store_strlen_user(addr); + } + + /* Return the length of string -- including null terminal byte */ + static nokprobe_inline int + fetch_store_strlen(unsigned long addr) + { +- int ret, len = 0; +- u8 c; +- +-#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE +- if (addr < TASK_SIZE) +- return fetch_store_strlen_user(addr); +-#endif +- +- do { +- ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1); +- len++; +- } while (c && ret == 0 && len < MAX_STRING_SIZE); +- +- return (ret < 0) ? 
ret : len; ++ return kern_fetch_store_strlen(addr); + } + + /* +@@ -1255,21 +1241,7 @@ fetch_store_strlen(unsigned long addr) + static nokprobe_inline int + fetch_store_string_user(unsigned long addr, void *dest, void *base) + { +- const void __user *uaddr = (__force const void __user *)addr; +- int maxlen = get_loc_len(*(u32 *)dest); +- void *__dest; +- long ret; +- +- if (unlikely(!maxlen)) +- return -ENOMEM; +- +- __dest = get_loc_data(dest, base); +- +- ret = strncpy_from_user_nofault(__dest, uaddr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); +- +- return ret; ++ return kern_fetch_store_string_user(addr, dest, base); + } + + /* +@@ -1279,29 +1251,7 @@ fetch_store_string_user(unsigned long ad + static nokprobe_inline int + fetch_store_string(unsigned long addr, void *dest, void *base) + { +- int maxlen = get_loc_len(*(u32 *)dest); +- void *__dest; +- long ret; +- +-#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE +- if ((unsigned long)addr < TASK_SIZE) +- return fetch_store_string_user(addr, dest, base); +-#endif +- +- if (unlikely(!maxlen)) +- return -ENOMEM; +- +- __dest = get_loc_data(dest, base); +- +- /* +- * Try to get string again, since the string can be changed while +- * probing. +- */ +- ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen); +- if (ret >= 0) +- *(u32 *)dest = make_data_loc(ret, __dest - base); +- +- return ret; ++ return kern_fetch_store_string(addr, dest, base); + } + + static nokprobe_inline int +--- /dev/null ++++ b/kernel/trace/trace_probe_kernel.h +@@ -0,0 +1,96 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++#ifndef __TRACE_PROBE_KERNEL_H_ ++#define __TRACE_PROBE_KERNEL_H_ ++ ++/* ++ * This depends on trace_probe.h, but can not include it due to ++ * the way trace_probe_tmpl.h is used by trace_kprobe.c and trace_eprobe.c. ++ * Which means that any other user must include trace_probe.h before including ++ * this file. ++ */ ++/* Return the length of string -- including null terminal byte */ ++static nokprobe_inline int ++kern_fetch_store_strlen_user(unsigned long addr) ++{ ++ const void __user *uaddr = (__force const void __user *)addr; ++ ++ return strnlen_user_nofault(uaddr, MAX_STRING_SIZE); ++} ++ ++/* Return the length of string -- including null terminal byte */ ++static nokprobe_inline int ++kern_fetch_store_strlen(unsigned long addr) ++{ ++ int ret, len = 0; ++ u8 c; ++ ++#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE ++ if (addr < TASK_SIZE) ++ return kern_fetch_store_strlen_user(addr); ++#endif ++ ++ do { ++ ret = copy_from_kernel_nofault(&c, (u8 *)addr + len, 1); ++ len++; ++ } while (c && ret == 0 && len < MAX_STRING_SIZE); ++ ++ return (ret < 0) ? ret : len; ++} ++ ++/* ++ * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf ++ * with max length and relative data location. ++ */ ++static nokprobe_inline int ++kern_fetch_store_string_user(unsigned long addr, void *dest, void *base) ++{ ++ const void __user *uaddr = (__force const void __user *)addr; ++ int maxlen = get_loc_len(*(u32 *)dest); ++ void *__dest; ++ long ret; ++ ++ if (unlikely(!maxlen)) ++ return -ENOMEM; ++ ++ __dest = get_loc_data(dest, base); ++ ++ ret = strncpy_from_user_nofault(__dest, uaddr, maxlen); ++ if (ret >= 0) ++ *(u32 *)dest = make_data_loc(ret, __dest - base); ++ ++ return ret; ++} ++ ++/* ++ * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max ++ * length and relative data location. 
++ */ ++static nokprobe_inline int ++kern_fetch_store_string(unsigned long addr, void *dest, void *base) ++{ ++ int maxlen = get_loc_len(*(u32 *)dest); ++ void *__dest; ++ long ret; ++ ++#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE ++ if ((unsigned long)addr < TASK_SIZE) ++ return kern_fetch_store_string_user(addr, dest, base); ++#endif ++ ++ if (unlikely(!maxlen)) ++ return -ENOMEM; ++ ++ __dest = get_loc_data(dest, base); ++ ++ /* ++ * Try to get string again, since the string can be changed while ++ * probing. ++ */ ++ ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen); ++ if (ret >= 0) ++ *(u32 *)dest = make_data_loc(ret, __dest - base); ++ ++ return ret; ++} ++ ++#endif /* __TRACE_PROBE_KERNEL_H_ */ diff --git a/queue-6.0/tracing-wake-up-ring-buffer-waiters-on-closing-of-the-file.patch b/queue-6.0/tracing-wake-up-ring-buffer-waiters-on-closing-of-the-file.patch new file mode 100644 index 00000000000..6def3a3a9c6 --- /dev/null +++ b/queue-6.0/tracing-wake-up-ring-buffer-waiters-on-closing-of-the-file.patch @@ -0,0 +1,79 @@ +From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001 +From: "Steven Rostedt (Google)" +Date: Tue, 27 Sep 2022 19:15:27 -0400 +Subject: tracing: Wake up ring buffer waiters on closing of the file + +From: Steven Rostedt (Google) + +commit f3ddb74ad0790030c9592229fb14d8c451f4e9a8 upstream. + +When the file that represents the ring buffer is closed, there may be +waiters waiting on more input from the ring buffer. Call +ring_buffer_wake_waiters() to wake up any waiters when the file is +closed. + +Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org + +Cc: stable@vger.kernel.org +Cc: Ingo Molnar +Cc: Andrew Morton +Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") +Signed-off-by: Steven Rostedt (Google) +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/trace_events.h | 1 + + kernel/trace/trace.c | 15 +++++++++++++++ + 2 files changed, 16 insertions(+) + +--- a/include/linux/trace_events.h ++++ b/include/linux/trace_events.h +@@ -92,6 +92,7 @@ struct trace_iterator { + unsigned int temp_size; + char *fmt; /* modified format holder */ + unsigned int fmt_size; ++ long wait_index; + + /* trace_seq for __print_flags() and __print_symbolic() etc. */ + struct trace_seq tmp_seq; +--- a/kernel/trace/trace.c ++++ b/kernel/trace/trace.c +@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struc + + __trace_array_put(iter->tr); + ++ iter->wait_index++; ++ /* Make sure the waiters see the new wait_index */ ++ smp_wmb(); ++ ++ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); ++ + if (info->spare) + ring_buffer_free_read_page(iter->array_buffer->buffer, + info->spare_cpu, info->spare); +@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file + + /* did we read anything? 
*/
+ 	if (!spd.nr_pages) {
++		long wait_index;
++
+ 		if (ret)
+ 			goto out;
+ 
+@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file
+ 		if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
+ 			goto out;
+ 
++		wait_index = READ_ONCE(iter->wait_index);
++
+ 		ret = wait_on_pipe(iter, iter->tr->buffer_percent);
+ 		if (ret)
+ 			goto out;
+ 
++		/* Make sure we see the new wait_index */
++		smp_rmb();
++		if (wait_index != iter->wait_index)
++			goto out;
++
+ 		goto again;
+ 	}
+ 
diff --git a/queue-6.0/tracing-wake-up-waiters-when-tracing-is-disabled.patch b/queue-6.0/tracing-wake-up-waiters-when-tracing-is-disabled.patch
new file mode 100644
index 00000000000..87aab6175c1
--- /dev/null
+++ b/queue-6.0/tracing-wake-up-waiters-when-tracing-is-disabled.patch
@@ -0,0 +1,43 @@
+From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
+From: "Steven Rostedt (Google)"
+Date: Wed, 28 Sep 2022 18:22:20 -0400
+Subject: tracing: Wake up waiters when tracing is disabled
+
+From: Steven Rostedt (Google)
+
+commit 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 upstream.
+
+When tracing is disabled, there's no reason that waiters should stay
+waiting, so wake them up; otherwise tasks get stuck when they should be
+flushing the buffers.
+
+Cc: stable@vger.kernel.org
+Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
+Signed-off-by: Steven Rostedt (Google)
+Signed-off-by: Greg Kroah-Hartman
+---
+ kernel/trace/trace.c |    6 ++++++
+ 1 file changed, 6 insertions(+)
+
+--- a/kernel/trace/trace.c
++++ b/kernel/trace/trace.c
+@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file
+ 		if (ret)
+ 			goto out;
+ 
++		/* No need to wait after waking up when tracing is off */
++		if (!tracer_tracing_is_on(iter->tr))
++			goto out;
++
+ 		/* Make sure we see the new wait_index */
+ 		smp_rmb();
+ 		if (wait_index != iter->wait_index)
+ 			goto out;
+@@ -9043,6 +9047,8 @@ rb_simple_write(struct file *filp, const
+ 		tracer_tracing_off(tr);
+ 		if (tr->current_trace->stop)
+ 			tr->current_trace->stop(tr);
++		/* Wake up any waiters */
++		ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
+ 	}
+ 	mutex_unlock(&trace_types_lock);
+ }
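
A quick way to exercise the two waiter fixes above, as a sketch only: it
assumes tracefs is mounted at /sys/kernel/tracing and that a
per_cpu/cpu0 directory exists; neither path is taken from the patches
themselves.

 # cat /sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw > /dev/null &
 # echo 0 > /sys/kernel/tracing/tracing_on

Before these changes the backgrounded reader could remain blocked in the
ring-buffer wait even after tracing was turned off; with
ring_buffer_wake_waiters() called from rb_simple_write() and from the
buffer release path, such waiters are now woken instead of sleeping
indefinitely.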