From 651e22f2701b4113989237c3048d17337dd2185c Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Wed, 6 Aug 2014 14:11:33 -0400
Subject: ring-buffer: Always reset iterator to reader page

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

commit 651e22f2701b4113989237c3048d17337dd2185c upstream.

When performing a consuming read, the ring buffer swaps a page out
of the ring buffer for an empty page, and the page that was swapped
out becomes the new reader page. The reader page is owned by the
reader, and since it was swapped out of the ring buffer, writers do
not have access to it (there's an exception to that rule, but it's
out of scope for this commit).

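To make the mechanism concrete, here is a minimal userspace sketch
(not the kernel's actual implementation; struct buf_page, struct ring,
and take_reader_page() are simplified stand-ins invented for this
illustration):

  /* Toy model of the consuming-read swap: the ring is a circular
     array of page slots and the reader owns one spare page.  A
     consuming read trades the spare (empty) page for the page at
     the ring's head; the page taken out becomes the new reader
     page and is no longer reachable by writers. */
  struct buf_page { int events[64]; unsigned len; unsigned read; };

  struct ring {
          struct buf_page *slot[8];  /* pages writers cycle through */
          unsigned head;             /* index of next page to consume */
          struct buf_page *reader;   /* page owned by the reader */
  };

  static struct buf_page *take_reader_page(struct ring *r)
  {
          struct buf_page *full = r->slot[r->head];

          r->reader->len = r->reader->read = 0;  /* spare is empty */
          r->slot[r->head] = r->reader;          /* empty page goes in */
          r->head = (r->head + 1) % 8;
          r->reader = full;                      /* full page comes out */
          return full;
  }
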
Reading the "trace" file is a non-consuming read, which means that
the data in the ring buffer will not be modified. When the trace
file is opened, a ring buffer iterator is allocated and writes to
the ring buffer are disabled, so that the iterator will not have
issues iterating over the data.

Although disabling writes protects the iterator from writers, it
does not disable other reads, or even consuming reads. If a
consuming read happens, the iterator is reset and starts reading
from the beginning again.

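Continuing the same toy model (again an invented sketch, not the
kernel code), the iterator remembers which reader page it started
from, so a consuming read that swaps in a new reader page forces a
reset:

  /* Iterator over the toy ring above.  It caches the reader page
     it started from; if a consuming read has since swapped in a
     different reader page, the iterator resets and starts over. */
  struct ring_iter {
          struct ring *ring;
          struct buf_page *page;          /* page being iterated */
          unsigned head;                  /* offset into that page */
          struct buf_page *cache_reader;  /* reader page at reset time */
  };

  static void iter_reset(struct ring_iter *it)
  {
          it->page = it->ring->reader;        /* start at the reader page */
          it->head = it->ring->reader->read;  /* ...at its read offset */
          it->cache_reader = it->ring->reader;
  }

  static int iter_next(struct ring_iter *it)
  {
          if (it->cache_reader != it->ring->reader)
                  iter_reset(it);  /* a consuming read intervened */
          if (it->head >= it->page->len)
                  return -1;       /* no more events on this page */
          return it->page->events[it->head++];
  }
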
My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

This is the __trace_find_cmdline() function complaining that the
pid in the event record is negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted: it happened in rb_iter_reset().

Since a consuming read may consume only part of the reader page
rather than all of it, a "read" field records where the last read
took place, and the iterator must also start at that read position.
In the rb_iter_reset() code, if the reader page was disconnected
from the ring buffer, the iterator would start at the head page
within the ring buffer (where writes still happen). The mistake was
that it still used the "read" field to position the iterator on the
head page, where it should always start at zero, because readers
never read from within the part of the ring buffer where writes
occur.

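In terms of the toy model above, the pre-fix reset behaved roughly
like this (an invented sketch; the real code is in the diff below):

  /* The pre-fix logic, in toy form: when the reader page had been
     swapped into the ring (the list_empty() case in the real code),
     the iterator started at the ring's head page but reused that
     page's stale "read" offset.  Nothing has been consumed from a
     page inside the writable ring, so the only correct starting
     offset there is 0; a stale nonzero offset starts decoding in
     the middle of an event, which is how fields like the pid end
     up garbage. */
  static void iter_reset_buggy(struct ring_iter *it)
  {
          struct buf_page *head = it->ring->slot[it->ring->head];

          it->page = head;
          it->head = head->read;  /* BUG: stale offset, should be 0 */
          it->cache_reader = it->ring->reader;
  }
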
I originally wrote a patch to have it set iter->head to 0 instead
of iter->head_page->read, but then I questioned why it wasn't always
setting the iterator to point to the reader page, as the reader page
is still valid. list_empty(reader_page->list) just means that the
page was successfully swapped out; the reader page may still have
data.

There was a bug report a long time ago, never reproducible, about
trace_pipe (a consuming read) not matching trace (an iterator read).
This may explain why that happened.

Anyway, the correct fix for this bug is to always use the reader page
and not reset the iterator to a point inside the writable ring buffer.

Fixes: d769041f8653 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/trace/ring_buffer.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3357,21 +3357,16 @@ static void rb_iter_reset(struct ring_bu
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	/* Iterator usage is expected to have record disabled */
-	if (list_empty(&cpu_buffer->reader_page->list)) {
-		iter->head_page = rb_set_head_page(cpu_buffer);
-		if (unlikely(!iter->head_page))
-			return;
-		iter->head = iter->head_page->read;
-	} else {
-		iter->head_page = cpu_buffer->reader_page;
-		iter->head = cpu_buffer->reader_page->read;
-	}
+	iter->head_page = cpu_buffer->reader_page;
+	iter->head = cpu_buffer->reader_page->read;
+
+	iter->cache_reader_page = iter->head_page;
+	iter->cache_read = iter->head;
+
 	if (iter->head)
 		iter->read_stamp = cpu_buffer->read_stamp;
 	else
 		iter->read_stamp = iter->head_page->page->time_stamp;
-	iter->cache_reader_page = cpu_buffer->reader_page;
-	iter->cache_read = cpu_buffer->read;
 }
 
 /**