From 651e22f2701b4113989237c3048d17337dd2185c Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Wed, 6 Aug 2014 14:11:33 -0400
Subject: ring-buffer: Always reset iterator to reader page

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

commit 651e22f2701b4113989237c3048d17337dd2185c upstream.

When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with an empty page, and the page that
was swapped out becomes the new reader page. The reader page
is owned by the reader, and since it was swapped out of the ring
buffer, writers do not have access to it (there is an exception
to that rule, but it is out of scope for this commit).

Reading the "trace" file is a non-consuming read, which means
that the data in the ring buffer will not be modified. When the
trace file is opened, a ring buffer iterator is allocated and writes
to the ring buffer are disabled, so that the iterator will not have
issues iterating over the data.

Although writes are disabled, the ring buffer does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

This was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but only when running
all the tests. I believe that was because it required one of the tests
to be shutting down via delayed instances while a new test started up.

After spending several days debugging this, I found that it was caused
by the iterator becoming corrupted. Debugging further, I found out why:
it happened in rb_iter_reset().

As consuming reads may not read the full reader page, but only part
of it, there is a "read" field to record where the last read took place.
The iterator must also start at that read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the
iterator would start at the head page within the ring buffer (where
writes still happen). But the mistake there was that it still used the
"read" field to set the iterator's start on the head page, where it
should always start at zero, because readers never read from within
the ring buffer where writes occur.

I originally wrote a patch to have it set iter->head to 0 instead of
iter->head_page->read, but then I questioned why it wasn't always
setting the iterator to point to the reader page, as the reader page
is still valid. list_empty(reader_page->list) just means that the swap
was successful, but the reader_page may still hold data.

There was a bug report a long time ago, which was not reproducible,
about trace_pipe (a consuming read) not matching trace (an iterator
read). This may explain why that happened.

Anyway, the correct fix for this bug is to always use the reader page
and never reset the iterator to inside the writable ring buffer.

Fixes: d769041f8653 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/trace/ring_buffer.c |   17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3357,21 +3357,16 @@ static void rb_iter_reset(struct ring_bu
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
 
 	/* Iterator usage is expected to have record disabled */
-	if (list_empty(&cpu_buffer->reader_page->list)) {
-		iter->head_page = rb_set_head_page(cpu_buffer);
-		if (unlikely(!iter->head_page))
-			return;
-		iter->head = iter->head_page->read;
-	} else {
-		iter->head_page = cpu_buffer->reader_page;
-		iter->head = cpu_buffer->reader_page->read;
-	}
+	iter->head_page = cpu_buffer->reader_page;
+	iter->head = cpu_buffer->reader_page->read;
+
+	iter->cache_reader_page = iter->head_page;
+	iter->cache_read = iter->head;
+
 	if (iter->head)
 		iter->read_stamp = cpu_buffer->read_stamp;
 	else
 		iter->read_stamp = iter->head_page->page->time_stamp;
-	iter->cache_reader_page = cpu_buffer->reader_page;
-	iter->cache_read = cpu_buffer->read;
 }
 
 /**