]> git.ipfire.org Git - thirdparty/kernel/stable-queue.git/blame - releases/3.6.7/futex-handle-futex_pi-owner_died-take-over-correctly.patch
Linux 3.6.7
[thirdparty/kernel/stable-queue.git] / releases / 3.6.7 / futex-handle-futex_pi-owner_died-take-over-correctly.patch
CommitLineData
7b8cad35
GKH
1From 59fa6245192159ab5e1e17b8e31f15afa9cff4bf Mon Sep 17 00:00:00 2001
2From: Thomas Gleixner <tglx@linutronix.de>
3Date: Tue, 23 Oct 2012 22:29:38 +0200
4Subject: futex: Handle futex_pi OWNER_DIED take over correctly
5
6From: Thomas Gleixner <tglx@linutronix.de>
7
8commit 59fa6245192159ab5e1e17b8e31f15afa9cff4bf upstream.
9
10Siddhesh analyzed a failure in the take over of pi futexes in case the
11owner died and provided a workaround.
12See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
13
14The detailed problem analysis shows:
15
16Futex F is initialized with PTHREAD_PRIO_INHERIT and
17PTHREAD_MUTEX_ROBUST_NP attributes.
18
19T1 lock_futex_pi(F);
20
21T2 lock_futex_pi(F);
22 --> T2 blocks on the futex and creates pi_state which is associated
23 to T1.
24
25T1 exits
26 --> exit_robust_list() runs
27 --> Futex F userspace value TID field is set to 0 and
28 FUTEX_OWNER_DIED bit is set.
29
30T3 lock_futex_pi(F);
31 --> Succeeds due to the check for F's userspace TID field == 0
32 --> Claims ownership of the futex and sets its own TID into the
33 userspace TID field of futex F
34 --> returns to user space
35
36T1 --> exit_pi_state_list()
37 --> Transfers pi_state to waiter T2 and wakes T2 via
38 rt_mutex_unlock(&pi_state->mutex)
39
40T2 --> acquires pi_state->mutex and gains real ownership of the
41 pi_state
42 --> Claims ownership of the futex and sets its own TID into the
43 userspace TID field of futex F
44 --> returns to user space
45
46T3 --> observes inconsistent state
47
48This problem is independent of UP/SMP, preemptible/non preemptible
49kernels, or process shared vs. private. The only difference is that
50certain configurations are more likely to expose it.
51
52So as Siddhesh correctly analyzed the following check in
53futex_lock_pi_atomic() is the culprit:
54
55 if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
56
57We check the userspace value for a TID value of 0 and take over the
58futex unconditionally if that's true.
59
60AFAICT this check is there as it is correct for a different corner
61case of futexes: the WAITERS bit became stale.
62
63Now the proposed change
64
65- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
66+ if (unlikely(ownerdied ||
67+ !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
68
69solves the problem, but it's not obvious why and it wreckages the
70"stale WAITERS bit" case.
71
72What happens is, that due to the WAITERS bit being set (T2 is blocked
73on that futex) it enforces T3 to go through lookup_pi_state(), which
74in the above case returns an existing pi_state and therefor forces T3
75to legitimately fight with T2 over the ownership of the pi_state (via
76pi_state->mutex). Probelm solved!
77
78Though that does not work for the "WAITERS bit is stale" problem
79because if lookup_pi_state() does not find existing pi_state it
80returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
81return -ESRCH to user space because the OWNER_DIED bit is not set.
82
83Now there is a different solution to that problem. Do not look at the
84user space value at all and enforce a lookup of possibly available
85pi_state. If pi_state can be found, then the new incoming locker T3
86blocks on that pi_state and legitimately races with T2 to acquire the
87rt_mutex and the pi_state and therefor the proper ownership of the
88user space futex.
89
90lookup_pi_state() has the correct order of checks. It first tries to
91find a pi_state associated with the user space futex and only if that
92fails it checks for futex TID value = 0. If no pi_state is available
93nothing can create new state at that point because this happens with
94the hash bucket lock held.
95
96So the above scenario changes to:
97
98T1 lock_futex_pi(F);
99
100T2 lock_futex_pi(F);
101 --> T2 blocks on the futex and creates pi_state which is associated
102 to T1.
103
104T1 exits
105 --> exit_robust_list() runs
106 --> Futex F userspace value TID field is set to 0 and
107 FUTEX_OWNER_DIED bit is set.
108
109T3 lock_futex_pi(F);
110 --> Finds pi_state and blocks on pi_state->rt_mutex
111
112T1 --> exit_pi_state_list()
113 --> Transfers pi_state to waiter T2 and wakes it via
114 rt_mutex_unlock(&pi_state->mutex)
115
116T2 --> acquires pi_state->mutex and gains ownership of the pi_state
117 --> Claims ownership of the futex and sets its own TID into the
118 userspace TID field of futex F
119 --> returns to user space
120
121This covers all gazillion points on which T3 might come in between
122T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
123also solves the "WAITERS bit stale" problem by forcing the take over.
124
125Another benefit of changing the code this way is that it makes it less
126dependent on untrusted user space values and therefor minimizes the
127possible wreckage which might be inflicted.
128
129As usual after staring for too long at the futex code my brain hurts
130so much that I really want to ditch that whole optimization of
131avoiding the syscall for the non contended case for PI futexes and rip
132out the maze of corner case handling code. Unfortunately we can't as
133user space relies on that existing behaviour, but at least thinking
134about it helps me to preserve my mental sanity. Maybe we should
135nevertheless :)
136
137Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
138Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
139Acked-by: Darren Hart <dvhart@linux.intel.com>
140Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
141Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
142
143---
144 kernel/futex.c | 41 ++++++++++++++++++++++-------------------
145 1 file changed, 22 insertions(+), 19 deletions(-)
146
147--- a/kernel/futex.c
148+++ b/kernel/futex.c
149@@ -716,7 +716,7 @@ static int futex_lock_pi_atomic(u32 __us
150 struct futex_pi_state **ps,
151 struct task_struct *task, int set_waiters)
152 {
153- int lock_taken, ret, ownerdied = 0;
154+ int lock_taken, ret, force_take = 0;
155 u32 uval, newval, curval, vpid = task_pid_vnr(task);
156
157 retry:
158@@ -755,17 +755,15 @@ retry:
159 newval = curval | FUTEX_WAITERS;
160
161 /*
162- * There are two cases, where a futex might have no owner (the
163- * owner TID is 0): OWNER_DIED. We take over the futex in this
164- * case. We also do an unconditional take over, when the owner
165- * of the futex died.
166- *
167- * This is safe as we are protected by the hash bucket lock !
168+ * Should we force take the futex? See below.
169 */
170- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
171- /* Keep the OWNER_DIED bit */
172+ if (unlikely(force_take)) {
173+ /*
174+ * Keep the OWNER_DIED and the WAITERS bit and set the
175+ * new TID value.
176+ */
177 newval = (curval & ~FUTEX_TID_MASK) | vpid;
178- ownerdied = 0;
179+ force_take = 0;
180 lock_taken = 1;
181 }
182
183@@ -775,7 +773,7 @@ retry:
184 goto retry;
185
186 /*
187- * We took the lock due to owner died take over.
188+ * We took the lock due to forced take over.
189 */
190 if (unlikely(lock_taken))
191 return 1;
192@@ -790,20 +788,25 @@ retry:
193 switch (ret) {
194 case -ESRCH:
195 /*
196- * No owner found for this futex. Check if the
197- * OWNER_DIED bit is set to figure out whether
198- * this is a robust futex or not.
199+ * We failed to find an owner for this
200+ * futex. So we have no pi_state to block
201+ * on. This can happen in two cases:
202+ *
203+ * 1) The owner died
204+ * 2) A stale FUTEX_WAITERS bit
205+ *
206+ * Re-read the futex value.
207 */
208 if (get_futex_value_locked(&curval, uaddr))
209 return -EFAULT;
210
211 /*
212- * We simply start over in case of a robust
213- * futex. The code above will take the futex
214- * and return happy.
215+ * If the owner died or we have a stale
216+ * WAITERS bit the owner TID in the user space
217+ * futex is 0.
218 */
219- if (curval & FUTEX_OWNER_DIED) {
220- ownerdied = 1;
221+ if (!(curval & FUTEX_TID_MASK)) {
222+ force_take = 1;
223 goto retry;
224 }
225 default: