Commit | Line | Data |
---|---|---|
8f0aff2a | 1 | .\" Page by b.hubert |
1abce893 MK |
2 | .\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de> |
3 | .\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com> | |
2297bf0e | 4 | .\" |
2e46a6e7 | 5 | .\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE) |
8f0aff2a | 6 | .\" may be freely modified and distributed |
8ff7380d | 7 | .\" %%%LICENSE_END |
fea681da MK |
8 | .\" |
9 | .\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com) | |
10 | .\" added ERRORS section. | |
11 | .\" | |
12 | .\" Modified 2004-06-17 mtk | |
13 | .\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE | |
14 | .\" | |
4f58b197 MK |
15 | .\" 2.6.31 adds FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI |
16 | .\" commit 52400ba946759af28442dee6265c5c0180ac7122 | |
17 | .\" Author: Darren Hart <dvhltc@us.ibm.com> | |
18 | .\" Date: Fri Apr 3 13:40:49 2009 -0700 | |
19 | .\" | |
20 | .\" commit ba9c22f2c01cf5c88beed5a6b9e07d42e10bd358 | |
21 | .\" Author: Darren Hart <dvhltc@us.ibm.com> | |
22 | .\" Date: Mon Apr 20 22:22:22 2009 -0700 | |
23 | .\" | |
24 | .\" See Documentation/futex-requeue-pi.txt | |
34f7665a | 25 | .\" |
3d155313 | 26 | .TH FUTEX 2 2014-05-21 "Linux" "Linux Programmer's Manual" |
fea681da | 27 | .SH NAME |
ce154705 | 28 | futex \- fast user-space locking |
fea681da | 29 | .SH SYNOPSIS |
9d9dc1e8 | 30 | .nf |
fea681da MK |
31 | .sp |
32 | .B "#include <linux/futex.h>" | |
fea681da MK |
33 | .B "#include <sys/time.h>" |
34 | .sp | |
d33602c4 MK |
35 | .BI "int futex(int *" uaddr ", int " futex_op ", int " val , |
36 | .BI " const struct timespec *" timeout , | |
9d9dc1e8 | 37 | .BI " int *" uaddr2 ", int " val3 ); |
fea681da | 38 | .\" int *? void *? u32 *? |
9d9dc1e8 | 39 | .fi |
409f08b0 | 40 | |
b939d6e4 MK |
41 | .IR Note : |
42 | There is no glibc wrapper for this system call; see NOTES. | |
47297adb | 43 | .SH DESCRIPTION |
fea681da MK |
44 | .PP |
45 | The | |
e511ffb6 | 46 | .BR futex () |
fea681da MK |
47 | system call provides a method for |
48 | a program to wait for a value at a given address to change, and a | |
49 | method to wake up anyone waiting on a particular address (while the | |
50 | addresses for the same memory in separate processes may not be | |
51 | equal, the kernel maps them internally so the same memory mapped in | |
52 | different locations will correspond for | |
e511ffb6 | 53 | .BR futex () |
c13182ef | 54 | calls). |
fd3fa7ef | 55 | This system call is typically used to |
fea681da MK |
56 | implement the contended case of a lock in shared memory, as |
57 | described in | |
a8bda636 | 58 | .BR futex (7). |
fea681da | 59 | .PP |
f388ba70 MK |
60 | When a futex operation did not finish uncontended in user space, a |
61 | .BR futex () | |
62 | call needs to be made to the kernel to arbitrate. | |
c13182ef | 63 | Arbitration can either mean putting the calling |
fea681da MK |
64 | process to sleep or, conversely, waking a waiting process. |
65 | .PP | |
f388ba70 MK |
66 | Callers of |
67 | .BR futex () | |
68 | are expected to adhere to the semantics described in | |
a8bda636 | 69 | .BR futex (7). |
fea681da | 70 | As these |
d603cc27 | 71 | semantics involve writing nonportable assembly instructions, this in turn |
fea681da MK |
72 | probably means that most users will in fact be library authors and not |
73 | general application developers. | |
74 | .PP | |
75 | The | |
76 | .I uaddr | |
f388ba70 MK |
77 | argument points to an integer which stores the counter (futex). |
78 | On all platforms, futexes are four-byte integers that | |
79 | must be aligned on a four-byte boundary. | |
80 | The operation to perform on the futex is specified in the | |
81 | .I futex_op | |
82 | argument; | |
83 | .IR val | |
84 | is a value whose meaning and purpose depend on | |
85 | .IR futex_op . | |
36ab2074 MK |
86 | |
87 | The remaining arguments | |
88 | .RI ( timeout , | |
89 | .IR uaddr2 , | |
90 | and | |
91 | .IR val3 ) | |
92 | are required only for certain of the futex operations described below. | |
93 | Where one of these arguments is not required, it is ignored. | |
94 | For several blocking operations, the | |
95 | .I timeout | |
96 | argument is a pointer to a | |
97 | .IR timespec | |
98 | structure that specifies a timeout for the operation. | |
99 | However, notwithstanding the prototype shown above, for some operations, | |
100 | this argument is instead a four-byte integer whose meaning | |
101 | is determined by the operation. | |
102 | Where it is required, | |
103 | .IR uaddr2 | |
104 | is a pointer to a second futex that is employed by the operation. | |
105 | The interpretation of the final integer argument, | |
106 | .IR val3 , | |
107 | depends on the operation. | |
108 | ||
6be4bad7 | 109 | The |
d33602c4 | 110 | .I futex_op |
6be4bad7 MK |
111 | argument consists of two parts: |
112 | a command that specifies the operation to be performed, | |
113 | bit-wise ORed with zero or more options that | |
114 | modify the behavior of the operation. | |
fc30eb79 | 115 | The options that may be included in |
d33602c4 | 116 | .I futex_op |
fc30eb79 TG |
117 | are as follows: |
118 | .TP | |
119 | .BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)" | |
120 | .\" commit 34f01cc1f512fa783302982776895c73714ebbc2 | |
121 | This option bit can be employed with all futex operations. | |
122 | It tells the kernel that the futex is process private and not shared | |
123 | with another process. | |
124 | This allows the kernel to choose the fast path for validating | |
125 | the user-space address and avoids expensive VMA lookups, | |
126 | taking reference counts on file backing store, and so on. | |
ae2c1774 MK |
127 | |
128 | As a convenience, | |
129 | .IR <linux/futex.h> | |
130 | defines a set of constants with the suffix | |
131 | .BR _PRIVATE | |
132 | that are equivalents of all of the operations listed below, | |
dcdfde26 | 133 | .\" except the obsolete FUTEX_FD, for which the "private" flag was |
ae2c1774 MK |
134 | .\" meaningless |
135 | but with the | |
136 | .BR FUTEX_PRIVATE_FLAG | |
137 | ORed into the constant value. | |
138 | Thus, there are | |
139 | .BR FUTEX_WAIT_PRIVATE , | |
140 | .BR FUTEX_WAKE_PRIVATE , | |
141 | and so on. | |
2e98bbc2 TG |
142 | .TP |
143 | .BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)" | |
144 | .\" commit 1acdac104668a0834cfa267de9946fac7764d486 | |
4a7e5b05 | 145 | This option bit can be employed only with the |
2e98bbc2 TG |
146 | .BR FUTEX_WAIT_BITSET |
147 | and | |
148 | .BR FUTEX_WAIT_REQUEUE_PI | |
c84cf68c | 149 | operations. |
2e98bbc2 | 150 | |
f2103b26 MK |
151 | If this option is set, the kernel treats |
152 | .I timeout | |
153 | as an absolute time based on | |
2e98bbc2 TG |
154 | .BR CLOCK_REALTIME . |
155 | ||
f2103b26 MK |
156 | If this option is not set, the kernel treats |
157 | .I timeout | |
158 | as a relative time, | |
1c952cf5 MK |
159 | .\" FIXME I added CLOCK_MONOTONIC here. Is it correct? |
160 | measured against the | |
161 | .BR CLOCK_MONOTONIC | |
162 | clock. | |
6be4bad7 MK |
163 | .PP |
164 | The operation specified in | |
d33602c4 | 165 | .I futex_op |
6be4bad7 | 166 | is one of the following: |
fea681da | 167 | .TP |
81c9d87e MK |
168 | .BR FUTEX_WAIT " (since Linux 2.6.0)" |
169 | .\" Strictly speaking, since some time in 2.5.x | |
f065673c MK |
170 | This operation tests that the value at the |
171 | location pointed to by the futex address | |
fea681da MK |
172 | .I uaddr |
173 | still contains the value | |
174 | .IR val , | |
f065673c | 175 | and then sleeps awaiting |
682edefb | 176 | .B FUTEX_WAKE |
f065673c MK |
177 | on the futex address. |
178 | The test and sleep steps are performed atomically. | |
179 | If the futex value does not match | |
180 | .IR val , | |
4710334a | 181 | then the call fails immediately with the error |
badbf70c | 182 | .BR EAGAIN . |
f065673c MK |
183 | .\" FIXME I added the following sentence. Please confirm that it is correct. |
184 | The purpose of the test step is to detect races where | |
185 | another process changes the value of the futex between | |
186 | the time it was last checked and the time of the | |
187 | .BR FUTEX_WAIT | |
63d3f911 | 188 | operation. |
1909e523 | 189 | |
c13182ef | 190 | If the |
fea681da | 191 | .I timeout |
1c952cf5 MK |
192 | argument is non-NULL, its contents specify a relative timeout for the wait |
193 | .\" FIXME I added CLOCK_MONOTONIC here. Is it correct? | |
194 | measured according to the | |
195 | .BR CLOCK_MONOTONIC | |
196 | clock. | |
82a6092b MK |
197 | (This interval will be rounded up to the system clock granularity, |
198 | and kernel scheduling delays mean that the | |
199 | blocking interval may overrun by a small amount.) | |
200 | If | |
201 | .I timeout | |
202 | is NULL, the call blocks indefinitely. | |
4798a7f3 | 203 | |
c13182ef | 204 | The arguments |
fea681da MK |
205 | .I uaddr2 |
206 | and | |
207 | .I val3 | |
208 | are ignored. | |
209 | ||
210 | For | |
a8bda636 | 211 | .BR futex (7), |
fea681da MK |
212 | this call is executed if decrementing the count gave a negative value |
213 | (indicating contention), and will sleep until another process releases | |
682edefb MK |
214 | the futex and executes the |
215 | .B FUTEX_WAKE | |
216 | operation. | |
fea681da | 217 | .TP |
81c9d87e MK |
218 | .BR FUTEX_WAKE " (since Linux 2.6.0)" |
219 | .\" Strictly speaking, since Linux 2.5.x | |
f065673c MK |
220 | This operation wakes at most |
221 | .I val | |
222 | processes waiting (i.e., inside | |
223 | .BR FUTEX_WAIT ) | |
224 | on the futex at the address | |
225 | .IR uaddr . | |
226 | Most commonly, | |
227 | .I val | |
228 | is specified as either 1 (wake up a single waiter) or | |
229 | .BR INT_MAX | |
230 | (wake up all waiters). | |
730bfbda MK |
231 | .\" FIXME Please confirm that the following is correct: |
232 | No guarantee is provided about which waiters are awoken | |
233 | (e.g., a waiter with a higher scheduling priority is not guaranteed | |
234 | to be awoken in preference to a waiter with a lower priority). | |
4798a7f3 | 235 | |
fea681da MK |
236 | The arguments |
237 | .IR timeout , | |
c8b921bd | 238 | .IR uaddr2 , |
fea681da MK |
239 | and |
240 | .I val3 | |
241 | are ignored. | |
242 | ||
243 | For | |
a8bda636 | 244 | .BR futex (7), |
fea681da MK |
245 | this is executed if incrementing |
246 | the count showed that there were waiters, once the futex value has been set | |
247 | to 1 (indicating that it is available). | |
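The FUTEX_WAIT and FUTEX_WAKE operations described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper names `sys_futex`, `futex_wait`, and `futex_wake` are hypothetical (there is no glibc wrapper), the futex word is declared as `uint32_t` rather than the `int` of the prototype, and the platform is assumed to define `SYS_futex`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: invoke the raw system call, since glibc
   provides no wrapper for futex(). */
static long sys_futex(uint32_t *uaddr, int futex_op, uint32_t val,
                      const struct timespec *timeout,
                      uint32_t *uaddr2, uint32_t val3)
{
    /* Futexes must be aligned on a four-byte boundary. */
    assert(((uintptr_t) uaddr & 3) == 0);
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}

/* Sleep if *uaddr still contains val. Returns 0 when woken by
   FUTEX_WAKE, or -1 with errno set (EAGAIN if the value differed).
   rel_timeout may be NULL to block indefinitely. */
static long futex_wait(uint32_t *uaddr, uint32_t val,
                       const struct timespec *rel_timeout)
{
    return sys_futex(uaddr, FUTEX_WAIT, val, rel_timeout, NULL, 0);
}

/* Wake at most nr_waiters waiters; returns the number woken. */
static long futex_wake(uint32_t *uaddr, int nr_waiters)
{
    return sys_futex(uaddr, FUTEX_WAKE, (uint32_t) nr_waiters,
                     NULL, NULL, 0);
}
```

A caller would normally invoke `futex_wait()` in a loop that rechecks the futex word, because the call can return without the awaited condition holding (for example, on EAGAIN or after a wake-up meant for another waiter).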
a7c2bf45 MK |
248 | .TP |
249 | .BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)" | |
250 | .\" Strictly speaking, from Linux 2.5.x to 2.6.25 | |
251 | This operation creates a file descriptor that is associated with the futex at | |
252 | .IR uaddr . | |
253 | .\" , suitable for .BR poll (2). | |
254 | The calling process must close the returned file descriptor after use. | |
255 | When another process performs a | |
256 | .BR FUTEX_WAKE | |
257 | on the futex, the file descriptor is reported as readable by | |
258 | .BR select (2), | |
259 | .BR poll (2), | |
260 | and | |
261 | .BR epoll (7). | |
262 | ||
263 | The file descriptor can be used to obtain asynchronous notifications: | |
264 | if | |
265 | .I val | |
266 | is nonzero, then when another process executes a | |
267 | .BR FUTEX_WAKE , | |
268 | the caller will receive the signal number that was passed in | |
269 | .IR val . | |
270 | ||
271 | The arguments | |
272 | .IR timeout , | |
273 | .I uaddr2 | |
274 | and | |
275 | .I val3 | |
276 | are ignored. | |
277 | ||
278 | To prevent race conditions, the caller should test if the futex has | |
279 | been upped after | |
280 | .B FUTEX_FD | |
281 | returns. | |
282 | ||
283 | Because it was inherently racy, | |
284 | .B FUTEX_FD | |
285 | has been removed | |
286 | .\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80 | |
287 | from Linux 2.6.26 onward. | |
288 | .TP | |
289 | .BR FUTEX_REQUEUE " (since Linux 2.6.0)" | |
290 | .\" Strictly speaking: from Linux 2.5.70 | |
291 | .\" | |
292 | .\" FIXME I added this warning. Okay? | |
293 | .IR "Avoid using this operation" . | |
294 | It is broken (unavoidably racy) for its intended purpose. | |
295 | Use | |
296 | .BR FUTEX_CMP_REQUEUE | |
297 | instead. | |
298 | ||
299 | This operation performs the same task as | |
300 | .BR FUTEX_CMP_REQUEUE , | |
301 | except that no check is made using the value in | |
302 | .IR val3 . | |
303 | (The argument | |
304 | .I val3 | |
305 | is ignored.) | |
306 | .TP | |
307 | .BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)" | |
308 | This operation was added as a replacement for the earlier | |
309 | .BR FUTEX_REQUEUE , | |
310 | because that operation was racy for its intended use. | |
311 | ||
312 | As with | |
313 | .BR FUTEX_REQUEUE , | |
314 | the | |
315 | .BR FUTEX_CMP_REQUEUE | |
316 | operation is used to avoid a "thundering herd" effect when | |
317 | .B FUTEX_WAKE | |
318 | is used and all processes woken up need to acquire another futex. | |
319 | It differs from | |
320 | .BR FUTEX_REQUEUE | |
321 | in that it first checks whether the location | |
322 | .I uaddr | |
323 | still contains the value | |
324 | .IR val3 . | |
325 | If not, the operation fails with the error | |
326 | .BR EAGAIN . | |
327 | .\" FIXME I added the following sentence on rationale for FUTEX_CMP_REQUEUE. | |
328 | .\" Is it correct? Should it be expanded? | |
329 | This additional feature of | |
330 | .BR FUTEX_CMP_REQUEUE | |
331 | can be used by the caller to (atomically) detect changes | |
332 | in the value of the source futex at | |
333 | .IR uaddr . | |
334 | ||
335 | The operation wakes up a maximum of | |
336 | .I val | |
337 | waiters that are waiting on the futex at | |
338 | .IR uaddr . | |
339 | If there are more than | |
340 | .I val | |
341 | waiters, then the remaining waiters are removed | |
342 | from the wait queue of the source futex at | |
343 | .I uaddr | |
344 | and added to the wait queue of the target futex at | |
345 | .IR uaddr2 . | |
346 | The | |
347 | .I timeout | |
348 | argument is (ab)used to specify a cap on the number of waiters | |
349 | that are requeued to the futex at | |
350 | .IR uaddr2 ; | |
351 | the kernel casts the | |
352 | .I timeout | |
353 | value to | |
354 | .IR u32 . | |
355 | ||
356 | .\" FIXME Please review the following new paragraph to see if it is | |
357 | .\" accurate. | |
358 | Typical values to specify for | |
359 | .I val | |
360 | are 0 or 1. | |
361 | (Specifying | |
362 | .BR INT_MAX | |
363 | is not useful, because it would make the | |
364 | .BR FUTEX_CMP_REQUEUE | |
365 | operation equivalent to | |
366 | .BR FUTEX_WAKE .) | |
367 | The cap value specified via the (abused) | |
368 | .I timeout | |
369 | argument is typically either 1 or | |
370 | .BR INT_MAX . | |
371 | (Specifying the argument as 0 is not useful, because it would make the | |
372 | .BR FUTEX_CMP_REQUEUE | |
373 | operation equivalent to | |
374 | .BR FUTEX_WAKE .) | |
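The argument passing for FUTEX_CMP_REQUEUE, including the integer cap carried in the `timeout` slot, might look like the sketch below. The helper name `futex_cmp_requeue` and its parameter names are hypothetical, and the code assumes a platform that defines `SYS_futex`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wake up to nr_wake waiters on uaddr and move
   up to requeue_cap of the remaining waiters to uaddr2, but only if
   *uaddr still contains expected (passed as val3); otherwise the call
   fails with EAGAIN. */
static long futex_cmp_requeue(uint32_t *uaddr, uint32_t nr_wake,
                              uint32_t requeue_cap, uint32_t *uaddr2,
                              uint32_t expected)
{
    /* The cap travels in the timeout argument, which the kernel
       reinterprets as an integer for this operation. */
    const struct timespec *cap_as_timeout =
        (const struct timespec *) (uintptr_t) requeue_cap;

    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, nr_wake,
                   cap_as_timeout, uaddr2, expected);
}
```

For instance, a condition-variable broadcast could call `futex_cmp_requeue(&cond_word, 1, INT_MAX, &mutex_word, seen_value)` (all names illustrative) to wake one waiter and requeue the rest onto the mutex futex.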
6bac3b85 MK |
375 | .\" |
376 | .\" FIXME I added some FUTEX_WAKE_OP text, and I'd be happy if someone | |
377 | .\" checked it. | |
fea681da | 378 | .TP |
d67e21f5 MK |
379 | .BR FUTEX_WAKE_OP " (since Linux 2.6.14)" |
380 | .\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721 | |
6bac3b85 MK |
381 | .\" Author: Jakub Jelinek <jakub@redhat.com> |
382 | .\" Date: Tue Sep 6 15:16:25 2005 -0700 | |
383 | This operation was added to support some user-space use cases | |
384 | where more than one futex must be handled at the same time. | |
385 | The most notable example is the implementation of | |
386 | .BR pthread_cond_signal (3), | |
387 | which requires operations on two futexes, | |
388 | the one used to implement the mutex and the one used in the implementation | |
389 | of the wait queue associated with the condition variable. | |
390 | .BR FUTEX_WAKE_OP | |
391 | allows such cases to be implemented without leading to | |
392 | high rates of contention and context switching. | |
393 | ||
394 | The | |
395 | .BR FUTEX_WAKE_OP | |
396 | operation is equivalent to atomically executing the following code: | |
397 | ||
398 | .in +4n | |
399 | .nf | |
400 | int oldval = *(int *) uaddr2; | |
401 | *(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP; | |
402 | futex(uaddr, FUTEX_WAKE, val, 0, 0, 0); | |
403 | if (oldval \fIcmp\fP \fIcmparg\fP) | |
404 | futex(uaddr2, FUTEX_WAKE, nr_wake2, 0, 0, 0); | |
405 | .fi | |
406 | .in | |
407 | ||
408 | In other words, | |
409 | .BR FUTEX_WAKE_OP | |
410 | does the following: | |
411 | .RS | |
412 | .IP * 3 | |
413 | saves the original value of the futex at | |
414 | .IR uaddr2 ; | |
415 | .IP * | |
416 | performs an operation to modify the value of the futex at | |
417 | .IR uaddr2 ; | |
418 | .IP * | |
419 | wakes up a maximum of | |
420 | .I val | |
421 | waiters on the futex | |
422 | .IR uaddr ; | |
423 | and | |
424 | .IP * | |
425 | dependent on the results of a test of the original value of the futex at | |
426 | .IR uaddr2 , | |
427 | wakes up a maximum of | |
428 | .I nr_wake2 | |
429 | waiters on the futex | |
430 | .IR uaddr2 . | |
431 | .RE | |
432 | .IP | |
433 | The | |
434 | .I nr_wake2 | |
435 | value is actually the | |
436 | .BR futex () | |
437 | .I timeout | |
438 | argument (ab)used to specify how many of the waiters on the futex at | |
439 | .IR uaddr2 | |
440 | are to be woken up; | |
441 | the kernel casts the | |
442 | .I timeout | |
443 | value to | |
444 | .IR u32 . | |
445 | ||
446 | The operation and comparison that are to be performed are encoded | |
447 | in the bits of the argument | |
448 | .IR val3 . | |
449 | Pictorially, the encoding is: | |
450 | ||
f6af90e7 | 451 | .in +8n |
6bac3b85 | 452 | .nf |
f6af90e7 MK |
453 | +---+---+-----------+-----------+ |
454 | |op |cmp| oparg | cmparg | | |
455 | +---+---+-----------+-----------+ | |
456 | 4 4 12 12 <== # of bits | |
6bac3b85 MK |
457 | .fi |
458 | .in | |
459 | ||
460 | Expressed in code, the encoding is: | |
461 | ||
462 | .in +4n | |
463 | .nf | |
464 | #define FUTEX_OP(op, oparg, cmp, cmparg) \\ | |
465 | (((op & 0xf) << 28) | \\ | |
466 | ((cmp & 0xf) << 24) | \\ | |
467 | ((oparg & 0xfff) << 12) | \\ | |
468 | (cmparg & 0xfff)) | |
469 | .fi | |
470 | .in | |
471 | ||
472 | In the above, | |
473 | .I op | |
474 | and | |
475 | .I cmp | |
476 | are each one of the codes listed below. | |
477 | The | |
478 | .I oparg | |
479 | and | |
480 | .I cmparg | |
481 | components are literal numeric values, except as noted below. | |
482 | ||
483 | The | |
484 | .I op | |
485 | component has one of the following values: | |
486 | ||
487 | .in +4n | |
488 | .nf | |
489 | FUTEX_OP_SET 0 /* uaddr2 = oparg; */ | |
490 | FUTEX_OP_ADD 1 /* uaddr2 += oparg; */ | |
491 | FUTEX_OP_OR 2 /* uaddr2 |= oparg; */ | |
492 | FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */ | |
493 | FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */ | |
494 | .fi | |
495 | .in | |
496 | ||
497 | In addition, bit-wise ORing the following value into | |
498 | .I op | |
499 | causes | |
500 | .IR "(1\ <<\ oparg)" | |
501 | to be used as the operand: | |
502 | ||
503 | .in +4n | |
504 | .nf | |
505 | FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */ | |
506 | .fi | |
507 | .in | |
508 | ||
509 | The | |
510 | .I cmp | |
511 | field is one of the following: | |
512 | ||
513 | .in +4n | |
514 | .nf | |
515 | FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */ | |
516 | FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */ | |
517 | FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */ | |
518 | FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */ | |
519 | FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */ | |
520 | FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */ | |
521 | .fi | |
522 | .in | |
523 | ||
524 | The return value of | |
525 | .BR FUTEX_WAKE_OP | |
526 | is the sum of the number of waiters woken on the futex | |
527 | .IR uaddr | |
528 | plus the number of waiters woken on the futex | |
529 | .IR uaddr2 . | |
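As an illustration of the val3 encoding and of the way nr_wake2 is passed in the timeout slot, here is a minimal sketch. The helpers `wake_op_encode` and `futex_wake_op` are hypothetical names; the sketch mirrors the FUTEX_OP() layout shown above but uses unsigned arithmetic, and it takes the `FUTEX_OP_*` constants from `<linux/futex.h>`.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: pack op (4 bits), cmp (4 bits), oparg (12 bits)
   and cmparg (12 bits) into the val3 argument of FUTEX_WAKE_OP. */
static uint32_t wake_op_encode(uint32_t op, uint32_t oparg,
                               uint32_t cmp, uint32_t cmparg)
{
    return ((op & 0xf) << 28) | ((cmp & 0xf) << 24) |
           ((oparg & 0xfff) << 12) | (cmparg & 0xfff);
}

/* Hypothetical helper: modify *uaddr2 as described by encoded_op,
   wake up to nr_wake waiters on uaddr and, if the comparison on the
   old value of *uaddr2 succeeds, up to nr_wake2 waiters on uaddr2. */
static long futex_wake_op(uint32_t *uaddr, uint32_t nr_wake,
                          uint32_t *uaddr2, uint32_t nr_wake2,
                          uint32_t encoded_op)
{
    /* nr_wake2 is carried in the (ab)used timeout argument. */
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, nr_wake,
                   (const struct timespec *) (uintptr_t) nr_wake2,
                   uaddr2, encoded_op);
}
```

With these helpers, `wake_op_encode(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0)` describes "add 1 to the futex at uaddr2, and wake waiters there if its old value was greater than 0".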
d67e21f5 | 530 | .TP |
79c9b436 TG |
531 | .BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)" |
532 | .\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d | |
fd9e59d4 | 533 | This operation is like |
79c9b436 TG |
534 | .BR FUTEX_WAIT |
535 | except that | |
536 | .I val3 | |
537 | is used to provide a 32-bit bitset to the kernel. | |
538 | This bitset is stored in the kernel-internal state of the waiter. | |
539 | See the description of | |
540 | .BR FUTEX_WAKE_BITSET | |
541 | for further details. | |
542 | ||
fd9e59d4 MK |
543 | The |
544 | .BR FUTEX_WAIT_BITSET | |
545 | also interprets the | |
546 | .I timeout | |
547 | argument differently from | |
548 | .BR FUTEX_WAIT . | |
549 | See the discussion of | |
550 | .BR FUTEX_CLOCK_REALTIME , | |
551 | above. | |
552 | ||
79c9b436 TG |
553 | The |
554 | .I uaddr2 | |
555 | argument is ignored. | |
556 | .TP | |
d67e21f5 MK |
557 | .BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)" |
558 | .\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d | |
55cc422d TG |
559 | This operation is the same as |
560 | .BR FUTEX_WAKE | |
561 | except that the | |
562 | .I val3 | |
563 | argument is used to provide a 32-bit bitset to the kernel. | |
98d769c0 MK |
564 | This bitset is used to select which waiters should be woken up. |
565 | The selection is done by a bit-wise AND of the "wake" bitset | |
566 | (i.e., the value in | |
567 | .IR val3 ) | |
568 | and the bitset which is stored in the kernel-internal | |
09cb4ce7 | 569 | state of the waiter (the "wait" bitset that is set using |
98d769c0 MK |
570 | .BR FUTEX_WAIT_BITSET ). |
571 | All of the waiters for which the result of the AND is nonzero are woken up; | |
572 | the remaining waiters are left sleeping. | |
573 | ||
e9d4496b MK |
574 | .\" FIXME please review this paragraph that I added |
575 | The effect of | |
576 | .BR FUTEX_WAIT_BITSET | |
577 | and | |
578 | .BR FUTEX_WAKE_BITSET | |
579 | is to allow selective wake-ups among multiple waiters that are waiting | |
580 | on the same futex; | |
581 | since a futex has a size of 32 bits, | |
582 | these operations provide 32 wakeup "channels". | |
583 | (The | |
584 | .BR FUTEX_WAIT | |
585 | and | |
586 | .BR FUTEX_WAKE | |
587 | operations correspond to | |
588 | .BR FUTEX_WAIT_BITSET | |
589 | and | |
590 | .BR FUTEX_WAKE_BITSET | |
591 | operations where the bitsets are all ones.) | |
09cb4ce7 | 592 | Note, however, that using this bitset multiplexing feature on a |
e9d4496b MK |
593 | futex is less efficient than simply using multiple futexes, |
594 | because employing bitset multiplexing requires the kernel | |
595 | to check all waiters on a futex, | |
596 | including those that are not interested in being woken up | |
597 | (i.e., they do not have the relevant bit set in their "wait" bitset). | |
598 | .\" According to http://locklessinc.com/articles/futex_cheat_sheet/: | |
599 | .\" | |
600 | .\" "The original reason for the addition of these extensions | |
601 | .\" was to improve the performance of pthread read-write locks | |
602 | .\" in glibc. However, the pthreads library no longer uses the | |
603 | .\" same locking algorithm, and these extensions are not used | |
604 | .\" without the bitset parameter being all ones. | |
605 | .\" | |
606 | .\" The page goes on to note that the FUTEX_WAIT_BITSET operation | |
607 | .\" is nevertheless used (with a bitset of all ones) in order to | |
608 | .\" obtain the absolute timeout functionality that is useful | |
609 | .\" for efficiently implementing Pthreads APIs (which use absolute | |
610 | .\" timeouts); FUTEX_WAIT provides only relative timeouts. | |
611 | ||
98d769c0 MK |
612 | The |
613 | .I uaddr2 | |
614 | and | |
615 | .I timeout | |
616 | arguments are ignored. | |
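The bitset "channels" described above could be used along the following lines. This is a sketch only: the helper names and the two channel constants are hypothetical, and the timeout is always passed as NULL here to sidestep the question of how FUTEX_WAIT_BITSET interprets it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical channel assignment: one bit per class of waiter. */
#define CHANNEL_READERS (1u << 0)
#define CHANNEL_WRITERS (1u << 1)

/* Sleep on uaddr (if it still contains val) and record wait_bitset
   in the kernel-internal state of this waiter. */
static long futex_wait_bitset(uint32_t *uaddr, uint32_t val,
                              uint32_t wait_bitset)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, val,
                   NULL, NULL, wait_bitset);
}

/* Wake up to nr_waiters waiters whose wait bitset has at least one
   bit in common with wake_bitset; returns the number woken. */
static long futex_wake_bitset(uint32_t *uaddr, int nr_waiters,
                              uint32_t wake_bitset)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET,
                   (uint32_t) nr_waiters, NULL, NULL, wake_bitset);
}
```

Passing `FUTEX_BITSET_MATCH_ANY` (all ones, defined in `<linux/futex.h>`) as the bitset gives the behavior of plain FUTEX_WAIT and FUTEX_WAKE.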
bd90a5f9 MK |
617 | .\" |
618 | .\" | |
619 | .SS Priority-inheritance futexes | |
b52e1cd4 MK |
620 | Linux supports priority-inheritance (PI) futexes in order to handle |
621 | priority-inversion problems that can be encountered with | |
622 | normal futex locks. | |
79d918c7 MK |
623 | .\" |
624 | .\" FIXME ===== Start of adapted Hart/Guniguntala text ===== | |
625 | .\" The following text is drawn from the Hart/Guniguntala paper, | |
626 | .\" but I have reworded some pieces significantly. Please check it. | |
627 | .\" | |
628 | The PI futex operations described below differ from the other | |
629 | futex operations in that they impose policy on the use of the futex value: | |
630 | .IP * 3 | |
7c16fbff | 631 | If the lock is unowned, the futex value shall be 0. |
79d918c7 MK |
632 | .IP * |
633 | If the lock is owned, the futex value shall be the thread ID (TID; see | |
634 | .BR gettid (2)) | |
635 | of the owning thread. | |
636 | .IP * | |
637 | .\" FIXME In the following line, I added "the lock is owned and". Okay? | |
638 | If the lock is owned and there are threads contending for the lock, | |
639 | then the | |
640 | .B FUTEX_WAITERS | |
641 | bit shall be set in the futex value; in other words, the futex value is: | |
642 | ||
643 | FUTEX_WAITERS | TID | |
644 | .PP | |
645 | With this policy in place, | |
646 | a user-space application can acquire an unowned | |
b52e1cd4 | 647 | lock or release an uncontended lock using atomic |
79d918c7 | 648 | .\" FIXME In the following line, I added "user-space". Okay? |
b52e1cd4 MK |
649 | user-space instructions (e.g., |
650 | .I cmpxchg | |
651 | on the x86 architecture). | |
652 | Locking an unowned lock simply consists of setting | |
653 | the futex value to the caller's TID. | |
654 | Releasing an uncontended lock simply requires setting the futex value to 0. | |
655 | ||
656 | If a futex is currently owned (i.e., has a nonzero value), | |
657 | waiters must employ the | |
79d918c7 MK |
658 | .B FUTEX_LOCK_PI |
659 | operation to acquire the lock. | |
b52e1cd4 | 660 | If a lock is contended (i.e., the |
79d918c7 | 661 | .B FUTEX_WAITERS |
b52e1cd4 | 662 | bit is set in the futex value), the lock owner must employ the |
79d918c7 | 663 | .B FUTEX_UNLOCK_PI |
b52e1cd4 MK |
664 | operation to release the lock. |
665 | ||
79d918c7 MK |
666 | In the cases where callers are forced into the kernel |
667 | (i.e., required to perform a | |
668 | .BR futex () | |
669 | operation), | |
670 | they then deal directly with a so-called RT-mutex, | |
671 | a kernel locking mechanism which implements the required | |
672 | priority-inheritance semantics. | |
673 | After the RT-mutex is acquired, the futex value is updated accordingly, | |
674 | before the calling thread returns to user space. | |
675 | .\" FIXME ===== End of adapted Hart/Guniguntala text ===== | |
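The value policy above suggests a user-space fast path with a kernel fallback, which might be sketched as follows. The function names `pi_lock` and `pi_unlock` are hypothetical, C11 atomics stand in for architecture-specific instructions such as cmpxchg, and the sketch deliberately ignores error handling (for example, interrupted calls) and the owner-died case.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire a PI futex lock.
   Fast path: atomically change the futex value from 0 (unowned) to
   the caller's TID. Slow path: let the kernel arbitrate. */
static int pi_lock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    if (atomic_compare_exchange_strong(futex_word, &expected, tid))
        return 0;               /* lock was unowned; no system call */

    return (int) syscall(SYS_futex, futex_word, FUTEX_LOCK_PI,
                         0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex lock.
   Fast path: atomically change the value from the caller's TID to 0,
   which succeeds only if FUTEX_WAITERS is not set. Slow path: ask
   the kernel to hand the lock to a waiter. */
static int pi_unlock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    if (atomic_compare_exchange_strong(futex_word, &expected, 0))
        return 0;               /* uncontended; no system call */

    return (int) syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI,
                         0, NULL, NULL, 0);
}
```

In the uncontended case neither function enters the kernel; the compare-and-exchange fails, and the system call is made, only when the futex value is not the one the policy predicts for an uncontended lock.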
676 | ||
677 | It is important | |
678 | .\" FIXME We need some explanation here of why it is important to note this | |
679 | to note that the kernel will update the futex value prior | |
680 | to returning to user space. | |
681 | Unlike the other futex operations described above, | |
682 | the PI futex operations are designed | |
7c16fbff | 683 | for the implementation of very specific IPC mechanisms. |
fc57e6bb MK |
684 | .\" |
685 | .\" FIXME We don't quite have a definition anywhere of what a PI futex | |
686 | .\" is (vs a non-PI futex). Below, we have the information of | |
687 | .\" FUTEX_CMP_REQUEUE_PI requeues from a non-PI futex to a | |
688 | .\" PI futex, but what determines whether the futex is of one | |
689 | .\" kind or the other? We should have such a definition somewhere | |
690 | .\" about here. | |
bd90a5f9 MK |
691 | |
692 | PI futexes are operated on by specifying one of the following values in | |
693 | .IR futex_op : | |
d67e21f5 MK |
694 | .TP |
695 | .BR FUTEX_LOCK_PI " (since Linux 2.6.18)" | |
696 | .\" commit c87e2837be82df479a6bae9f155c43516d2feebc | |
67833bec MK |
697 | .\" |
698 | .\" FIXME I did some significant rewording of tglx's text. | |
699 | .\" Please check, in case I injected errors. | |
700 | .\" | |
701 | This operation is used after an attempt to acquire | |
702 | the futex lock via an atomic user-space instruction failed | |
703 | because the futex has a nonzero value\(emspecifically, | |
704 | because it contained the namespace-specific TID of the lock owner. | |
67259526 | 705 | .\" FIXME In the preceding line, what does "namespace-specific" mean? |
67833bec | 706 | .\" (I kept those words from tglx.) |
67259526 | 707 | .\" That is, what kind of namespace are we talking about? |
67833bec MK |
708 | .\" (I suppose we are talking PID namespaces here, but I want to |
709 | .\" be sure.) | |
710 | ||
711 | The operation checks the value of the futex at the address | |
712 | .IR uaddr . | |
713 | If the value is 0, then the kernel tries to atomically set the futex value to | |
714 | the caller's TID. | |
715 | If that fails, | |
716 | .\" FIXME What would be the cause of failure? | |
717 | or the futex value is nonzero, | |
718 | the kernel atomically sets the | |
e0547e70 | 719 | .B FUTEX_WAITERS |
67833bec MK |
720 | bit, which signals the futex owner that it cannot unlock the futex in |
721 | user space atomically by setting the futex value to 0. | |
722 | After that, the kernel tries to find the thread which is | |
723 | associated with the owner TID, | |
724 | .\" FIXME Could I get a bit more detail on the next two lines? | |
725 | .\" What is "creates or reuses kernel state" about? | |
726 | creates or reuses kernel state on behalf of the owner | |
727 | and attaches the waiter to it. | |
67259526 MK |
728 | .\" FIXME In the next line, what type of "priority" are we talking about? |
729 | .\" Realtime priorities for SCHED_FIFO and SCHED_RR? | |
730 | .\" Or something else? | |
e0547e70 TG |
731 | The enqueueing of the waiter is in descending priority order if more |
732 | than one waiter exists. | |
67259526 | 733 | .\" FIXME What does "bandwidth" refer to in the next line? |
e0547e70 | 734 | The owner inherits either the priority or the bandwidth of the waiter. |
67259526 MK |
735 | .\" FIXME In the preceding line, what determines whether the |
736 | .\" owner inherits the priority versus the bandwidth? | |
67833bec MK |
737 | .\" |
738 | .\" FIXME Could I get some help translating the next sentence into | |
739 | .\" something that user-space developers (and I) can understand? | |
740 | .\" In particular, what are "nested locks" in this context? | |
e0547e70 TG |
741 | This inheritance follows the lock chain in the case of |
742 | nested locking and performs deadlock detection. | |
743 | ||
9ce19cf1 MK |
744 | .\" FIXME tglx says "The timeout argument is handled as described in |
745 | .\" FUTEX_WAIT." However, it appears to me that this is not right. | |
746 | .\" Is the following formulation correct? | |
e0547e70 TG |
747 | The |
748 | .I timeout | |
9ce19cf1 MK |
749 | argument provides a timeout for the lock attempt. |
750 | It is interpreted as an absolute time, measured against the | |
751 | .BR CLOCK_REALTIME | |
752 | clock. | |
753 | If | |
754 | .I timeout | |
755 | is NULL, the operation will block indefinitely. | |
e0547e70 | 756 | |
a449c634 | 757 | The |
e0547e70 TG |
758 | .IR uaddr2 , |
759 | .IR val , | |
760 | and | |
761 | .IR val3 | |
a449c634 | 762 | arguments are ignored. |
fedaeaf3 | 763 | .\" FIXME |
a9dcb4d1 | 764 | .\" tglx noted the following "ERROR" case for FUTEX_LOCK_PI and |
670b34f8 MK |
765 | .\" FUTEX_TRYLOCK_PI and FUTEX_WAIT_REQUEUE_PI: |
766 | .\" | |
a9dcb4d1 MK |
767 | .\" > [EOWNERDIED] The owner of the futex died and the kernel made the |
768 | .\" > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit | |
769 | .\" > in the futex userspace value. Caller is responsible for cleanup | |
fedaeaf3 | 770 | .\" |
a9dcb4d1 | 771 | .\" However, there is no such thing as an EOWNERDIED error. I had a look |
fedaeaf3 MK |
772 | .\" through the kernel source for the FUTEX_OWNER_DIED cases and didn't |
773 | .\" see an obvious error associated with them. Can you clarify? (I think | |
774 | .\" the point is that this condition, which is described in | |
775 | .\" Documentation/robust-futexes.txt, is not an error as such. However, | |
776 | .\" I'm not yet sure of how to describe it in the man page.) | |
670b34f8 | 777 | .\" Suggestions please! |
67833bec | 778 | .\" |
d67e21f5 | 779 | .TP |
12fdbe23 | 780 | .BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)" |
d67e21f5 | 781 | .\" commit c87e2837be82df479a6bae9f155c43516d2feebc |
12fdbe23 MK |
782 | This operation tries to acquire the futex at |
783 | .IR uaddr . | |
0b761826 MK |
784 | .\" FIXME I think it would be helpful here to say a few more words about |
785 | .\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI | |
fa0388c3 | 786 | It deals with the situation where the TID value at |
12fdbe23 MK |
787 | .I uaddr |
788 | is 0, but the | |
b52e1cd4 | 789 | .B FUTEX_WAITERS |
12fdbe23 | 790 | bit is set. |
fa0388c3 MK |
791 | .\" FIXME How does the situation in the previous sentence come about? |
792 | .\" Probably it would be helpful to say something about that in | |
793 | .\" the man page. | |
badbf70c | 794 | .\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation? |
12fdbe23 | 795 | User space cannot handle this condition in a race-free manner. |
084744ef MK |
796 | |
797 | The | |
798 | .IR uaddr2 , | |
799 | .IR val , | |
800 | .IR timeout , | |
801 | and | |
802 | .IR val3 | |
803 | arguments are ignored. | |
d67e21f5 | 804 | .TP |
12fdbe23 | 805 | .BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)" |
d67e21f5 | 806 | .\" commit c87e2837be82df479a6bae9f155c43516d2feebc |
ecae2099 TG |
807 | This operation wakes the top-priority waiter that is waiting in |
808 | .B FUTEX_LOCK_PI | |
809 | on the futex address provided by the | |
810 | .I uaddr | |
811 | argument. | |
812 | ||
813 | This operation is called when the user-space value at | |
814 | .I uaddr | |
815 | cannot be changed atomically from a TID (of the owner) to 0. | |
816 | ||
817 | The | |
818 | .IR uaddr2 , | |
819 | .IR val , | |
820 | .IR timeout , | |
821 | and | |
822 | .IR val3 | |
11a194bf | 823 | arguments are ignored. |
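.IP
The division of labor between user space and the kernel can be sketched
as follows.
This is a minimal illustration, not a complete mutex:
the function names
.IR pi_lock ()
and
.IR pi_unlock ()
are hypothetical,
the GCC/Clang
.I __atomic
built-ins are an assumption about the compiler,
and error handling is left to the caller.
The kernel is entered only when the atomic transition between 0 and
the caller's TID fails.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire the PI futex at uaddr for thread tid. */
static int
pi_lock(uint32_t *uaddr, uint32_t tid)
{
    uint32_t expected = 0;

    /* Fast path: the futex is free, so change 0 -> TID in user space. */
    if (__atomic_compare_exchange_n(uaddr, &expected, tid, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: the futex is contended, so let the kernel block us. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0,
                         NULL, NULL, 0);
}

/* Hypothetical helper: release the PI futex at uaddr owned by tid. */
static int
pi_unlock(uint32_t *uaddr, uint32_t tid)
{
    uint32_t expected = tid;

    /* Fast path: the word holds only our TID, so change TID -> 0. */
    if (__atomic_compare_exchange_n(uaddr, &expected, 0, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* The word is no longer the bare TID (for example, FUTEX_WAITERS
       is set), so the kernel must hand the lock to a waiter. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_UNLOCK_PI, 0,
                         NULL, NULL, 0);
}
```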
d67e21f5 | 824 | .TP |
d67e21f5 MK |
825 | .BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)" |
826 | .\" commit 52400ba946759af28442dee6265c5c0180ac7122 | |
827 | .\" FIXME to complete | |
f812a08b DH |
828 | This operation is a PI-aware variant of |
829 | .BR FUTEX_CMP_REQUEUE . | |
830 | It requeues waiters that are blocked via | |
831 | .B FUTEX_WAIT_REQUEUE_PI | |
832 | on | |
833 | .I uaddr | |
834 | from a non-PI source futex | |
835 | .RI ( uaddr ) | |
836 | to a PI target futex | |
837 | .RI ( uaddr2 ). | |
838 | ||
9e54d26d MK |
839 | As with |
840 | .BR FUTEX_CMP_REQUEUE , | |
841 | this operation wakes up a maximum of | |
842 | .I val | |
843 | waiters that are waiting on the futex at | |
844 | .IR uaddr . | |
845 | However, for | |
846 | .BR FUTEX_CMP_REQUEUE_PI , | |
847 | .I val | |
848 | is required to be 1. | |
849 | The remaining waiters are removed from the wait queue of the source futex at | |
850 | .I uaddr | |
851 | and added to the wait queue of the target futex at | |
852 | .IR uaddr2 . | |
f812a08b | 853 | |
9e54d26d MK |
854 | The |
855 | .I val3 | |
856 | and | |
857 | .I timeout | |
858 | arguments serve the same purposes as for | |
859 | .BR FUTEX_CMP_REQUEUE . | |
be376673 MK |
860 | .\" FIXME The page at http://locklessinc.com/articles/futex_cheat_sheet/ |
861 | .\" notes that "priority-inheritance Futex to priority-inheritance | |
862 | .\" Futex requeues are currently unsupported". Do we need to say | |
863 | .\" something in the man page about that? | |
d67e21f5 MK |
864 | .TP |
865 | .BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)" | |
866 | .\" commit 52400ba946759af28442dee6265c5c0180ac7122 | |
867 | .\" FIXME to complete | |
dd218aaa MK |
868 | .\" |
869 | .\" FIXME Employs 'timeout' argument, supports FUTEX_CLOCK_REALTIME | |
870 | .\" 'timeout' can be NULL | |
871 | .\" | |
d67e21f5 | 872 | [As yet undocumented] |
47297adb | 873 | .SH RETURN VALUE |
fea681da | 874 | .PP |
6f147f79 | 875 | In the event of an error, all operations return \-1 and set |
e808bba0 | 876 | .I errno |
6f147f79 | 877 | to indicate the cause of the error. |
e808bba0 MK |
878 | The return value on success depends on the operation, |
879 | as described in the following list: | |
fea681da MK |
880 | .TP |
881 | .B FUTEX_WAIT | |
682edefb MK |
882 | Returns 0 if the process was woken by a |
883 | .B FUTEX_WAKE | |
7446a837 MK |
884 | or |
885 | .B FUTEX_WAKE_BITSET | |
682edefb | 886 | call. |
fea681da MK |
887 | .TP |
888 | .B FUTEX_WAKE | |
889 | Returns the number of processes woken up. | |
890 | .TP | |
891 | .B FUTEX_FD | |
892 | Returns the new file descriptor associated with the futex. | |
893 | .TP | |
894 | .B FUTEX_REQUEUE | |
895 | Returns the number of processes woken up. | |
896 | .TP | |
897 | .B FUTEX_CMP_REQUEUE | |
3dfcc11d MK |
898 | Returns the total number of processes woken up or requeued to the futex at |
899 | .IR uaddr2 . | |
900 | If this value is greater than | |
901 | .IR val , | |
902 | then the difference is the number of waiters requeued to the futex at | |
903 | .IR uaddr2 . | |
519f2c3d MK |
904 | .\" |
905 | .\" FIXME Add success returns for other operations | |
dcad19c0 MK |
906 | .TP |
907 | .B FUTEX_WAKE_OP | |
a8b5b324 MK |
908 | .\" FIXME Is the following correct? |
909 | Returns the total number of waiters that were woken up. | |
910 | This is the sum of the woken waiters on the two futexes at | |
911 | .I uaddr | |
912 | and | |
913 | .IR uaddr2 . | |
dcad19c0 MK |
914 | .TP |
915 | .B FUTEX_WAIT_BITSET | |
7bcc5351 MK |
916 | .\" FIXME Is the following correct? |
917 | Returns 0 if the process was woken by a | |
918 | .B FUTEX_WAKE | |
919 | or | |
920 | .B FUTEX_WAKE_BITSET | |
921 | call. | |
dcad19c0 MK |
922 | .TP |
923 | .B FUTEX_WAKE_BITSET | |
b884566b MK |
924 | .\" FIXME Is the following correct? |
925 | Returns the number of processes woken up. | |
dcad19c0 MK |
926 | .TP |
927 | .B FUTEX_LOCK_PI | |
bf02a260 MK |
928 | .\" FIXME Is the following correct? |
929 | Returns 0 if the futex was successfully locked. | |
dcad19c0 MK |
930 | .TP |
931 | .B FUTEX_TRYLOCK_PI | |
5c716eef MK |
932 | .\" FIXME Is the following correct? |
933 | Returns 0 if the futex was successfully locked. | |
dcad19c0 MK |
934 | .TP |
935 | .B FUTEX_UNLOCK_PI | |
52bb928f MK |
936 | .\" FIXME Is the following correct? |
937 | Returns 0 if the futex was successfully unlocked. | |
dcad19c0 MK |
938 | .TP |
939 | .B FUTEX_CMP_REQUEUE_PI | |
dddd395a MK |
940 | .\" FIXME Is the following correct? |
941 | Returns the total number of processes woken up or requeued to the futex at | |
942 | .IR uaddr2 . | |
943 | If this value is greater than | |
944 | .IR val , | |
945 | then the difference is the number of waiters requeued to the futex at | |
946 | .IR uaddr2 . | |
dcad19c0 MK |
947 | .TP |
948 | .B FUTEX_WAIT_REQUEUE_PI | |
22c15de9 MK |
949 | .\" FIXME Is the following correct? |
950 | Returns 0 if the caller was successfully requeued to the futex at | |
951 | .IR uaddr2 . | |
fea681da MK |
952 | .SH ERRORS |
953 | .TP | |
954 | .B EACCES | |
955 | No read access to futex memory. | |
956 | .TP | |
957 | .B EAGAIN | |
badbf70c MK |
958 | .RB ( FUTEX_WAIT ) |
959 | The value pointed to by | |
960 | .I uaddr | |
961 | was not equal to the expected value | |
962 | .I val | |
963 | at the time of the call. | |
964 | .TP | |
965 | .B EAGAIN | |
8f2068bb MK |
966 | .RB ( FUTEX_CMP_REQUEUE , |
967 | .BR FUTEX_CMP_REQUEUE_PI ) | |
ce5602fd | 968 | The value pointed to by |
9f6c40c0 MK |
969 | .I uaddr |
970 | is not equal to the expected value | |
971 | .IR val3 . | |
fd1dc4c2 | 972 | .\" FIXME: Is the following sentence correct? |
fea681da | 973 | (This probably indicates a race; |
682edefb MK |
974 | use the safe |
975 | .B FUTEX_WAKE | |
976 | now.) | |
c0091dd3 MK |
977 | .\" |
978 | .\" FIXME Should there be an EAGAIN case for FUTEX_TRYLOCK_PI? | |
979 | .\" It seems so, looking at the handling of the rt_mutex_trylock() | |
980 | .\" call in futex_lock_pi() | |
981 | .\" | |
fea681da | 982 | .TP |
5662f56a MK |
983 | .BR EAGAIN |
984 | .RB ( FUTEX_LOCK_PI , | |
aaec9032 MK |
985 | .BR FUTEX_TRYLOCK_PI , |
986 | .BR FUTEX_CMP_REQUEUE_PI ) | |
987 | The futex owner thread ID of | |
988 | .I uaddr | |
989 | (for | |
990 | .BR FUTEX_CMP_REQUEUE_PI : | |
991 | .IR uaddr2 ) | |
992 | is about to exit, | |
5662f56a MK |
993 | but has not yet handled the internal state cleanup. |
994 | Try again. | |
61f8c1d1 MK |
995 | .\" |
996 | .\" FIXME Is there not also an EAGAIN error case on 'uaddr2' for | |
997 | .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via | |
998 | .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==> | |
999 | .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EAGAIN? | |
5662f56a | 1000 | .TP |
7a39e745 MK |
1001 | .BR EDEADLK |
1002 | .RB ( FUTEX_LOCK_PI , | |
1003 | .BR FUTEX_TRYLOCK_PI ) | |
1004 | The futex at | |
1005 | .I uaddr | |
1006 | is already locked by the caller. | |
d08ce5dd MK |
1007 | .\" |
1008 | .\" FIXME Is there not also an EDEADLK error case on 'uaddr2' for | |
1009 | .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via | |
1010 | .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==> | |
1011 | .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EDEADLK? | |
7a39e745 | 1012 | .TP |
662c0da8 MK |
1013 | .BR EDEADLK |
1014 | .\" FIXME I reworded tglx's text somewhat; is the following okay? | |
1015 | .RB ( FUTEX_CMP_REQUEUE_PI ) | |
1016 | While requeueing a waiter to the PI futex at | |
1017 | .IR uaddr2 , | |
1018 | the kernel detected a deadlock. | |
1019 | .TP | |
fea681da | 1020 | .B EFAULT |
1ea901e8 MK |
1021 | A required pointer argument (i.e., |
1022 | .IR uaddr , | |
1023 | .IR uaddr2 , | |
1024 | or | |
1025 | .IR timeout ) | |
496df304 | 1026 | did not point to a valid user-space address. |
fea681da | 1027 | .TP |
9f6c40c0 | 1028 | .B EINTR |
e808bba0 | 1029 | A |
9f6c40c0 | 1030 | .B FUTEX_WAIT |
2674f781 MK |
1031 | or |
1032 | .B FUTEX_WAIT_BITSET | |
e808bba0 MK |
1033 | operation was interrupted by a signal (see |
1034 | .BR signal (7)) | |
1035 | or a spurious wakeup. | |
5eeca856 MK |
1036 | .\" FIXME |
1037 | .\" Regarding the words "spurious wakeup" above, I received this | |
1038 | .\" bug report from Rich Felker: | |
1039 | .\" | |
1040 | .\" I see no code in the kernel whereby a "spurious wakeup", or anything | |
1041 | .\" other than interruption by a signal handler that's not SA_RESTART, | |
1042 | .\" can cause futex to fail with EINTR. In general, overloading of EINTR | |
1043 | .\" and/or spurious EINTRs from a syscall make it impossible to use that | |
1044 | .\" syscall for implementing any function where EINTR is a mandatory | |
1045 | .\" failure on interruption-by-signal, since there is no way for | |
1046 | .\" userspace to distinguish whether the EINTR occurred as a result of | |
1047 | .\" an interrupting signal or some other reason. The kernel folks have | |
1048 | .\" gone to great lengths to fix spurious EINTRs (see signal(7) for | |
1049 | .\" history), especially by non-interrupting signal handlers, including | |
1050 | .\" in futex, and allowing EINTR here would be contrary to that goal. | |
1051 | .\" | |
1052 | .\" It's my belief that the "or a spurious wakeup" text should simply be | |
1053 | .\" removed. | |
1054 | .\" | |
1055 | .\" The reason I'm raising this topic is its relevance to a thread on | |
1056 | .\" libc-alpha: | |
1057 | .\" [RFC] mutex destruction (#13690): problem description and workarounds | |
1058 | .\" | |
1059 | .\" The bug and mailing list discussions to which Rich refers are: | |
1060 | .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13690 | |
1061 | .\" https://sourceware.org/ml/libc-alpha/2014-12/threads.html#0001 | |
1062 | .\" | |
1063 | .\" Can anyone comment on whether the words "spurious wakeup" are correct? | |
1064 | .\" | |
9f6c40c0 | 1065 | .TP |
fea681da | 1066 | .B EINVAL |
180f97b7 MK |
1067 | The operation in |
1068 | .IR futex_op | |
1069 | is one of those that employs a timeout, but the supplied | |
fb2f4c27 MK |
1070 | .I timeout |
1071 | argument was invalid | |
1072 | .RI ( tv_sec | |
1073 | was less than zero, or | |
1074 | .IR tv_nsec | |
1075 | was not less than 1,000,000,000). | |
1076 | .TP | |
1077 | .B EINVAL | |
0c74df0b MK |
1078 | The operation specified in |
1079 | .IR futex_op | |
1080 | employs one or both of the pointers | |
51ee94be | 1081 | .I uaddr |
a1f47699 | 1082 | and |
0c74df0b MK |
1083 | .IR uaddr2 , |
1084 | but one of these does not point to a valid object\(emthat is, | |
1085 | the address is not four-byte-aligned. | |
51ee94be MK |
1086 | .TP |
1087 | .B EINVAL | |
bae14b6c | 1088 | .RB ( FUTEX_WAKE , |
5447735d | 1089 | .BR FUTEX_WAKE_OP , |
98d769c0 | 1090 | .BR FUTEX_WAKE_BITSET , |
e169277f MK |
1091 | .BR FUTEX_REQUEUE , |
1092 | .BR FUTEX_CMP_REQUEUE ) | |
496df304 | 1093 | The kernel detected an inconsistency between the user-space state at |
9534086b TG |
1094 | .I uaddr |
1095 | and the kernel state\(emthat is, it detected a waiter which waits in | |
5447735d MK |
1096 | .BR FUTEX_LOCK_PI |
1097 | on | |
1098 | .IR uaddr . | |
9534086b TG |
1099 | .TP |
1100 | .B EINVAL | |
55cc422d TG |
1101 | .RB ( FUTEX_WAIT_BITSET , |
1102 | .BR FUTEX_WAKE_BITSET ) | |
79c9b436 TG |
1103 | The bitset supplied in |
1104 | .IR val3 | |
1105 | is zero. | |
1106 | .TP | |
1107 | .B EINVAL | |
2043f2c1 MK |
1108 | .RB ( FUTEX_REQUEUE , |
1109 | .\" FIXME tglx suggested adding this, but does this error really occur for | |
1110 | .\" FUTEX_REQUEUE? (The case where it occurs for FUTEX_CMP_REQUEUE_PI | |
1111 | .\" is obvious at the start of futex_requeue().) | |
1112 | .BR FUTEX_CMP_REQUEUE_PI ) | |
add875c0 MK |
1113 | .I uaddr |
1114 | equals | |
1115 | .IR uaddr2 | |
1116 | (i.e., an attempt was made to requeue to the same futex). | |
1117 | .TP | |
ff597681 MK |
1118 | .BR EINVAL |
1119 | .RB ( FUTEX_FD ) | |
1120 | The signal number supplied in | |
1121 | .I val | |
1122 | is invalid. | |
1123 | .TP | |
6bac3b85 | 1124 | .B EINVAL |
a218ef20 | 1125 | .RB ( FUTEX_LOCK_PI , |
ce022f18 MK |
1126 | .BR FUTEX_TRYLOCK_PI , |
1127 | .BR FUTEX_UNLOCK_PI ) | |
a218ef20 MK |
1128 | The kernel detected an inconsistency between the user-space state at |
1129 | .I uaddr | |
1130 | and the kernel state. | |
ce022f18 MK |
1131 | This indicates either state corruption |
1132 | .\" FIXME tglx did not mention the "state corruption" for FUTEX_UNLOCK_PI. | |
1133 | .\" Does that case also apply for FUTEX_UNLOCK_PI? | |
1134 | or that the kernel found a waiter on | |
a218ef20 MK |
1135 | .I uaddr |
1136 | which is waiting via | |
1137 | .BR FUTEX_WAIT | |
1138 | or | |
1139 | .BR FUTEX_WAIT_BITSET . | |
1140 | .TP | |
1141 | .B EINVAL | |
4832b48a | 1142 | Invalid argument. |
fea681da | 1143 | .TP |
a449c634 MK |
1144 | .BR ENOMEM |
1145 | .RB ( FUTEX_LOCK_PI , | |
e34a8fb6 MK |
1146 | .BR FUTEX_TRYLOCK_PI , |
1147 | .BR FUTEX_CMP_REQUEUE_PI ) | |
a449c634 MK |
1148 | The kernel could not allocate memory to hold state information. |
1149 | .TP | |
fea681da | 1150 | .B ENFILE |
ff597681 | 1151 | .RB ( FUTEX_FD ) |
fea681da | 1152 | The system limit on the total number of open files has been reached. |
4701fc28 MK |
1153 | .TP |
1154 | .B ENOSYS | |
1155 | Invalid operation specified in | |
d33602c4 | 1156 | .IR futex_op . |
9f6c40c0 | 1157 | .TP |
4a7e5b05 MK |
1158 | .B ENOSYS |
1159 | The | |
1160 | .BR FUTEX_CLOCK_REALTIME | |
1161 | option was specified in | |
1afcee7c | 1162 | .IR futex_op , |
4a7e5b05 MK |
1163 | but the accompanying operation was neither |
1164 | .BR FUTEX_WAIT_BITSET | |
1165 | nor | |
1166 | .BR FUTEX_WAIT_REQUEUE_PI . | |
1167 | .TP | |
a9dcb4d1 MK |
1168 | .BR ENOSYS |
1169 | .RB ( FUTEX_LOCK_PI , | |
f2424fae | 1170 | .BR FUTEX_TRYLOCK_PI , |
4945ff19 | 1171 | .BR FUTEX_UNLOCK_PI , |
794bb106 MK |
1172 | .BR FUTEX_CMP_REQUEUE_PI , |
1173 | .BR FUTEX_WAIT_REQUEUE_PI ) | |
a9dcb4d1 | 1174 | A run-time check determined that the operation is not available. |
a2ebebcd MK |
1175 | The PI futex operations are not implemented on all architectures and |
1176 | are not supported on some CPU variants. | |
a9dcb4d1 | 1177 | .TP |
c7589177 MK |
1178 | .BR EPERM |
1179 | .RB ( FUTEX_LOCK_PI , | |
dc2742a8 MK |
1180 | .BR FUTEX_TRYLOCK_PI , |
1181 | .BR FUTEX_CMP_REQUEUE_PI ) | |
04331c3f | 1182 | The caller is not allowed to attach itself to the futex at |
dc2742a8 MK |
1183 | .I uaddr |
1184 | (for | |
1185 | .BR FUTEX_CMP_REQUEUE_PI : | |
1186 | the futex at | |
1187 | .IR uaddr2 ). | |
c7589177 | 1188 | (This may be caused by a state corruption in user space.) |
61f8c1d1 MK |
1189 | .\" |
1190 | .\" FIXME Is there not also an EPERM error case on 'uaddr2' for | |
1191 | .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via | |
1192 | .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==> | |
1193 | .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EPERM? | |
c7589177 | 1194 | .TP |
76f347ba | 1195 | .BR EPERM |
87276709 | 1196 | .RB ( FUTEX_UNLOCK_PI ) |
76f347ba MK |
1197 | The caller does not own the futex. |
1198 | .TP | |
0b0e4934 MK |
1199 | .BR ESRCH |
1200 | .RB ( FUTEX_LOCK_PI , | |
1201 | .BR FUTEX_TRYLOCK_PI ) | |
1202 | .\" FIXME I reworded the following sentence a bit differently from | |
1203 | .\" tglx's formulation. Is it okay? | |
1204 | The thread ID in the futex at | |
1205 | .I uaddr | |
1206 | does not exist. | |
61f8c1d1 MK |
1207 | .\" |
1208 | .\" FIXME Is there not also an ESRCH error case on 'uaddr2' for | |
1209 | .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via | |
1210 | .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==> | |
1211 | .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> ESRCH? | |
0b0e4934 | 1212 | .TP |
360f773c MK |
1213 | .BR ESRCH |
1214 | .RB ( FUTEX_CMP_REQUEUE_PI ) | |
1215 | .\" FIXME I reworded the following sentence a bit differently from | |
1216 | .\" tglx's formulation. Is it okay? | |
1217 | The thread ID in the futex at | |
1218 | .I uaddr2 | |
1219 | does not exist. | |
1220 | .TP | |
9f6c40c0 | 1221 | .B ETIMEDOUT |
4d85047f MK |
1222 | The operation in |
1223 | .IR futex_op | |
1224 | employed the timeout specified in | |
1225 | .IR timeout , | |
1226 | and the timeout expired before the operation completed. | |
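.PP
As an illustration of how a caller might treat the
.B EAGAIN
and
.B EINTR
cases listed above for
.BR FUTEX_WAIT ,
the following C sketch rechecks the futex word after every return.
The function name
.IR wait_while_equal ()
is hypothetical, and the GCC/Clang
.I __atomic_load_n
built-in is an assumption about the compiler.
Any other error is passed back to the caller with
.I errno
intact.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: block for as long as *uaddr equals val.
   Returns 0 once the value differs, or -1 with errno set if the
   kernel reports an error other than EAGAIN or EINTR. */
static int
wait_while_equal(uint32_t *uaddr, uint32_t val)
{
    while (__atomic_load_n(uaddr, __ATOMIC_ACQUIRE) == val) {
        if (syscall(SYS_futex, uaddr, FUTEX_WAIT, val,
                    NULL, NULL, 0) == -1
                && errno != EAGAIN && errno != EINTR)
            return -1;

        /* A return of 0 means we were woken, EAGAIN means the value
           had already changed, and EINTR means a signal arrived.
           In all three cases the loop condition decides what to do. */
    }
    return 0;
}
```

Rechecking the word in a loop means the caller does not depend on the
reason for the wakeup, only on the current value at
.IR uaddr .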
47297adb | 1227 | .SH VERSIONS |
a1d5f77c | 1228 | .PP |
81c9d87e MK |
1229 | Futexes were first made available in a stable kernel release |
1230 | with Linux 2.6.0. | |
1231 | ||
a1d5f77c MK |
1232 | Initial futex support was merged in Linux 2.5.7 but with different semantics |
1233 | from what was described above. | |
52dee70e | 1234 | A four-argument system call with the semantics |
fd3fa7ef | 1235 | described in this page was introduced in Linux 2.5.40. |
11b520ed | 1236 | In Linux 2.5.70, a fifth argument |
a1d5f77c | 1237 | was added. |
11b520ed | 1238 | In Linux 2.6.7, a sixth argument was added\(emmessy, especially |
a1d5f77c | 1239 | on the s390 architecture. |
47297adb | 1240 | .SH CONFORMING TO |
8382f16d | 1241 | This system call is Linux-specific. |
47297adb | 1242 | .SH NOTES |
fea681da | 1243 | .PP |
fcdad7d6 | 1244 | To reiterate, bare futexes are not intended as an easy-to-use abstraction |
c13182ef | 1245 | for end-users. |
fcdad7d6 | 1246 | (There is no wrapper function for this system call in glibc.) |
c13182ef | 1247 | Implementors are expected to be assembly literate and to have |
7fac88a9 | 1248 | read the sources of the futex user-space library referenced below. |
d282bb24 | 1249 | .\" .SH AUTHORS |
fea681da MK |
1250 | .\" .PP |
1251 | .\" Futexes were designed and worked on by | |
1252 | .\" Hubertus Franke (IBM Thomas J. Watson Research Center), | |
1253 | .\" Matthew Kirkwood, Ingo Molnar (Red Hat) | |
1254 | .\" and Rusty Russell (IBM Linux Technology Center). | |
1255 | .\" This page written by bert hubert. | |
47297adb | 1256 | .SH SEE ALSO |
9913033c | 1257 | .BR get_robust_list (2), |
d806bc05 | 1258 | .BR restart_syscall (2), |
14d8dd3b | 1259 | .BR futex (7) |
fea681da | 1260 | .PP |
f5ad572f MK |
1261 | The following kernel source files: |
1262 | .IP * 2 | |
1263 | .I Documentation/pi-futex.txt | |
1264 | .IP * | |
1265 | .I Documentation/futex-requeue-pi.txt | |
1266 | .IP * | |
1267 | .I Documentation/locking/rt-mutex.txt | |
1268 | .IP * | |
1269 | .I Documentation/locking/rt-mutex-design.txt | |
43b99089 | 1270 | .PP |
52087dd3 | 1271 | \fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP |
9b936e9e MK |
1272 | (proceedings of the Ottawa Linux Symposium 2002), online at |
1273 | .br | |
608bf950 SK |
1274 | .UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf |
1275 | .UE | |
f42eb21b | 1276 | |
2ed26199 MK |
1277 | \fIA futex overview and update\fP, 11 November 2009 |
1278 | .UR http://lwn.net/Articles/360699/ | |
1279 | .UE | |
1280 | ||
0483b6cc MK |
1281 | \fIRequeue-PI: Making Glibc Condvars PI-Aware\fP |
1282 | (2009 Real-Time Linux Workshop) | |
1283 | .UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf | |
1284 | .UE | |
1285 | ||
f42eb21b MK |
1286 | \fIFutexes Are Tricky\fP (updated in 2011), Ulrich Drepper |
1287 | .UR http://www.akkadia.org/drepper/futex.pdf | |
1288 | .UE | |
9b936e9e MK |
1289 | .PP |
1290 | Futex example library, futex-*.tar.bz2 at | |
1291 | .br | |
a605264d | 1292 | .UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/ |
608bf950 | 1293 | .UE |
34f14794 MK |
1294 | .\" |
1295 | .\" FIXME Are there any other resources that should be listed | |
1296 | .\" in the SEE ALSO section? |