1 .\" Page by b.hubert
2 .\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
3 .\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
4 .\"
5 .\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
6 .\" may be freely modified and distributed
7 .\" %%%LICENSE_END
8 .\"
9 .\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
10 .\" added ERRORS section.
11 .\"
12 .\" Modified 2004-06-17 mtk
13 .\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
14 .\"
15 .\" 2.6.31 adds FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI
16 .\" commit 52400ba946759af28442dee6265c5c0180ac7122
17 .\" Author: Darren Hart <dvhltc@us.ibm.com>
18 .\" Date: Fri Apr 3 13:40:49 2009 -0700
19 .\"
20 .\" commit ba9c22f2c01cf5c88beed5a6b9e07d42e10bd358
21 .\" Author: Darren Hart <dvhltc@us.ibm.com>
22 .\" Date: Mon Apr 20 22:22:22 2009 -0700
23 .\"
24 .\" See Documentation/futex-requeue-pi.txt
25 .\"
26 .TH FUTEX 2 2014-05-21 "Linux" "Linux Programmer's Manual"
27 .SH NAME
28 futex \- fast user-space locking
29 .SH SYNOPSIS
30 .nf
31 .sp
32 .B "#include <linux/futex.h>"
33 .B "#include <sys/time.h>"
34 .sp
35 .BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
36 .BI " const struct timespec *" timeout ,
37 .BI " int *" uaddr2 ", int " val3 );
38 .\" int *? void *? u32 *?
39 .fi
40
41 .IR Note :
42 There is no glibc wrapper for this system call; see NOTES.
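
Because no wrapper exists, callers usually reach the kernel through syscall(2).
The following is a minimal sketch, assuming a Linux system whose headers define SYS_futex.
The helper names futex and wake_one are local definitions chosen here to mirror the SYNOPSIS; they are not library functions.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Local helper (not provided by glibc) that mirrors the SYNOPSIS. */
static int futex(int *uaddr, int futex_op, int val,
                 const struct timespec *timeout, int *uaddr2, int val3)
{
    return (int) syscall(SYS_futex, uaddr, futex_op, val,
                         timeout, uaddr2, val3);
}

/* Wake at most one waiter; with nobody waiting, the call reports 0. */
static int wake_one(int *uaddr)
{
    return futex(uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```

The arguments that an operation does not use are passed as NULL or 0 here, which matches the statement that unused arguments are ignored.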
43 .SH DESCRIPTION
44 .PP
45 The
46 .BR futex ()
47 system call provides a method for
48 a program to wait for a value at a given address to change, and a
49 method to wake up anyone waiting on a particular address (while the
50 addresses for the same memory in separate processes may not be
51 equal, the kernel maps them internally so the same memory mapped in
52 different locations will correspond for
53 .BR futex ()
54 calls).
55 This system call is typically used to
56 implement the contended case of a lock in shared memory, as
57 described in
58 .BR futex (7).
59 .PP
60 When a futex operation did not finish uncontended in user space, a
61 .BR futex ()
62 call needs to be made to the kernel to arbitrate.
63 Arbitration can either mean putting the calling
64 process to sleep or, conversely, waking a waiting process.
65 .PP
66 Callers of
67 .BR futex ()
68 are expected to adhere to the semantics described in
69 .BR futex (7).
70 As these
71 semantics involve writing nonportable assembly instructions, this in turn
72 probably means that most users will in fact be library authors and not
73 general application developers.
74 .PP
75 The
76 .I uaddr
77 argument points to an integer which stores the counter (futex).
78 On all platforms, futexes are four-byte integers that
79 must be aligned on a four-byte boundary.
80 The operation to perform on the futex is specified in the
81 .I futex_op
82 argument;
83 .IR val
84 is a value whose meaning and purpose depend on
85 .IR futex_op .
86
87 The remaining arguments
88 .RI ( timeout ,
89 .IR uaddr2 ,
90 and
91 .IR val3 )
92 are required only for certain of the futex operations described below.
93 Where one of these arguments is not required, it is ignored.
94 For several blocking operations, the
95 .I timeout
96 argument is a pointer to a
97 .IR timespec
98 structure that specifies a timeout for the operation.
99 However, notwithstanding the prototype shown above, for some operations,
100 this argument is instead a four-byte integer whose meaning
101 is determined by the operation.
102 Where it is required,
103 .IR uaddr2
104 is a pointer to a second futex that is employed by the operation.
105 The interpretation of the final integer argument,
106 .IR val3 ,
107 depends on the operation.
108
109 The
110 .I futex_op
111 argument consists of two parts:
112 a command that specifies the operation to be performed,
113 bit-wise ORed with zero or more options that
114 modify the behavior of the operation.
115 The options that may be included in
116 .I futex_op
117 are as follows:
118 .TP
119 .BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
120 .\" commit 34f01cc1f512fa783302982776895c73714ebbc2
121 This option bit can be employed with all futex operations.
122 It tells the kernel that the futex is process private and not shared
123 with another process.
124 This allows the kernel to choose the fast path for validating
125 the user-space address and avoids expensive VMA lookups,
126 taking reference counts on file backing store, and so on.
127
128 As a convenience,
129 .IR <linux/futex.h>
130 defines a set of constants with the suffix
131 .BR _PRIVATE
132 that are equivalents of all of the operations listed below,
133 .\" except the obsolete FUTEX_FD, for which the "private" flag was
134 .\" meaningless
135 but with the
136 .BR FUTEX_PRIVATE_FLAG
137 ORed into the constant value.
138 Thus, there are
139 .BR FUTEX_WAIT_PRIVATE ,
140 .BR FUTEX_WAKE_PRIVATE ,
141 and so on.
142 .TP
143 .BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
144 .\" commit 1acdac104668a0834cfa267de9946fac7764d486
145 This option bit can be employed only with the
146 .BR FUTEX_WAIT_BITSET
147 and
148 .BR FUTEX_WAIT_REQUEUE_PI
149 operations.
150
151 If this option is set, the kernel treats
152 .I timeout
153 as an absolute time based on
154 .BR CLOCK_REALTIME .
155
156 If this option is not set, the kernel measures
157 .I timeout
158 (still an absolute time for these two operations)
159 .\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
160 against the
161 .BR CLOCK_MONOTONIC
162 clock.
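
The composition of a futex_op value from a command and option bits can be sketched as follows.
The function make_futex_op is a hypothetical helper introduced only for illustration; the constants come from <linux/futex.h>.

```c
#include <linux/futex.h>

/* Build a futex_op value: one command ORed with zero or more options. */
static int make_futex_op(int command, int process_private, int realtime_clock)
{
    int op = command;

    if (process_private)
        op |= FUTEX_PRIVATE_FLAG;      /* futex is not shared across processes */
    if (realtime_clock)
        op |= FUTEX_CLOCK_REALTIME;    /* accepted only by some commands */
    return op;
}
```

For example, make_futex_op(FUTEX_WAIT, 1, 0) yields the same value as the predefined FUTEX_WAIT_PRIVATE constant.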
163 .PP
164 The operation specified in
165 .I futex_op
166 is one of the following:
167 .TP
168 .BR FUTEX_WAIT " (since Linux 2.6.0)"
169 .\" Strictly speaking, since some time in 2.5.x
170 This operation tests that the value at the
171 location pointed to by the futex address
172 .I uaddr
173 still contains the value
174 .IR val ,
175 and then sleeps awaiting
176 .B FUTEX_WAKE
177 on the futex address.
178 The test and sleep steps are performed atomically.
179 If the futex value does not match
180 .IR val ,
181 then the call fails immediately with the error
182 .BR EAGAIN .
183 .\" FIXME I added the following sentence. Please confirm that it is correct.
184 The purpose of the test step is to detect races where
185 another process changes the value of the futex between
186 the time it was last checked and the time of the
187 .BR FUTEX_WAIT
188 operation.
189
190 If the
191 .I timeout
192 argument is non-NULL, its contents specify a relative timeout for the wait
193 .\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
194 measured according to the
195 .BR CLOCK_MONOTONIC
196 clock.
197 (This interval will be rounded up to the system clock granularity,
198 and kernel scheduling delays mean that the
199 blocking interval may overrun by a small amount.)
200 If
201 .I timeout
202 is NULL, the call blocks indefinitely.
203
204 The arguments
205 .I uaddr2
206 and
207 .I val3
208 are ignored.
209
210 For
211 .BR futex (7),
212 this call is executed if decrementing the count gave a negative value
213 (indicating contention), and will sleep until another process releases
214 the futex and executes the
215 .B FUTEX_WAKE
216 operation.
217 .TP
218 .BR FUTEX_WAKE " (since Linux 2.6.0)"
219 .\" Strictly speaking, since Linux 2.5.x
220 This operation wakes at most
221 .I val
222 processes waiting (i.e., inside
223 .BR FUTEX_WAIT )
224 on the futex at the address
225 .IR uaddr .
226 Most commonly,
227 .I val
228 is specified as either 1 (wake up a single waiter) or
229 .BR INT_MAX
230 (wake up all waiters).
231 .\" FIXME Please confirm that the following is correct:
232 No guarantee is provided about which waiters are awoken
233 (e.g., a waiter with a higher scheduling priority is not guaranteed
234 to be awoken in preference to a waiter with a lower priority).
235
236 The arguments
237 .IR timeout ,
238 .IR uaddr2 ,
239 and
240 .I val3
241 are ignored.
242
243 For
244 .BR futex (7),
245 this is executed if incrementing
246 the count showed that there were waiters, once the futex value has been set
247 to 1 (indicating that it is available).
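
A wait/wake handshake between two processes can be sketched as follows.
The example assumes a shared anonymous mapping and uses a simple 0/1 flag rather than the counter protocol of futex(7); the names futex and handoff_demo are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Local helper (not provided by glibc) that mirrors the SYNOPSIS. */
static int futex(int *uaddr, int futex_op, int val,
                 const struct timespec *timeout, int *uaddr2, int val3)
{
    return (int) syscall(SYS_futex, uaddr, futex_op, val,
                         timeout, uaddr2, val3);
}

/* Returns 0 if the child saw the parent's update and exited cleanly. */
static int handoff_demo(void)
{
    int status = 0;
    pid_t child, reaped;
    int *flag = mmap(NULL, sizeof(*flag), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (flag == MAP_FAILED)
        return -1;
    *flag = 0;

    child = fork();
    if (child == -1) {
        munmap(flag, sizeof(*flag));
        return -1;
    }
    if (child == 0) {
        /* Sleep only while the flag is still 0; EAGAIN, EINTR, or a
           spurious wake-up just leads to another check of the flag. */
        while (__atomic_load_n(flag, __ATOMIC_ACQUIRE) == 0)
            futex(flag, FUTEX_WAIT, 0, NULL, NULL, 0);
        _exit(0);
    }

    /* Publish the new value first, then wake one waiter (if any). */
    __atomic_store_n(flag, 1, __ATOMIC_RELEASE);
    futex(flag, FUTEX_WAKE, 1, NULL, NULL, 0);

    reaped = waitpid(child, &status, 0);
    munmap(flag, sizeof(*flag));
    if (reaped != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The order in the parent matters: the value is changed before FUTEX_WAKE is issued, so a child that has not yet gone to sleep sees the new value and never blocks.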
248 .TP
249 .BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
250 .\" Strictly speaking, from Linux 2.5.x to 2.6.25
251 This operation creates a file descriptor that is associated with the futex at
252 .IR uaddr .
253 .\" , suitable for .BR poll (2).
254 The calling process must close the returned file descriptor after use.
255 When another process performs a
256 .BR FUTEX_WAKE
257 on the futex, the file descriptor is reported as readable by
258 .BR select (2),
259 .BR poll (2),
260 and
261 .BR epoll (7).
262
263 The file descriptor can be used to obtain asynchronous notifications:
264 if
265 .I val
266 is nonzero, then when another process executes a
267 .BR FUTEX_WAKE ,
268 the caller will receive the signal number that was passed in
269 .IR val .
270
271 The arguments
272 .IR timeout ,
273 .I uaddr2
274 and
275 .I val3
276 are ignored.
277
278 To prevent race conditions, the caller should test if the futex has
279 been upped after
280 .B FUTEX_FD
281 returns.
282
283 Because it was inherently racy,
284 .B FUTEX_FD
285 has been removed
286 .\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
287 from Linux 2.6.26 onward.
288 .TP
289 .BR FUTEX_REQUEUE " (since Linux 2.6.0)"
290 .\" Strictly speaking: from Linux 2.5.70
291 .\"
292 .\" FIXME I added this warning. Okay?
293 .IR "Avoid using this operation" .
294 It is broken (unavoidably racy) for its intended purpose.
295 Use
296 .BR FUTEX_CMP_REQUEUE
297 instead.
298
299 This operation performs the same task as
300 .BR FUTEX_CMP_REQUEUE ,
301 except that no check is made using the value in
302 .IR val3 .
303 (The argument
304 .I val3
305 is ignored.)
306 .TP
307 .BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
308 This operation was added as a replacement for the earlier
309 .BR FUTEX_REQUEUE ,
310 because that operation was racy for its intended use.
311
312 As with
313 .BR FUTEX_REQUEUE ,
314 the
315 .BR FUTEX_CMP_REQUEUE
316 operation is used to avoid a "thundering herd" effect when
317 .B FUTEX_WAKE
318 is used and all processes woken up need to acquire another futex.
319 It differs from
320 .BR FUTEX_REQUEUE
321 in that it first checks whether the location
322 .I uaddr
323 still contains the value
324 .IR val3 .
325 If not, the operation fails with the error
326 .BR EAGAIN .
327 .\" FIXME I added the following sentence on rational for FUTEX_CMP_REQUEUE.
328 .\" Is it correct? SHould it be expanded?
329 This additional feature of
330 .BR FUTEX_CMP_REQUEUE
331 can be used by the caller to (atomically) detect changes
332 in the value of the source futex at
333 .IR uaddr .
334
335 The operation wakes up a maximum of
336 .I val
337 waiters that are waiting on the futex at
338 .IR uaddr .
339 If there are more than
340 .I val
341 waiters, then the remaining waiters are removed
342 from the wait queue of the source futex at
343 .I uaddr
344 and added to the wait queue of the target futex at
345 .IR uaddr2 .
346 The
347 .I timeout
348 argument is (ab)used to specify a cap on the number of waiters
349 that are requeued to the futex at
350 .IR uaddr2 ;
351 the kernel casts the
352 .I timeout
353 value to
354 .IR u32 .
355
356 .\" FIXME Please review the following new paragraph to see if it is
357 .\" accurate.
358 Typical values to specify for
359 .I val
360 are 0 or 1.
361 (Specifying
362 .BR INT_MAX
363 is not useful, because it would make the
364 .BR FUTEX_CMP_REQUEUE
365 operation equivalent to
366 .BR FUTEX_WAKE .)
367 The cap value specified via the (abused)
368 .I timeout
369 argument is typically either 1 or
370 .BR INT_MAX .
371 (Specifying the argument as 0 is not useful, because it would make the
372 .BR FUTEX_CMP_REQUEUE
373 operation equivalent to
374 .BR FUTEX_WAKE .)
375 .\"
376 .\" FIXME I added some FUTEX_WAKE_OP text, and I'd be happy if someone
377 .\" checked it.
378 .TP
379 .BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
380 .\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
381 .\" Author: Jakub Jelinek <jakub@redhat.com>
382 .\" Date: Tue Sep 6 15:16:25 2005 -0700
383 This operation was added to support some user-space use cases
384 where more than one futex must be handled at the same time.
385 The most notable example is the implementation of
386 .BR pthread_cond_signal (3),
387 which requires operations on two futexes,
388 the one used to implement the mutex and the one used in the implementation
389 of the wait queue associated with the condition variable.
390 .BR FUTEX_WAKE_OP
391 allows such cases to be implemented without leading to
392 high rates of contention and context switching.
393
394 The
395 .BR FUTEX_WAKE_OP
396 operation is equivalent to atomically executing the following code:
397
398 .in +4n
399 .nf
400 int oldval = *(int *) uaddr2;
401 *(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
402 futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
403 if (oldval \fIcmp\fP \fIcmparg\fP)
404     futex(uaddr2, FUTEX_WAKE, nr_wake2, 0, 0, 0);
405 .fi
406 .in
407
408 In other words,
409 .BR FUTEX_WAKE_OP
410 does the following:
411 .RS
412 .IP * 3
413 saves the original value of the futex at
414 .IR uaddr2 ;
415 .IP *
416 performs an operation to modify the value of the futex at
417 .IR uaddr2 ;
418 .IP *
419 wakes up a maximum of
420 .I val
421 waiters on the futex
422 .IR uaddr ;
423 and
424 .IP *
425 dependent on the results of a test of the original value of the futex at
426 .IR uaddr2 ,
427 wakes up a maximum of
428 .I nr_wake2
429 waiters on the futex
430 .IR uaddr2 .
431 .RE
432 .IP
433 The
434 .I nr_wake2
435 value is actually the
436 .BR futex ()
437 .I timeout
438 argument (ab)used to specify how many of the waiters on the futex at
439 .IR uaddr2
440 are to be woken up;
441 the kernel casts the
442 .I timeout
443 value to
444 .IR u32 .
445
446 The operation and comparison that are to be performed are encoded
447 in the bits of the argument
448 .IR val3 .
449 Pictorially, the encoding is:
450
451 .in +8n
452 .nf
453 +---+---+-----------+-----------+
454 |op |cmp|   oparg   |  cmparg   |
455 +---+---+-----------+-----------+
456   4   4       12          12    <== # of bits
457 .fi
458 .in
459
460 Expressed in code, the encoding is:
461
462 .in +4n
463 .nf
464 #define FUTEX_OP(op, oparg, cmp, cmparg) \\
465     (((op & 0xf) << 28) | \\
466     ((cmp & 0xf) << 24) | \\
467     ((oparg & 0xfff) << 12) | \\
468     (cmparg & 0xfff))
469 .fi
470 .in
471
472 In the above,
473 .I op
474 and
475 .I cmp
476 are each one of the codes listed below.
477 The
478 .I oparg
479 and
480 .I cmparg
481 components are literal numeric values, except as noted below.
482
483 The
484 .I op
485 component has one of the following values:
486
487 .in +4n
488 .nf
489 FUTEX_OP_SET   0  /* uaddr2 = oparg; */
490 FUTEX_OP_ADD   1  /* uaddr2 += oparg; */
491 FUTEX_OP_OR    2  /* uaddr2 |= oparg; */
492 FUTEX_OP_ANDN  3  /* uaddr2 &= ~oparg; */
493 FUTEX_OP_XOR   4  /* uaddr2 ^= oparg; */
494 .fi
495 .in
496
497 In addition, bit-wise ORing the following value into
498 .I op
499 causes
500 .IR "(1\ <<\ oparg)"
501 to be used as the operand:
502
503 .in +4n
504 .nf
505 FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
506 .fi
507 .in
508
509 The
510 .I cmp
511 field is one of the following:
512
513 .in +4n
514 .nf
515 FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
516 FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
517 FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
518 FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
519 FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
520 FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
521 .fi
522 .in
523
524 The return value of
525 .BR FUTEX_WAKE_OP
526 is the sum of the number of waiters woken on the futex
527 .IR uaddr
528 plus the number of waiters woken on the futex
529 .IR uaddr2 .
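
The encoding and a complete call can be sketched as follows.
The function encode_wake_op is a hypothetical equivalent of the FUTEX_OP() macro shown above, and add_and_wake is an illustrative helper; the op and cmp constants are taken from <linux/futex.h>.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pack op, cmp, oparg, and cmparg into the val3 layout shown above. */
static int encode_wake_op(int op, int oparg, int cmp, int cmparg)
{
    unsigned int bits = (((unsigned int) op & 0xf) << 28) |
                        (((unsigned int) cmp & 0xf) << 24) |
                        (((unsigned int) oparg & 0xfff) << 12) |
                        ((unsigned int) cmparg & 0xfff);
    return (int) bits;
}

/* Add `delta` to *uaddr2, wake at most one waiter on uaddr, and wake at
   most one waiter on uaddr2 if the old value of *uaddr2 was 0.
   Returns the total number of waiters woken, or -1 with errno set. */
static int add_and_wake(int *uaddr, int *uaddr2, int delta)
{
    int val3 = encode_wake_op(FUTEX_OP_ADD, delta, FUTEX_OP_CMP_EQ, 0);

    /* nr_wake2 (here 1) is passed as an integer in the timeout slot. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, 1,
                         (unsigned long) 1, uaddr2, val3);
}
```

With no waiters on either futex, add_and_wake() returns 0 and its only visible effect is the arithmetic update of the word at uaddr2.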
530 .TP
531 .BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
532 .\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
533 This operation is like
534 .BR FUTEX_WAIT
535 except that
536 .I val3
537 is used to provide a 32-bit bitset to the kernel.
538 This bitset is stored in the kernel-internal state of the waiter.
539 See the description of
540 .BR FUTEX_WAKE_BITSET
541 for further details.
542
543 The
544 .BR FUTEX_WAIT_BITSET
545 also interprets the
546 .I timeout
547 argument differently from
548 .BR FUTEX_WAIT .
549 See the discussion of
550 .BR FUTEX_CLOCK_REALTIME ,
551 above.
552
553 The
554 .I uaddr2
555 argument is ignored.
556 .TP
557 .BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
558 .\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
559 This operation is the same as
560 .BR FUTEX_WAKE
561 except that the
562 .I val3
563 argument is used to provide a 32-bit bitset to the kernel.
564 This bitset is used to select which waiters should be woken up.
565 The selection is done by a bit-wise AND of the "wake" bitset
566 (i.e., the value in
567 .IR val3 )
568 and the bitset which is stored in the kernel-internal
569 state of the waiter (the "wait" bitset that is set using
570 .BR FUTEX_WAIT_BITSET ).
571 All of the waiters for which the result of the AND is nonzero are woken up;
572 the remaining waiters are left sleeping.
573
574 .\" FIXME please review this paragraph that I added
575 The effect of
576 .BR FUTEX_WAIT_BITSET
577 and
578 .BR FUTEX_WAKE_BITSET
579 is to allow selective wake-ups among multiple waiters that are waiting
580 on the same futex;
581 since a futex has a size of 32 bits,
582 these operations provide 32 wakeup "channels".
583 (The
584 .BR FUTEX_WAIT
585 and
586 .BR FUTEX_WAKE
587 operations correspond to
588 .BR FUTEX_WAIT_BITSET
589 and
590 .BR FUTEX_WAKE_BITSET
591 operations where the bitsets are all ones.)
592 Note, however, that using this bitset multiplexing feature on a
593 futex is less efficient than simply using multiple futexes,
594 because employing bitset multiplexing requires the kernel
595 to check all waiters on a futex,
596 including those that are not interested in being woken up
597 (i.e., they do not have the relevant bit set in their "wait" bitset).
598 .\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
599 .\"
600 .\" "The original reason for the addition of these extensions
601 .\" was to improve the performance of pthread read-write locks
602 .\" in glibc. However, the pthreads library no longer uses the
603 .\" same locking algorithm, and these extensions are not used
604 .\" without the bitset parameter being all ones.
605 .\"
606 .\" The page goes on to note that the FUTEX_WAIT_BITSET operation
607 .\" is nevertheless used (with a bitset of all ones) in order to
608 .\" obtain the absolute timeout functionality that is useful
609 .\" for efficiently implementing Pthreads APIs (which use absolute
610 .\" timeouts); FUTEX_WAIT provides only relative timeouts.
611
612 The
613 .I uaddr2
614 and
615 .I timeout
616 arguments are ignored.
617 .\"
618 .\"
619 .SS Priority-inheritance futexes
620 Linux supports priority-inheritance (PI) futexes in order to handle
621 priority-inversion problems that can be encountered with
622 normal futex locks.
623 .\"
624 .\" FIXME ===== Start of adapted Hart/Guniguntala text =====
625 .\" The following text is drawn from the Hart/Guniguntala paper,
626 .\" but I have reworded some pieces significantly. Please check it.
627 .\"
628 The PI futex operations described below differ from the other
629 futex operations in that they impose policy on the use of the futex value:
630 .IP * 3
631 If the lock is unowned, the futex value shall be 0.
632 .IP *
633 If the lock is owned, the futex value shall be the thread ID (TID; see
634 .BR gettid (2))
635 of the owning thread.
636 .IP *
637 .\" FIXME In the following line, I added "the lock is owned and". Okay?
638 If the lock is owned and there are threads contending for the lock,
639 then the
640 .B FUTEX_WAITERS
641 bit shall be set in the futex value; in other words, the futex value is:
642
643 FUTEX_WAITERS | TID
644 .PP
645 With this policy in place,
646 a user-space application can acquire an unowned
647 lock or release an uncontended lock using atomic
648 .\" FIXME In the following line, I added "user-space". Okay?
649 user-space instructions (e.g.,
650 .I cmpxchg
651 on the x86 architecture).
652 Locking an unowned lock simply consists of setting
653 the futex value to the caller's TID.
654 Releasing an uncontended lock simply requires setting the futex value to 0.
655
656 If a futex is currently owned (i.e., has a nonzero value),
657 waiters must employ the
658 .B FUTEX_LOCK_PI
659 operation to acquire the lock.
660 If a lock is contended (i.e., the
661 .B FUTEX_WAITERS
662 bit is set in the futex value), the lock owner must employ the
663 .B FUTEX_UNLOCK_PI
664 operation to release the lock.
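
The division of labor between user space and the kernel can be sketched as follows.
This is an illustration of the value policy above, not a production lock: pi_lock and pi_unlock are hypothetical names, the compare-and-swap uses the GCC/Clang __atomic builtins, and error handling for the kernel path is left to the caller.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Acquire: try 0 -> TID in user space, else ask the kernel. */
static int pi_lock(uint32_t *uaddr)
{
    uint32_t unowned = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: the lock is unowned, so one atomic instruction suffices. */
    if (__atomic_compare_exchange_n(uaddr, &unowned, tid, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: the futex value was nonzero; the kernel arbitrates. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Release: try TID -> 0 in user space, else ask the kernel. */
static int pi_unlock(uint32_t *uaddr)
{
    uint32_t owned = (uint32_t) syscall(SYS_gettid);

    /* Fast path: succeeds only while the FUTEX_WAITERS bit is clear. */
    if (__atomic_compare_exchange_n(uaddr, &owned, 0, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: the lock is contended, so the kernel must hand it over. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case neither function issues a futex() call; the only system call is gettid(2), which a real implementation would typically cache.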
665
666 In the cases where callers are forced into the kernel
667 (i.e., required to perform a
668 .BR futex ()
669 operation),
670 they then deal directly with a so-called RT-mutex,
671 a kernel locking mechanism which implements the required
672 priority-inheritance semantics.
673 After the RT-mutex is acquired, the futex value is updated accordingly,
674 before the calling thread returns to user space.
675 .\" FIXME ===== End of adapted Hart/Guniguntala text =====
676
677 It is important
678 .\" FIXME We need some explanation here of why it is important to note this
679 to note that the kernel will update the futex value prior
680 to returning to user space.
681 Unlike the other futex operations described above,
682 the PI futex operations are designed
683 for the implementation of very specific IPC mechanisms.
684 .\"
685 .\" FIXME We don't quite have a definition anywhere of what a PI futex
686 .\" is (vs a non-PI futex). Below, we have the information of
687 .\" FUTEX_CMP_REQUEUE_PI requeues from a non-PI futex to a
688 .\" PI futex, but what determines whether the futex is of one
689 .\" kind of the other? We should have such a definition somewhere
690 .\" about here.
691
692 PI futexes are operated on by specifying one of the following values in
693 .IR futex_op :
694 .TP
695 .BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
696 .\" commit c87e2837be82df479a6bae9f155c43516d2feebc
697 .\"
698 .\" FIXME I did some significant rewording of tglx's text.
699 .\" Please check, in case I injected errors.
700 .\"
701 This operation is used after an attempt to acquire
702 the futex lock via an atomic user-space instruction failed
703 because the futex has a nonzero value\(emspecifically,
704 because it contained the namespace-specific TID of the lock owner.
705 .\" FIXME In the preceding line, what does "namespace-specific" mean?
706 .\" (I kept those words from tglx.)
707 .\" That is, what kind of namespace are we talking about?
708 .\" (I suppose we are talking PID namespaces here, but I want to
709 .\" be sure.)
710
711 The operation checks the value of the futex at the address
712 .IR uaddr .
713 If the value is 0, then the kernel tries to atomically set the futex value to
714 the caller's TID.
715 If that fails,
716 .\" FIXME What would be the cause of failure?
717 or the futex value is nonzero,
718 the kernel atomically sets the
719 .B FUTEX_WAITERS
720 bit, which signals the futex owner that it cannot unlock the futex in
721 user space atomically by setting the futex value to 0.
722 After that, the kernel tries to find the thread which is
723 associated with the owner TID,
724 .\" FIXME Could I get a bit more detail on the next two lines?
725 .\" What is "creates or reuses kernel state" about?
726 creates or reuses kernel state on behalf of the owner
727 and attaches the waiter to it.
728 .\" FIXME In the next line, what type of "priority" are we talking about?
729 .\" Realtime priorities for SCHED_FIFO and SCHED_RR?
730 .\" Or something else?
731 The enqueueing of the waiter is in descending priority order if more
732 than one waiter exists.
733 .\" FIXME What does "bandwidth" refer to in the next line?
734 The owner inherits either the priority or the bandwidth of the waiter.
735 .\" FIXME In the preceding line, what determines whether the
736 .\" owner inherits the priority versus the bandwidth?
737 .\"
738 .\" FIXME Could I get some help translating the next sentence into
739 .\" something that user-space developers (and I) can understand?
740 .\" In particular, what are "nexted locks" in this context?
741 This inheritance follows the lock chain in the case of
742 nested locking and performs deadlock detection.
743
744 .\" FIXME tglx says "The timeout argument is handled as described in
745 .\" FUTEX_WAIT." However, it appears to me that this is not right.
746 .\" Is the following formulation correct.
747 The
748 .I timeout
749 argument provides a timeout for the lock attempt.
750 It is interpreted as an absolute time, measured against the
751 .BR CLOCK_REALTIME
752 clock.
753 If
754 .I timeout
755 is NULL, the operation will block indefinitely.
756
757 The
758 .IR uaddr2 ,
759 .IR val ,
760 and
761 .IR val3
762 arguments are ignored.
763 .\" FIXME
764 .\" tglx noted the following "ERROR" case for FUTEX_LOCK_PI and
765 .\" FUTEX_TRYLOCK_PI
766 .\" > [EOWNERDIED] The owner of the futex died and the kernel made the
767 .\" > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit
768 .\" > in the futex userspace value. Caller is responsible for cleanup
769 .\"
770 .\" However, there is no such thing as an EOWNERDIED error. I had a look
771 .\" through the kernel source for the FUTEX_OWNER_DIED cases and didn't
772 .\" see an obvious error associated with them. Can you clarify? (I think
773 .\" the point is that this condition, which is described in
774 .\" Documentation/robust-futexes.txt, is not an error as such. However,
775 .\" I'm not yet sure of how to describe it in the man page.)
776 .\"
777 .TP
778 .BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
779 .\" commit c87e2837be82df479a6bae9f155c43516d2feebc
780 This operation tries to acquire the futex at
781 .IR uaddr .
782 .\" FIXME I think it would be helpful here to say a few more words about
783 .\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI
784 It deals with the situation where the TID value at
785 .I uaddr
786 is 0, but the
787 .B FUTEX_WAITERS
788 bit is set.
789 .\" FIXME How does the situation in the previous sentence come about?
790 .\" Probably it would be helpful to say something about that in
791 .\" the man page.
792 .\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
793 User space cannot handle this condition in a race-free manner.
794
795 The
796 .IR uaddr2 ,
797 .IR val ,
798 .IR timeout ,
799 and
800 .IR val3
801 arguments are ignored.
802 .TP
803 .BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
804 .\" commit c87e2837be82df479a6bae9f155c43516d2feebc
805 This operation wakes the highest-priority waiter that is waiting in
806 .B FUTEX_LOCK_PI
807 on the futex address provided by the
808 .I uaddr
809 argument.
810
811 This is called when the user-space value at
812 .I uaddr
813 cannot be changed atomically from a TID (of the owner) to 0.
814
815 The
816 .IR uaddr2 ,
817 .IR val ,
818 .IR timeout ,
819 and
820 .IR val3
821 arguments are ignored.
822 .TP
823 .BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
824 .\" commit 52400ba946759af28442dee6265c5c0180ac7122
825 .\" FIXME to complete
826 This operation is a PI-aware variant of
827 .BR FUTEX_CMP_REQUEUE .
828 It requeues waiters that are blocked via
829 .B FUTEX_WAIT_REQUEUE_PI
830 on
831 .I uaddr
832 from a non-PI source futex
833 .RI ( uaddr )
834 to a PI target futex
835 .RI ( uaddr2 ).
836
837 As with
838 .BR FUTEX_CMP_REQUEUE ,
839 this operation wakes up a maximum of
840 .I val
841 waiters that are waiting on the futex at
842 .IR uaddr .
843 However, for
844 .BR FUTEX_CMP_REQUEUE_PI ,
845 .I val
846 is required to be 1.
847 The remaining waiters are removed from the wait queue of the source futex at
848 .I uaddr
849 and added to the wait queue of the target futex at
850 .IR uaddr2 .
851
852 The
853 .I val3
854 and
855 .I timeout
856 arguments serve the same purposes as for
857 .BR FUTEX_CMP_REQUEUE .
858 .\" FIXME The page at http://locklessinc.com/articles/futex_cheat_sheet/
859 .\" notes that "priority-inheritance Futex to priority-inheritance
860 .\" Futex requeues are currently unsupported". Do we need to say
861 .\" something in the man page about that?
862 .TP
863 .BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
864 .\" commit 52400ba946759af28442dee6265c5c0180ac7122
865 .\" FIXME to complete
866 .\"
867 .\" FIXME Employs 'timeout' argument, supports FUTEX_CLOCK_REALTIME
868 .\" 'timeout' can be NULL
869 .\"
870 [As yet undocumented]
871 .SH RETURN VALUE
872 .PP
873 In the event of an error, all operations return \-1 and set
874 .I errno
875 to indicate the cause of the error.
876 The return value on success depends on the operation,
877 as described in the following list:
878 .TP
879 .B FUTEX_WAIT
880 Returns 0 if the process was woken by a
881 .B FUTEX_WAKE
882 or
883 .B FUTEX_WAKE_BITSET
884 call.
885 .TP
886 .B FUTEX_WAKE
887 Returns the number of processes woken up.
888 .TP
889 .B FUTEX_FD
890 Returns the new file descriptor associated with the futex.
891 .TP
892 .B FUTEX_REQUEUE
893 Returns the number of processes woken up.
894 .TP
895 .B FUTEX_CMP_REQUEUE
896 Returns the total number of processes woken up or requeued to the futex at
897 .IR uaddr2 .
898 If this value is greater than
899 .IR val ,
900 then the difference is the number of waiters requeued to the futex at
901 .IR uaddr2 .
902 .\"
903 .\" FIXME Add success returns for other operations
904 .TP
905 .B FUTEX_WAKE_OP
906 .\" FIXME Is the following correct?
907 Returns the total number of waiters that were woken up.
908 This is the sum of the woken waiters on the two futexes at
909 .I uaddr
910 and
911 .IR uaddr2 .
912 .TP
913 .B FUTEX_WAIT_BITSET
914 .\" FIXME Is the following correct?
915 Returns 0 if the process was woken by a
916 .B FUTEX_WAKE
917 or
918 .B FUTEX_WAKE_BITSET
919 call.
920 .TP
921 .B FUTEX_WAKE_BITSET
922 .\" FIXME Is the following correct?
923 Returns the number of processes woken up.
924 .TP
925 .B FUTEX_LOCK_PI
926 .\" FIXME Is the following correct?
927 Returns 0 if the futex was successfully locked.
928 .TP
929 .B FUTEX_TRYLOCK_PI
930 .\" FIXME Is the following correct?
931 Returns 0 if the futex was successfully locked.
932 .TP
933 .B FUTEX_UNLOCK_PI
934 .\" FIXME Is the following correct?
935 Returns 0 if the futex was successfully unlocked.
936 .TP
937 .B FUTEX_CMP_REQUEUE_PI
938 .\" FIXME Is the following correct?
939 Returns the total number of processes woken up or requeued to the futex at
940 .IR uaddr2 .
941 If this value is greater than
942 .IR val ,
943 then the difference is the number of waiters requeued to the futex at
944 .IR uaddr2 .
945 .TP
946 .B FUTEX_WAIT_REQUEUE_PI
947 .\" FIXME Is the following correct?
948 Returns 0 if the caller was successfully requeued to the futex at
949 .IR uaddr2 .
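
The rule given above for FUTEX_CMP_REQUEUE and FUTEX_CMP_REQUEUE_PI can be sketched as a small calculation.
The type requeue_result and the function split_requeue_return are hypothetical names; the sketch assumes a successful (nonnegative) return value and simply applies the stated rule.

```c
/* How a successful requeue return value divides into the two groups. */
struct requeue_result {
    int woken;      /* waiters woken on uaddr */
    int requeued;   /* waiters moved to uaddr2 */
};

/* `ret` is the nonnegative return value; `val` is the wake limit passed in. */
static struct requeue_result split_requeue_return(int ret, int val)
{
    struct requeue_result r;

    r.woken = ret < val ? ret : val;
    r.requeued = ret > val ? ret - val : 0;
    return r;
}
```

For instance, a return value of 5 from a call made with val equal to 1 is read as one waiter woken and four waiters requeued.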
950 .SH ERRORS
951 .TP
952 .B EACCES
953 No read access to futex memory.
954 .TP
955 .B EAGAIN
956 .RB ( FUTEX_WAIT )
957 The value pointed to by
958 .I uaddr
959 was not equal to the expected value
960 .I val
961 at the time of the call.
962 .TP
963 .B EAGAIN
964 .B FUTEX_CMP_REQUEUE
965 detected that the value pointed to by
966 .I uaddr
967 is not equal to the expected value
968 .IR val3 .
969 .\" FIXME: Is the following sentence correct?
970 (This probably indicates a race;
971 use the safe
972 .B FUTEX_WAKE
973 now.)
974 .\"
975 .\" FIXME Should there be an EAGAIN case for FUTEX_TRYLOCK_PI?
976 .\" It seems so, looking at the handling of the rt_mutex_trylock()
977 .\" call in futex_lock_pi()
978 .\"
979 .TP
980 .BR EAGAIN
981 .RB ( FUTEX_LOCK_PI ,
982 .BR FUTEX_TRYLOCK_PI )
983 The thread that owns the futex is about to exit,
984 but has not yet handled the internal state cleanup.
985 Try again.
986 .\"
987 .\" FIXME Is there not also an EAGAIN error case on 'uaddr2' for
988 .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
989 .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
990 .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EAGAIN?
991 .TP
992 .BR EDEADLK
993 .RB ( FUTEX_LOCK_PI ,
994 .BR FUTEX_TRYLOCK_PI )
995 The futex at
996 .I uaddr
997 is already locked by the caller.
998 .\"
999 .\" FIXME Is there not also an EDEADLK error case on 'uaddr2' for
1000 .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1001 .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1002 .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EDEADLK?
1003 .TP
1004 .B EFAULT
1005 A required pointer argument (i.e.,
1006 .IR uaddr ,
1007 .IR uaddr2 ,
1008 or
1009 .IR timeout )
1010 did not point to a valid user-space address.
1011 .TP
1012 .B EINTR
1013 A
1014 .B FUTEX_WAIT
1015 or
1016 .B FUTEX_WAIT_BITSET
1017 operation was interrupted by a signal (see
1018 .BR signal (7))
1019 or a spurious wakeup.
1020 .TP
1021 .B EINVAL
1022 The operation in
1023 .IR futex_op
1024 is one of those that employs a timeout, but the supplied
1025 .I timeout
1026 argument was invalid
1027 .RI ( tv_sec
1028 was less than zero, or
1029 .IR tv_nsec
1030 was not less than 1,000,000,000).
1031 .TP
1032 .B EINVAL
1033 The operation specified in
1034 .I futex_op
1035 employs one or both of the pointers
1036 .I uaddr
1037 and
1038 .IR uaddr2 ,
1039 but one of these does not point to a valid object\(emthat is,
1040 the address is not four-byte-aligned.
1041 .TP
1042 .B EINVAL
1043 .RB ( FUTEX_WAKE ,
1044 .BR FUTEX_WAKE_OP ,
1045 .BR FUTEX_WAKE_BITSET ,
1046 .BR FUTEX_REQUEUE ,
1047 .BR FUTEX_CMP_REQUEUE )
1048 The kernel detected an inconsistency between the user-space state at
1049 .I uaddr
1050 and the kernel state\(emthat is, it detected a waiter which waits in
1051 .BR FUTEX_LOCK_PI
1052 on
1053 .IR uaddr .
1054 .TP
1055 .B EINVAL
1056 .RB ( FUTEX_WAIT_BITSET ,
1057 .BR FUTEX_WAKE_BITSET )
1058 The bitset supplied in
1059 .IR val3
1060 is zero.
1061 .TP
1062 .B EINVAL
1063 .RB ( FUTEX_REQUEUE )
1064 .\" FIXME tglx suggested adding this, but does this error really
1065 .\" occur for FUTEX_REQUEUE?
1066 .I uaddr
1067 equals
1068 .IR uaddr2
1069 (i.e., an attempt was made to requeue to the same futex).
1070 .TP
1071 .BR EINVAL
1072 .RB ( FUTEX_FD )
1073 The signal number supplied in
1074 .I val
1075 is invalid.
1076 .TP
1077 .B EINVAL
1078 .RB ( FUTEX_LOCK_PI ,
1079 .BR FUTEX_TRYLOCK_PI ,
1080 .BR FUTEX_UNLOCK_PI )
1081 The kernel detected an inconsistency between the user-space state at
1082 .I uaddr
1083 and the kernel state.
1084 This indicates either state corruption
1085 .\" FIXME tglx did not mention the "state corruption" for FUTEX_UNLOCK_PI.
1086 .\" Does that case also apply for FUTEX_UNLOCK_PI?
1087 or that the kernel found a waiter on
1088 .I uaddr
1089 which is waiting via
1090 .BR FUTEX_WAIT
1091 or
1092 .BR FUTEX_WAIT_BITSET .
1093 .TP
1094 .B EINVAL
1095 Invalid argument.
1096 .TP
1097 .BR ENOMEM
1098 .RB ( FUTEX_LOCK_PI ,
1099 .BR FUTEX_TRYLOCK_PI ,
1100 .BR FUTEX_CMP_REQUEUE_PI )
1101 The kernel could not allocate memory to hold state information.
1102 .TP
1103 .B ENFILE
1104 .RB ( FUTEX_FD )
1105 The system limit on the total number of open files has been reached.
1106 .TP
1107 .B ENOSYS
1108 Invalid operation specified in
1109 .IR futex_op .
1110 .TP
1111 .B ENOSYS
1112 The
1113 .BR FUTEX_CLOCK_REALTIME
1114 option was specified in
1115 .IR futex_op ,
1116 but the accompanying operation was neither
1117 .BR FUTEX_WAIT_BITSET
1118 nor
1119 .BR FUTEX_WAIT_REQUEUE_PI .
1120 .TP
1121 .BR ENOSYS
1122 .RB ( FUTEX_LOCK_PI ,
1123 .BR FUTEX_TRYLOCK_PI ,
1124 .BR FUTEX_UNLOCK_PI )
1125 A run-time check determined that the operation is not available.
1126 .BR FUTEX_LOCK_PI
1127 and
1128 .BR FUTEX_TRYLOCK_PI
1129 are not implemented on all architectures and
1130 are not supported on some CPU variants.
1131 .TP
1132 .BR EPERM
1133 .RB ( FUTEX_LOCK_PI ,
1134 .BR FUTEX_TRYLOCK_PI )
1135 The caller is not allowed to attach itself to the futex.
1136 (This may be caused by a state corruption in user space.)
1137 .\"
1138 .\" FIXME Is there not also an EPERM error case on 'uaddr2' for
1139 .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1140 .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1141 .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EPERM?
1142 .TP
1143 .BR EPERM
1144 .RB ( FUTEX_UNLOCK_PI )
1145 The caller does not own the futex.
1146 .TP
1147 .BR ESRCH
1148 .RB ( FUTEX_LOCK_PI ,
1149 .BR FUTEX_TRYLOCK_PI )
1150 .\" FIXME I reworded the following sentence a bit differently from
1151 .\" tglx's formulation. Is it okay?
1152 The thread ID in the futex at
1153 .I uaddr
1154 does not exist.
1155 .\"
1156 .\" FIXME Is there not also an ESRCH error case on 'uaddr2' for
1157 .\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1158 .\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1159 .\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> ESRCH?
1160 .TP
1161 .B ETIMEDOUT
1162 The operation in
1163 .IR futex_op
1164 employed the timeout specified in
1165 .IR timeout ,
1166 and the timeout expired before the operation completed.
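.PP
The errors that a caller of
.BR FUTEX_WAIT
most often has to distinguish can be handled along the following lines.
This is only a sketch: the enumeration
.I wait_result
and the function
.IR futex_wait_once ()
are hypothetical names introduced here, and the system call is made through
.BR syscall (2)
because glibc provides no wrapper.
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch; not part of any API. */
enum wait_result {
    WAIT_WOKEN,          /* call returned 0; recheck the futex word */
    WAIT_VALUE_CHANGED,  /* EAGAIN: *uaddr was not equal to val */
    WAIT_INTERRUPTED,    /* EINTR: signal or spurious wakeup */
    WAIT_TIMED_OUT,      /* ETIMEDOUT: timeout expired */
    WAIT_OTHER_ERROR     /* anything else; errno is left untouched */
};

/* Perform a single FUTEX_WAIT and classify the result.
   For FUTEX_WAIT the timeout is relative; NULL means wait forever. */
static enum wait_result
futex_wait_once(int *uaddr, int val, const struct timespec *timeout)
{
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT, val,
                timeout, NULL, 0) == 0)
        return WAIT_WOKEN;

    switch (errno) {
    case EAGAIN:
        return WAIT_VALUE_CHANGED;
    case EINTR:
        return WAIT_INTERRUPTED;
    case ETIMEDOUT:
        return WAIT_TIMED_OUT;
    default:
        return WAIT_OTHER_ERROR;  /* e.g., EINVAL, EFAULT, ENOSYS */
    }
}
```
.fi
.in
.PP
A caller would typically invoke this in a loop, rechecking the futex word
after
.IR WAIT_WOKEN ,
.IR WAIT_VALUE_CHANGED ,
or
.IR WAIT_INTERRUPTED ,
since none of these outcomes by itself shows that the awaited condition holds.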
1167 .SH VERSIONS
1168 .PP
1169 Futexes were first made available in a stable kernel release
1170 with Linux 2.6.0.
1171
1172 Initial futex support was merged in Linux 2.5.7, but with semantics
1173 different from those described in this page.
1174 A four-argument system call with the semantics
1175 described in this page was introduced in Linux 2.5.40.
1176 In Linux 2.5.70, a fifth argument
1177 was added.
1178 In Linux 2.6.7, a sixth argument was added, which was messy, especially
1179 on the s390 architecture.
1180 .SH CONFORMING TO
1181 This system call is Linux-specific.
1182 .SH NOTES
1183 .PP
1184 To reiterate, bare futexes are not intended as an easy-to-use abstraction
1185 for end-users.
1186 (There is no wrapper function for this system call in glibc.)
1187 Implementors are expected to be assembly literate and to have
1188 read the sources of the futex user-space library referenced below.
1189 .\" .SH AUTHORS
1190 .\" .PP
1191 .\" Futexes were designed and worked on by
1192 .\" Hubertus Franke (IBM Thomas J. Watson Research Center),
1193 .\" Matthew Kirkwood, Ingo Molnar (Red Hat)
1194 .\" and Rusty Russell (IBM Linux Technology Center).
1195 .\" This page written by bert hubert.
1196 .SH SEE ALSO
1197 .BR get_robust_list (2),
1198 .BR restart_syscall (2),
1199 .BR futex (7)
1200 .PP
1201 The following kernel source files:
1202 .IP * 2
1203 .I Documentation/pi-futex.txt
1204 .IP *
1205 .I Documentation/futex-requeue-pi.txt
1206 .IP *
1207 .I Documentation/locking/rt-mutex.txt
1208 .IP *
1209 .I Documentation/locking/rt-mutex-design.txt
1210 .PP
1211 \fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
1212 (proceedings of the Ottawa Linux Symposium 2002), online at
1213 .br
1214 .UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
1215 .UE
1216
1217 \fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
1218 (2009 Real-Time Linux Workshop)
1219 .UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1220 .UE
1221
1222 \fIFutexes Are Tricky\fP (updated in 2011), Ulrich Drepper
1223 .UR http://www.akkadia.org/drepper/futex.pdf
1224 .UE
1225 .PP
1226 Futex example library, futex-*.tar.bz2 at
1227 .br
1228 .UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
1229 .UE
1230 .\"
1231 .\" FIXME Are there any other resources that should be listed
1232 .\" in the SEE ALSO section?