]> git.ipfire.org Git - thirdparty/man-pages.git/blame - man2/futex.2
futex.2: wfix
[thirdparty/man-pages.git] / man2 / futex.2
CommitLineData
8f0aff2a 1.\" Page by b.hubert
1abce893
MK
2.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
3.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
2297bf0e 4.\"
2e46a6e7 5.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
8f0aff2a 6.\" may be freely modified and distributed
8ff7380d 7.\" %%%LICENSE_END
fea681da
MK
8.\"
9.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
10.\" added ERRORS section.
11.\"
12.\" Modified 2004-06-17 mtk
13.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
14.\"
47f5c4ba
MK
15.\" FIXME Still to integrate are some points from Torvald Riegel's mail of
16.\" 2015-01-23:
17.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977
18.\"
78e85692 19.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail
02182e7c
MK
20.\" at http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242
21.\"
97986708 22.TH FUTEX 2 2016-03-15 "Linux" "Linux Programmer's Manual"
fea681da 23.SH NAME
ce154705 24futex \- fast user-space locking
fea681da 25.SH SYNOPSIS
9d9dc1e8 26.nf
fea681da
MK
27.sp
28.B "#include <linux/futex.h>"
fea681da
MK
29.B "#include <sys/time.h>"
30.sp
d33602c4 31.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
768d3c23 32.BI " const struct timespec *" timeout , \
c6dc40a2 33" \fR /* or: \fBuint32_t \fIval2\fP */
9d9dc1e8 34.BI " int *" uaddr2 ", int " val3 );
9d9dc1e8 35.fi
409f08b0 36
b939d6e4
MK
37.IR Note :
38There is no glibc wrapper for this system call; see NOTES.
47297adb 39.SH DESCRIPTION
fea681da
MK
40.PP
41The
e511ffb6 42.BR futex ()
4b35dc5d 43system call provides a method for waiting until a certain condition becomes
077981d4
MK
44true.
45It is typically used as a blocking construct in the context of
d45f244c
MK
46shared-memory synchronization.
47When using futexes, the majority of
48the synchronization operations are performed in user space.
bc54ed38 49A user-space program employs the
d45f244c 50.BR futex ()
ca4e5b2b 51system call only when it is likely that the program has to block for
4c8cb0ff 52a longer time until the condition becomes true.
bc54ed38 53Other
d45f244c 54.BR futex ()
bc54ed38
MK
55operations can be used to wake any processes or threads waiting
56for a particular condition.
4b35dc5d 57
7e8dcabc
MK
58A futex is a 32-bit value\(emreferred to below as a
59.IR "futex word" \(emwhose
60address is supplied to the
4b35dc5d 61.BR futex ()
7e8dcabc 62system call.
c3f4c019 63(Futexes are 32 bits in size on all platforms, including 64-bit systems.)
7e8dcabc
MK
64All futex operations are governed by this value.
65In order to share a futex between processes,
66the futex is placed in a region of shared memory,
67created using (for example)
68.BR mmap (2)
69or
70.BR shmat (2).
c3f4c019 71(Thus, the futex word may have different
7e8dcabc
MK
72virtual addresses in different processes,
73but these addresses all refer to the same location in physical memory.)
ca4e5b2b
MK
74In a multithreaded program, it is sufficient to place the futex word
75in a global variable shared by all threads.
809ca3ae 76
0c3ec26b
MK
77When executing a futex operation that requests to block a thread,
78the kernel will block only if the futex word has the value that the
55f9e85e
MK
79calling thread supplied (as one of the arguments of the
80.BR futex ()
81call) as the expected value of the futex word.
9d32a39b
MK
82The loading of the futex word's value,
83the comparison of that value with the expected value,
bc54ed38 84and the actual blocking will happen atomically and will be totally ordered
da894b18 85with respect to concurrent operations performed by other threads
0fb87d16 86on the same futex word.
da894b18
MK
87.\" Notes from Darren Hart (Dec 2015):
88.\" Totally ordered with respect futex operations refers to semantics
89.\" of the ACQUIRE/RELEASE operations and how they impact ordering of
90.\" memory reads and writes. The kernel futex operations are protected
f6615c42 91.\" by spinlocks, which ensure that all operations are serialized
da894b18
MK
92.\" with respect to one another.
93.\"
94.\" This is a lot to attempt to define in this document. Perhaps a
95.\" reference to linux/Documentation/memory-barriers.txt as a footnote
96.\" would be sufficient? Or perhaps for this manual, "serialized" would
97.\" be sufficient, with a footnote regarding "totally ordered" and a
98.\" pointer to the memory-barrier documentation?
b80daba2 99Thus, the futex word is used to connect the synchronization in user space
9d32a39b 100with the implementation of blocking by the kernel.
55f9e85e 101Analogously to an atomic
4b35dc5d 102compare-and-exchange operation that potentially changes shared memory,
077981d4 103blocking via a futex is an atomic compare-and-block operation.
d6bb5a38
MK
104.\" FIXME(Torvald Riegel):
105.\" Eventually we want to have some text in NOTES to satisfy
106.\" the reference in the following sentence
107.\" See NOTES for
108.\" a detailed specification of the synchronization semantics.
4b35dc5d 109
ca4e5b2b 110One use of futexes is for implementing locks.
c0dc758e
MK
111The state of the lock (i.e., acquired or not acquired)
112can be represented as an atomically accessed flag in shared memory.
4c8cb0ff 113In the uncontended case,
c3f4c019 114a thread can access or modify the lock state with atomic instructions,
4c8cb0ff
MK
115for example atomically changing it from not acquired to acquired
116using an atomic compare-and-exchange instruction.
55f9e85e
MK
117(Such instructions are performed entirely in user mode,
118and the kernel maintains no information about the lock state.)
119On the other hand, a thread may be unable to acquire a lock because
8e754e12 120it is already acquired by another thread.
55f9e85e 121It then may pass the lock's flag as a futex word and the value
0c3ec26b 122representing the acquired state as the expected value to a
8e754e12
HS
123.BR futex ()
124wait operation.
55f9e85e 125This
8e754e12 126.BR futex ()
bc54ed38 127operation will block if and only if the lock is still acquired
f6615c42 128(i.e., the value in the futex word still matches the "acquired state").
077981d4 129When releasing the lock, a thread has to first reset the
0c3ec26b 130lock state to not acquired and then execute a futex
55f9e85e 131operation that wakes threads blocked on the lock flag used as a futex word
f6615c42 132(this can be further optimized to avoid unnecessary wake-ups).
077981d4 133See
4b35dc5d
TR
134.BR futex (7)
135for more detail on how to use futexes.
136
137Besides the basic wait and wake-up futex functionality, there are further
077981d4 138futex operations aimed at supporting more complex use cases.
ca4e5b2b
MK
139
140Note that
2af84f99 141no explicit initialization or destruction is necessary to use futexes;
4c8cb0ff
MK
142the kernel maintains a futex
143(i.e., the kernel-internal implementation artifact)
4b35dc5d
TR
144only while operations such as
145.BR FUTEX_WAIT ,
146described below, are being performed on a particular futex word.
a663ca5a
MK
147.\"
148.SS Arguments
fea681da
MK
149The
150.I uaddr
077981d4
MK
151argument points to the futex word.
152On all platforms, futexes are four-byte
4b35dc5d 153integers that must be aligned on a four-byte boundary.
f388ba70
MK
154The operation to perform on the futex is specified in the
155.I futex_op
156argument;
157.IR val
158is a value whose meaning and purpose depends on
159.IR futex_op .
36ab2074
MK
160
161The remaining arguments
162.RI ( timeout ,
163.IR uaddr2 ,
164and
165.IR val3 )
166are required only for certain of the futex operations described below.
167Where one of these arguments is not required, it is ignored.
768d3c23 168
36ab2074
MK
169For several blocking operations, the
170.I timeout
171argument is a pointer to a
172.IR timespec
173structure that specifies a timeout for the operation.
174However, notwithstanding the prototype shown above, for some operations,
10022b8e 175the least significant four bytes are used as an integer whose meaning
36ab2074 176is determined by the operation.
768d3c23
MK
177For these operations, the kernel casts the
178.I timeout
10022b8e
HS
179value first to
180.IR "unsigned long",
181then to
c6dc40a2 182.IR uint32_t ,
768d3c23
MK
183and in the remainder of this page, this argument is referred to as
184.I val2
185when interpreted in this fashion.
186
de5a3bb4 187Where it is required, the
36ab2074 188.IR uaddr2
4c8cb0ff
MK
189argument is a pointer to a second futex word that is employed
190by the operation.
b3d5bc4c 191
36ab2074
MK
192The interpretation of the final integer argument,
193.IR val3 ,
194depends on the operation.
a663ca5a
MK
195.\"
196.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
197.\"
198.SS Futex operations
6be4bad7 199The
d33602c4 200.I futex_op
6be4bad7
MK
201argument consists of two parts:
202a command that specifies the operation to be performed,
ed1819cf 203bit-wise ORed with zero or more options that
6be4bad7 204modify the behaviour of the operation.
fc30eb79 205The options that may be included in
d33602c4 206.I futex_op
fc30eb79
TG
207are as follows:
208.TP
209.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
210.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
211This option bit can be employed with all futex operations.
e45f9735 212It tells the kernel that the futex is process-private and not shared
0c3ec26b
MK
213with another process (i.e., it is being used for synchronization
214only between threads of the same process).
943ccc52
MK
215This allows the kernel to make some additional performance optimizations.
216.\" I.e., It allows the kernel choose the fast path for validating
217.\" the user-space address and avoids expensive VMA lookups,
218.\" taking reference counts on file backing store, and so on.
ae2c1774
MK
219
220As a convenience,
221.IR <linux/futex.h>
222defines a set of constants with the suffix
223.BR _PRIVATE
224that are equivalents of all of the operations listed below,
dcdfde26 225.\" except the obsolete FUTEX_FD, for which the "private" flag was
ae2c1774
MK
226.\" meaningless
227but with the
228.BR FUTEX_PRIVATE_FLAG
229ORed into the constant value.
230Thus, there are
231.BR FUTEX_WAIT_PRIVATE ,
232.BR FUTEX_WAKE_PRIVATE ,
233and so on.
2e98bbc2
TG
234.TP
235.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
236.\" commit 1acdac104668a0834cfa267de9946fac7764d486
4a7e5b05 237This option bit can be employed only with the
949ceae3
MK
238.BR FUTEX_WAIT_BITSET ,
239.BR FUTEX_WAIT_REQUEUE_PI ,
2e98bbc2 240and
949ceae3
MK
241(since Linux 4.5)
242.\" commit 337f13046ff03717a9e99675284a817527440a49
6f19879d 243.BR FUTEX_WAIT
c84cf68c 244operations.
2e98bbc2 245
8064bfa5 246If this option is set, the kernel measures the
f2103b26 247.I timeout
8064bfa5
MK
248against the
249.BR CLOCK_REALTIME
250clock.
2e98bbc2 251
8064bfa5 252If this option is not set, the kernel measures the
f2103b26 253.I timeout
8064bfa5 254against the
1c952cf5
MK
255.BR CLOCK_MONOTONIC
256clock.
6be4bad7
MK
257.PP
258The operation specified in
d33602c4 259.I futex_op
6be4bad7 260is one of the following:
70b06b90
MK
261.\"
262.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
263.\"
fea681da 264.TP
81c9d87e
MK
265.BR FUTEX_WAIT " (since Linux 2.6.0)"
266.\" Strictly speaking, since some time in 2.5.x
f065673c 267This operation tests that the value at the
4b35dc5d 268futex word pointed to by the address
fea681da 269.I uaddr
4b35dc5d 270still contains the expected value
fea681da 271.IR val ,
fd105614 272and if so, then sleeps waiting for a
682edefb 273.B FUTEX_WAKE
fd105614 274operation on the futex word.
077981d4 275The load of the value of the futex word is an atomic memory
4b35dc5d 276access (i.e., using atomic machine instructions of the respective
077981d4
MK
277architecture).
278This load, the comparison with the expected value, and
fd105614 279starting to sleep are performed atomically
da56650a 280.\" FIXME: Torvald, I think we may need to add some explanation of
cd175805 281.\" "totally ordered" here.
fd105614
MK
282and totally ordered
283with respect to other futex operations on the same futex word.
c0dc758e
MK
284If the thread starts to sleep,
285it is considered a waiter on this futex word.
f065673c
MK
286If the futex value does not match
287.IR val ,
4710334a 288then the call fails immediately with the error
badbf70c 289.BR EAGAIN .
4b35dc5d
TR
290
291The purpose of the comparison with the expected value is to prevent lost
fd105614
MK
292wake-ups.
293If another thread changed the value of the futex word after the
c0dc758e
MK
294calling thread decided to block based on the prior value,
295and if the other thread executed a
4b35dc5d
TR
296.BR FUTEX_WAKE
297operation (or similar wake-up) after the value change and before this
f065673c 298.BR FUTEX_WAIT
bc54ed38
MK
299operation, then the calling thread will observe the
300value change and will not start to sleep.
1909e523 301
c13182ef 302If the
fea681da 303.I timeout
40d2dab9
MK
304is not NULL, the structure it points to specifies a
305.I relative
306timeout for the wait.
ac991a11
MK
307(This interval will be rounded up to the system clock granularity,
308and is guaranteed not to expire early.)
a6918f1d 309The timeout is by default measured according to the
1c952cf5 310.BR CLOCK_MONOTONIC
a01c3098
MK
311clock, but, since Linux 4.5, the
312.BR CLOCK_REALTIME
313clock can be selected by specifying
314.BR FUTEX_CLOCK_REALTIME
315in
316.IR futex_op .
82a6092b
MK
317If
318.I timeout
319is NULL, the call blocks indefinitely.
4798a7f3 320
c13182ef 321The arguments
fea681da
MK
322.I uaddr2
323and
324.I val3
325are ignored.
74f58a64
MK
326.\" FIXME(Torvald) I think we should remove this. Or maybe adapt to a
327.\" different example.
4b35dc5d
TR
328.\" For
329.\" .BR futex (7),
330.\" this call is executed if decrementing the count gave a negative value
331.\" (indicating contention),
332.\" and will sleep until another process or thread releases
333.\" the futex and executes the
334.\" .B FUTEX_WAKE
335.\" operation.
70b06b90
MK
336.\"
337.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
338.\"
fea681da 339.TP
81c9d87e
MK
340.BR FUTEX_WAKE " (since Linux 2.6.0)"
341.\" Strictly speaking, since Linux 2.5.x
f065673c
MK
342This operation wakes at most
343.I val
4b35dc5d 344of the waiters that are waiting (e.g., inside
f065673c 345.BR FUTEX_WAIT )
4b35dc5d 346on the futex word at the address
f065673c
MK
347.IR uaddr .
348Most commonly,
349.I val
350is specified as either 1 (wake up a single waiter) or
351.BR INT_MAX
352(wake up all waiters).
730bfbda
MK
353No guarantee is provided about which waiters are awoken
354(e.g., a waiter with a higher scheduling priority is not guaranteed
355to be awoken in preference to a waiter with a lower priority).
4798a7f3 356
fea681da
MK
357The arguments
358.IR timeout ,
c8b921bd 359.IR uaddr2 ,
fea681da
MK
360and
361.I val3
362are ignored.
74f58a64
MK
363.\" FIXME(Torvald) I think we should remove this. Or maybe adapt to
364.\" a different example.
4c8cb0ff
MK
365.\" For
366.\" .BR futex (7),
367.\" this is executed if incrementing the count showed that
368.\" there were waiters,
369.\" once the futex value has been set to 1
370.\" (indicating that it is available).
371.\"
372.\" FIXME How does "incrementing the count show that there were waiters"?
70b06b90
MK
373.\"
374.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
375.\"
a7c2bf45
MK
376.TP
377.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
378.\" Strictly speaking, from Linux 2.5.x to 2.6.25
4c8cb0ff
MK
379This operation creates a file descriptor that is associated with
380the futex at
a7c2bf45 381.IR uaddr .
bdc5957a
MK
382The caller must close the returned file descriptor after use.
383When another process or thread performs a
a7c2bf45 384.BR FUTEX_WAKE
4b35dc5d 385on the futex word, the file descriptor indicates as being readable with
a7c2bf45
MK
386.BR select (2),
387.BR poll (2),
388and
389.BR epoll (7)
390
f1d2171d 391The file descriptor can be used to obtain asynchronous notifications: if
a7c2bf45 392.I val
ca4e5b2b 393is nonzero, then, when another process or thread executes a
a7c2bf45
MK
394.BR FUTEX_WAKE ,
395the caller will receive the signal number that was passed in
396.IR val .
397
398The arguments
399.IR timeout ,
400.I uaddr2
401and
402.I val3
403are ignored.
404
a7c2bf45
MK
405Because it was inherently racy,
406.B FUTEX_FD
407has been removed
408.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
409from Linux 2.6.26 onward.
70b06b90
MK
410.\"
411.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
412.\"
a7c2bf45
MK
413.TP
414.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
a7c2bf45 415This operation performs the same task as
27dd3a6e
MK
416.BR FUTEX_CMP_REQUEUE
417(see below), except that no check is made using the value in
a7c2bf45
MK
418.IR val3 .
419(The argument
420.I val3
421is ignored.)
70b06b90
MK
422.\"
423.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
424.\"
a7c2bf45
MK
425.TP
426.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
4b35dc5d 427This operation first checks whether the location
a7c2bf45
MK
428.I uaddr
429still contains the value
430.IR val3 .
431If not, the operation fails with the error
432.BR EAGAIN .
4b35dc5d 433Otherwise, the operation wakes up a maximum of
a7c2bf45
MK
434.I val
435waiters that are waiting on the futex at
436.IR uaddr .
437If there are more than
438.I val
439waiters, then the remaining waiters are removed
440from the wait queue of the source futex at
441.I uaddr
442and added to the wait queue of the target futex at
443.IR uaddr2 .
444The
768d3c23 445.I val2
936876a9 446argument specifies an upper limit on the number of waiters
a7c2bf45 447that are requeued to the futex at
768d3c23 448.IR uaddr2 .
a7c2bf45 449
d6bb5a38
MK
450.\" FIXME(Torvald) Is the following correct? Or is just the decision
451.\" which threads to wake or requeue part of the atomic operation?
4b35dc5d
TR
452The load from
453.I uaddr
4c8cb0ff
MK
454is an atomic memory access (i.e., using atomic machine instructions of
455the respective architecture).
077981d4 456This load, the comparison with
4b35dc5d 457.IR val3 ,
4c8cb0ff
MK
458and the requeueing of any waiters are performed atomically and totally
459ordered with respect to other operations on the same futex word.
ee65b0e8
MK
460.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ###
461.\" The operation is serialized with respect to operations on both
462.\" source and target futex. No other waiter can enqueue itself
463.\" for waiting and no other waiter can dequeue itself because of
464.\" a timeout or signal.
4b35dc5d 465
a7c2bf45
MK
466Typical values to specify for
467.I val
ed1819cf 468are 0 or 1.
a7c2bf45
MK
469(Specifying
470.BR INT_MAX
471is not useful, because it would make the
472.BR FUTEX_CMP_REQUEUE
473operation equivalent to
474.BR FUTEX_WAKE .)
936876a9 475The limit value specified via
768d3c23
MK
476.I val2
477is typically either 1 or
a7c2bf45
MK
478.BR INT_MAX .
479(Specifying the argument as 0 is not useful, because it would make the
480.BR FUTEX_CMP_REQUEUE
481operation equivalent to
482.BR FUTEX_WAIT .)
627b50ce
MK
483
484The
485.B FUTEX_CMP_REQUEUE
486operation was added as a replacement for the earlier
487.BR FUTEX_REQUEUE .
488The difference is that the check of the value at
489.I uaddr
490can be used to ensure that requeueing happens only under certain
491conditions, which allows race conditions to be avoided in certain use cases.
dcb410c3 492.\" But, as Rich Felker points out, there remain valid use cases for
627b50ce
MK
493.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing
494.\" the target(s) to a lock that the calling thread owns
495.\" From: Rich Felker <dalias@libc.org>
496.\" Date: Wed, 29 Oct 2014 22:43:17 -0400
497.\" To: Darren Hart <dvhart@infradead.org>
498.\" CC: libc-alpha@sourceware.org, ...
499.\" Subject: Re: Add futex wrapper to glibc?
500
501Both
502.BR FUTEX_REQUEUE
503and
504.BR FUTEX_CMP_REQUEUE
505can be used to avoid "thundering herd" wake-ups that could occur when using
506.B FUTEX_WAKE
507in cases where all of the waiters that are woken need to acquire
508another futex.
509Consider the following scenario,
510where multiple waiter threads are waiting on B,
511a wait queue implemented using a futex:
512
513.in +4n
514.nf
515lock(A)
516while (!check_value(V)) {
517 unlock(A);
518 block_on(B);
519 lock(A);
520};
521unlock(A);
522.fi
523.in
524
525If a waker thread used
526.BR FUTEX_WAKE ,
527then all waiters waiting on B would be woken up,
67c67ff2 528and they would all try to acquire lock A.
627b50ce
MK
529However, waking all of the threads in this manner would be pointless because
530all except one of the threads would immediately block on lock A again.
531By contrast, a requeue operation wakes just one waiter and moves
532the other waiters to lock A,
533and when the woken waiter unlocks A then the next waiter can proceed.
43d16602 534.\"
70b06b90
MK
535.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
536.\"
a5956430
MK
537.\" FIXME I added a lengthy piece of text on FUTEX_WAKE_OP text,
538.\" and I'd be happy if someone checked it.
fea681da 539.TP
d67e21f5
MK
540.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
541.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
6bac3b85
MK
542.\" Author: Jakub Jelinek <jakub@redhat.com>
543.\" Date: Tue Sep 6 15:16:25 2005 -0700
4c8cb0ff
MK
544.\" FIXME(Torvald) The glibc condvar implementation is currently being
545.\" revised (e.g., to not use an internal lock anymore).
546.\" It is probably more future-proof to remove this paragraph.
d6bb5a38 547.\" [Torvald, do you have an update here?]
6bac3b85
MK
548This operation was added to support some user-space use cases
549where more than one futex must be handled at the same time.
550The most notable example is the implementation of
551.BR pthread_cond_signal (3),
552which requires operations on two futexes,
553the one used to implement the mutex and the one used in the implementation
554of the wait queue associated with the condition variable.
555.BR FUTEX_WAKE_OP
556allows such cases to be implemented without leading to
557high rates of contention and context switching.
558
559The
57f2d48b 560.BR FUTEX_WAKE_OP
e61abc20 561operation is equivalent to executing the following code atomically
4c8cb0ff
MK
562and totally ordered with respect to other futex operations on
563any of the two supplied futex words:
6bac3b85
MK
564
565.in +4n
566.nf
567int oldval = *(int *) uaddr2;
568*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
569futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
570if (oldval \fIcmp\fP \fIcmparg\fP)
768d3c23 571 futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
6bac3b85
MK
572.fi
573.in
574
575In other words,
57f2d48b 576.BR FUTEX_WAKE_OP
6bac3b85
MK
577does the following:
578.RS
579.IP * 3
4b35dc5d
TR
580saves the original value of the futex word at
581.IR uaddr2
582and performs an operation to modify the value of the futex at
6bac3b85 583.IR uaddr2 ;
4c8cb0ff
MK
584this is an atomic read-modify-write memory access (i.e., using atomic
585machine instructions of the respective architecture)
6bac3b85
MK
586.IP *
587wakes up a maximum of
588.I val
4b35dc5d 589waiters on the futex for the futex word at
6bac3b85
MK
590.IR uaddr ;
591and
592.IP *
4c8cb0ff
MK
593dependent on the results of a test of the original value of the
594futex word at
6bac3b85
MK
595.IR uaddr2 ,
596wakes up a maximum of
768d3c23 597.I val2
4b35dc5d 598waiters on the futex for the futex word at
6bac3b85
MK
599.IR uaddr2 .
600.RE
601.IP
6bac3b85
MK
602The operation and comparison that are to be performed are encoded
603in the bits of the argument
604.IR val3 .
605Pictorially, the encoding is:
606
f6af90e7 607.in +8n
6bac3b85 608.nf
f6af90e7
MK
609+---+---+-----------+-----------+
610|op |cmp| oparg | cmparg |
611+---+---+-----------+-----------+
612 4 4 12 12 <== # of bits
6bac3b85
MK
613.fi
614.in
615
616Expressed in code, the encoding is:
617
618.in +4n
619.nf
620#define FUTEX_OP(op, oparg, cmp, cmparg) \\
621 (((op & 0xf) << 28) | \\
622 ((cmp & 0xf) << 24) | \\
623 ((oparg & 0xfff) << 12) | \\
624 (cmparg & 0xfff))
625.fi
626.in
627
628In the above,
629.I op
630and
631.I cmp
632are each one of the codes listed below.
633The
634.I oparg
635and
636.I cmparg
637components are literal numeric values, except as noted below.
638
639The
640.I op
641component has one of the following values:
642
643.in +4n
644.nf
645FUTEX_OP_SET 0 /* uaddr2 = oparg; */
646FUTEX_OP_ADD 1 /* uaddr2 += oparg; */
647FUTEX_OP_OR 2 /* uaddr2 |= oparg; */
648FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */
649FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */
650.fi
651.in
652
653In addition, bit-wise ORing the following value into
654.I op
655causes
656.IR "(1\ <<\ oparg)"
657to be used as the operand:
658
659.in +4n
660.nf
661FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
662.fi
663.in
664
665The
666.I cmp
667field is one of the following:
668
669.in +4n
670.nf
671FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
672FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
673FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
674FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
675FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
676FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
677.fi
678.in
679
680The return value of
681.BR FUTEX_WAKE_OP
682is the sum of the number of waiters woken on the futex
683.IR uaddr
684plus the number of waiters woken on the futex
685.IR uaddr2 .
70b06b90
MK
686.\"
687.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
688.\"
d67e21f5 689.TP
79c9b436
TG
690.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
691.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
fd9e59d4 692This operation is like
79c9b436
TG
693.BR FUTEX_WAIT
694except that
695.I val3
84abf4ba 696is used to provide a 32-bit bit mask to the kernel.
5e1456d4 697This bit mask is stored in the kernel-internal state of the waiter.
79c9b436
TG
698See the description of
699.BR FUTEX_WAKE_BITSET
700for further details.
701
8064bfa5
MK
702If
703.I timeout
704is not NULL, the structure it points to specifies
705an absolute timeout for the wait operation.
706If
707.I timeout
708is NULL, the operation can block indefinitely.
709
710
79c9b436
TG
711The
712.I uaddr2
713argument is ignored.
70b06b90
MK
714.\"
715.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
716.\"
79c9b436 717.TP
d67e21f5
MK
718.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
719.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
55cc422d
TG
720This operation is the same as
721.BR FUTEX_WAKE
722except that the
e24fbf10 723.I val3
5e1456d4
MK
724argument is used to provide a 32-bit bit mask to the kernel.
725This bit mask is used to select which waiters should be woken up.
726The selection is done by a bit-wise AND of the "wake" bit mask
98d769c0
MK
727(i.e., the value in
728.IR val3 )
5e1456d4
MK
729and the bit mask which is stored in the kernel-internal
730state of the waiter (the "wait" bit mask that is set using
98d769c0
MK
731.BR FUTEX_WAIT_BITSET ).
732All of the waiters for which the result of the AND is nonzero are woken up;
733the remaining waiters are left sleeping.
734
e9d4496b
MK
735The effect of
736.BR FUTEX_WAIT_BITSET
737and
738.BR FUTEX_WAKE_BITSET
9732dd8b
MK
739is to allow selective wake-ups among multiple waiters that are blocked
740on the same futex.
ac894879 741However, note that, depending on the use case,
5e1456d4 742employing this bit-mask multiplexing feature on a
ac894879 743futex can be less efficient than simply using multiple futexes,
5e1456d4 744because employing bit-mask multiplexing requires the kernel
e9d4496b
MK
745to check all waiters on a futex,
746including those that are not interested in being woken up
5e1456d4 747(i.e., they do not have the relevant bit set in their "wait" bit mask).
e9d4496b
MK
748.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
749.\"
750.\" "The original reason for the addition of these extensions
751.\" was to improve the performance of pthread read-write locks
752.\" in glibc. However, the pthreads library no longer uses the
753.\" same locking algorithm, and these extensions are not used
754.\" without the bitset parameter being all ones.
e24fbf10 755.\"
e9d4496b 756.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
5e1456d4 757.\" is nevertheless used (with a bit mask of all ones) in order to
e9d4496b
MK
758.\" obtain the absolute timeout functionality that is useful
759.\" for efficiently implementing Pthreads APIs (which use absolute
760.\" timeouts); FUTEX_WAIT provides only relative timeouts.
761
98d769c0
MK
762The
763.I uaddr2
764and
765.I timeout
766arguments are ignored.
9732dd8b
MK
767
768The
769.BR FUTEX_WAIT
770and
771.BR FUTEX_WAKE
772operations correspond to
773.BR FUTEX_WAIT_BITSET
774and
775.BR FUTEX_WAKE_BITSET
5e1456d4 776operations where the bit masks are all ones.
bd90a5f9 777.\"
70b06b90 778.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
bd90a5f9
MK
779.\"
780.SS Priority-inheritance futexes
b52e1cd4
MK
781Linux supports priority-inheritance (PI) futexes in order to handle
782priority-inversion problems that can be encountered with
783normal futex locks.
b565548b 784Priority inversion is the problem that occurs when a high-priority
bdc5957a
MK
785task is blocked waiting to acquire a lock held by a low-priority task,
786while tasks at an intermediate priority continuously preempt
787the low-priority task from the CPU.
788Consequently, the low-priority task makes no progress toward
789releasing the lock, and the high-priority task remains blocked.
7f315ae3 790
7d20efd7
MK
791Priority inheritance is a mechanism for dealing with
792the priority-inversion problem.
bdc5957a
MK
793With this mechanism, when a high-priority task becomes blocked
794by a lock held by a low-priority task,
9cee832c
MK
795the priority of the low-priority task is temporarily raised
796to that of the high-priority task,
bdc5957a 797so that it is not preempted by any intermediate level tasks,
7d20efd7
MK
798and can thus make progress toward releasing the lock.
799To be effective, priority inheritance must be transitive,
bdc5957a 800meaning that if a high-priority task blocks on a lock
ca4e5b2b 801held by a lower-priority task that is itself blocked by a lock
bdc5957a 802held by another intermediate-priority task
7d20efd7 803(and so on, for chains of arbitrary length),
b0f35fbb 804then both of those tasks
bdc5957a
MK
805(or more generally, all of the tasks in a lock chain)
806have their priorities raised to be the same as the high-priority task.
7d20efd7 807
9e2b90ee 808From a user-space perspective,
39e9b2e1
MK
809what makes a futex PI-aware is a policy agreement (described below)
810between user space and the kernel about the value of the futex word,
601399f3
MK
811coupled with the use of the PI-futex operations described below.
812(Unlike the other futex operations described above,
813the PI-futex operations are designed
814for the implementation of very specific IPC mechanisms.)
815.\"
9e2b90ee
MK
816.\" Quoting Darren Hart:
817.\" These opcodes paired with the PI futex value policy (described below)
818.\" defines a "futex" as PI aware. These were created very specifically
819.\" in support of PI pthread_mutexes, so it makes a lot more sense to
820.\" talk about a PI aware pthread_mutex, than a PI aware futex, since
821.\" there is a lot of policy and scaffolding that has to be built up
822.\" around it to use it properly (this is what a PI pthread_mutex is).
823
ac894879 824.\" mtk: The following text is drawn from the Hart/Guniguntala paper
1af427a4 825.\" (listed in SEE ALSO), but I have reworded some pieces
f0a9e8f4 826.\" significntly.
79d918c7 827.\"
f0a9e8f4 828The PI-futex operations described below differ from the other
4b35dc5d
TR
829futex operations in that they impose policy on the use of the value of the
830futex word:
79d918c7 831.IP * 3
4b35dc5d 832If the lock is not acquired, the futex word's value shall be 0.
79d918c7 833.IP *
4c8cb0ff
MK
834If the lock is acquired, the futex word's value shall
835be the thread ID (TID;
4b35dc5d 836see
79d918c7
MK
837.BR gettid (2))
838of the owning thread.
839.IP *
79d918c7
MK
840If the lock is owned and there are threads contending for the lock,
841then the
842.B FUTEX_WAITERS
4b35dc5d 843bit shall be set in the futex word's value; in other words, this value is:
79d918c7
MK
844
845 FUTEX_WAITERS | TID
601399f3
MK
846.IP
847(Note that is invalid for a PI futex word to have no owner and
848.BR FUTEX_WAITERS
849set.)
79d918c7
MK
850.PP
851With this policy in place,
fd105614 852a user-space application can acquire an unacquired
601399f3 853lock or release a lock using atomic instructions executed in user mode
fd105614 854(e.g., a compare-and-swap operation such as
b52e1cd4
MK
855.I cmpxchg
856on the x86 architecture).
4c8cb0ff
MK
857Acquiring a lock simply consists of using compare-and-swap to atomically
858set the futex word's value to the caller's TID if its previous value was 0.
4b35dc5d
TR
859Releasing a lock requires using compare-and-swap to set the futex word's
860value to 0 if the previous value was the expected TID.
b52e1cd4 861
4b35dc5d 862If a futex is already acquired (i.e., has a nonzero value),
b52e1cd4 863waiters must employ the
79d918c7
MK
864.B FUTEX_LOCK_PI
865operation to acquire the lock.
4b35dc5d 866If other threads are waiting for the lock, then the
79d918c7 867.B FUTEX_WAITERS
4c8cb0ff
MK
868bit is set in the futex value;
869in this case, the lock owner must employ the
79d918c7 870.B FUTEX_UNLOCK_PI
b52e1cd4
MK
871operation to release the lock.
872
79d918c7
MK
873In the cases where callers are forced into the kernel
874(i.e., required to perform a
875.BR futex ()
0c3ec26b 876call),
79d918c7
MK
877they then deal directly with a so-called RT-mutex,
878a kernel locking mechanism which implements the required
879priority-inheritance semantics.
880After the RT-mutex is acquired, the futex value is updated accordingly,
881before the calling thread returns to user space.
79d918c7 882
a59fca75 883It is important to note
ac894879 884.\" tglx (July 2015):
30239c10
MK
885.\" If there are multiple waiters on a pi futex then a wake pi operation
886.\" will wake the first waiter and hand over the lock to this waiter. This
887.\" includes handing over the rtmutex which represents the futex in the
888.\" kernel. The strict requirement is that the futex owner and the rtmutex
889.\" owner must be the same, except for the update period which is
890.\" serialized by the futex internal locking. That means the kernel must
1d09c150 891.\" update the user-space value prior to returning to user space
4b35dc5d 892that the kernel will update the futex word's value prior
79d918c7 893to returning to user space.
601399f3
MK
894(This prevents the possibility of the futex word's value ending
895up in an invalid state, such as having an owner but the value being 0,
896or having waiters but not having the
897.B FUTEX_WAITERS
898bit set.)
30239c10
MK
899
900If a futex has an associated RT-mutex in the kernel
901(i.e., there are blocked waiters)
902and the owner of the futex/RT-mutex dies unexpectedly,
903then the kernel cleans up the RT-mutex and hands it over to the next waiter.
904This in turn requires that the user-space value is updated accordingly.
905To indicate that this is required, the kernel sets the
906.B FUTEX_OWNER_DIED
907bit in the futex word along with the thread ID of the new owner.
8adaf0a7
MK
908User space can detect this situation via the presence of the
909.B FUTEX_OWNER_DIED
910bit and is then responsible for cleaning up the stale state left over by
1d09c150 911the dead owner.
30239c10
MK
912.\" tglx (July 2015):
913.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where
914.\" the kernel has no state associated. This happens via the robust futex
915.\" mechanism. In that case the futex value will be set to
916.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non
917.\" PI futexes.
bd90a5f9 918
601399f3
MK
919PI futexes are operated on by specifying one of the values listed below in
920.IR futex_op .
921Note that the PI futex operations must be used as paired operations
922and are subject to some additional requirements:
923.IP * 3
924.B FUTEX_LOCK_PI
925and
926.B FUTEX_TRYLOCK_PI
927pair with
928.BR FUTEX_UNLOCK_PI.
929.B FUTEX_UNLOCK_PI
930must be called only on a futex owned by the calling thread,
931as defined by the value policy, otherwise the error
932.B EPERM
933results.
934.IP *
935.B FUTEX_WAIT_REQUEUE_PI
936pairs with
937.BR FUTEX_CMP_REQUEUE_PI .
938This must be performed from a non-PI futex to a distinct PI futex
939(or the error
940.B EINVAL
941results).
942Additionally,
943.I val
944(the number of waiters to be woken) must be 1
945(or the error
946.B EINVAL
947results).
948.P
949The PI futex operations are as follows:
70b06b90
MK
950.\"
951.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
952.\"
d67e21f5
MK
953.TP
954.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
955.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
bc54ed38 956This operation is used after an attempt to acquire
fd105614 957the lock via an atomic user-mode instruction failed
4b35dc5d 958because the futex word has a nonzero value\(emspecifically,
8297383e 959because it contained the (PID-namespace-specific) TID of the lock owner.
67833bec 960
4b35dc5d 961The operation checks the value of the futex word at the address
67833bec 962.IR uaddr .
70b06b90
MK
963If the value is 0, then the kernel tries to atomically set
964the futex value to the caller's TID.
c3875d1d 965If the futex word's value is nonzero,
67833bec 966the kernel atomically sets the
e0547e70 967.B FUTEX_WAITERS
67833bec
MK
968bit, which signals the futex owner that it cannot unlock the futex in
969user space atomically by setting the futex value to 0.
c3875d1d
MK
970.\" tglx (July 2015):
971.\" The operation here is similar to the FUTEX_WAIT logic. When the user
972.\" space atomic acquire does not succeed because the futex value was non
973.\" zero, then the waiter goes into the kernel, takes the kernel internal
974.\" lock and retries the acquisition under the lock. If the acquisition
975.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
976.\" the lock owner that it needs to go into the kernel. Here is the pseudo
977.\" code:
978.\"
979.\" lock(kernel_lock);
980.\" retry:
981.\"
982.\" /*
983.\" * Owner might have unlocked in userspace before we
984.\" * were able to set the waiter bit.
985.\" */
986.\" if (atomic_acquire(futex) == SUCCESS) {
987.\" unlock(kernel_lock());
988.\" return 0;
989.\" }
990.\"
991.\" /*
992.\" * Owner might have unlocked after the above atomic_acquire()
993.\" * attempt.
994.\" */
995.\" if (atomic_set_waiters_bit(futex) != SUCCESS)
996.\" goto retry;
997.\"
998.\" queue_waiter();
999.\" unlock(kernel_lock);
1000.\" block();
1001.\"
1002After that, the kernel:
1003.RS
1004.IP 1. 3
1005Tries to find the thread which is associated with the owner TID.
1006.IP 2.
1007Creates or reuses kernel state on behalf of the owner.
1008(If this is the first waiter, there is no kernel state for this
1009futex, so kernel state is created by locking the RT-mutex
1010and the futex owner is made the owner of the RT-mutex.
1011If there are existing waiters, then the existing state is reused.)
1012.IP 3.
ca4e5b2b 1013Attaches the waiter to the futex
c3875d1d
MK
1014(i.e., the waiter is enqueued on the RT-mutex waiter list).
1015.RE
1016.IP
ac894879
MK
1017If more than one waiter exists,
1018the enqueueing of the waiter is in descending priority order.
1019(For information on priority ordering, see the discussion of the
1020.BR SCHED_DEADLINE ,
1021.BR SCHED_FIFO ,
1022and
1023.BR SCHED_RR
1024scheduling policies in
1025.BR sched (7).)
1026The owner inherits either the waiter's CPU bandwidth
1027(if the waiter is scheduled under the
1028.BR SCHED_DEADLINE
1029policy) or the waiter's priority (if the waiter is scheduled under the
1030.BR SCHED_RR
1031or
1032.BR SCHED_FIFO
1033policy).
1d09c150
MK
1034.\" August 2015:
1035.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
1036.\" processes, does the nice value come into play also?
1037.\"
1038.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
c3875d1d 1039This inheritance follows the lock chain in the case of nested locking
ca4e5b2b
MK
1040.\" (i.e., task 1 blocks on lock A, held by task 2,
1041.\" while task 2 blocks on lock B, held by task 3)
c3875d1d
MK
1042and performs deadlock detection.
1043
e0547e70
TG
1044The
1045.I timeout
9ce19cf1 1046argument provides a timeout for the lock attempt.
8064bfa5
MK
1047If
1048.I timeout
1049is not NULL, the structure it points to specifies
1050an absolute timeout, measured against the
9ce19cf1
MK
1051.BR CLOCK_REALTIME
1052clock.
1053If
1054.I timeout
1055is NULL, the operation will block indefinitely.
e0547e70 1056
a449c634 1057The
e0547e70
TG
1058.IR uaddr2 ,
1059.IR val ,
1060and
1061.IR val3
a449c634 1062arguments are ignored.
67833bec 1063.\"
70b06b90
MK
1064.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1065.\"
d67e21f5 1066.TP
12fdbe23 1067.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1068.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
3fbb1be1 1069This operation tries to acquire the lock at
12fdbe23 1070.IR uaddr .
c3875d1d
MK
1071It is invoked when a user-space atomic acquire did not
1072succeed because the futex word was not 0.
1073
8adaf0a7
MK
1074Because the kernel has access to more state information than user space,
1075acquisition of the lock might succeed if performed by the
1076kernel in cases where the futex word
1077(i.e., the state information accessible to use-space) contains stale state
c3875d1d
MK
1078.RB ( FUTEX_WAITERS
1079and/or
1080.BR FUTEX_OWNER_DIED ).
1081This can happen when the owner of the futex died.
1d09c150
MK
1082User space cannot handle this condition in a race-free manner,
1083but the kernel can fix this up and acquire the futex.
ee65b0e8
MK
1084.\" Paraphrasing a f2f conversation with Thomas Gleixner about the
1085.\" above point (Aug 2015): ###
1086.\" There is a rare possibility of a race condition involving an
1087.\" uncontended futex with no owner, but with waiters. The
1088.\" kernel-user-space contract is that if a futex is nonzero, you must
1089.\" go into kernel. The futex was owned by a task, and that task dies
1090.\" but there are no waiters, so the futex value is non zero.
1091.\" Therefore, the next locker has to go into the kernel,
1092.\" so that the kernel has a chance to clean up. (CMXCH on zero
1093.\" in user space would fail, so kernel has to clean up.)
8adaf0a7
MK
1094.\" Darren Hart (Oct 2015):
1095.\" The trylock in the kernel has more state, so it can independently
1096.\" verify the flags that userspace must trust implicitly.
084744ef
MK
1097
1098The
1099.IR uaddr2 ,
1100.IR val ,
1101.IR timeout ,
1102and
1103.IR val3
1104arguments are ignored.
70b06b90
MK
1105.\"
1106.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1107.\"
d67e21f5 1108.TP
12fdbe23 1109.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1110.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
d4ba4328 1111This operation wakes the top priority waiter that is waiting in
ecae2099
TG
1112.B FUTEX_LOCK_PI
1113on the futex address provided by the
1114.I uaddr
1115argument.
1116
1d09c150 1117This is called when the user-space value at
ecae2099
TG
1118.I uaddr
1119cannot be changed atomically from a TID (of the owner) to 0.
1120
1121The
1122.IR uaddr2 ,
1123.IR val ,
1124.IR timeout ,
1125and
1126.IR val3
11a194bf 1127arguments are ignored.
70b06b90
MK
1128.\"
1129.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1130.\"
d67e21f5 1131.TP
d67e21f5
MK
1132.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
1133.\" commit 52400ba946759af28442dee6265c5c0180ac7122
f812a08b
DH
1134This operation is a PI-aware variant of
1135.BR FUTEX_CMP_REQUEUE .
1136It requeues waiters that are blocked via
1137.B FUTEX_WAIT_REQUEUE_PI
1138on
1139.I uaddr
1140from a non-PI source futex
1141.RI ( uaddr )
1142to a PI target futex
1143.RI ( uaddr2 ).
1144
9e54d26d
MK
1145As with
1146.BR FUTEX_CMP_REQUEUE ,
1147this operation wakes up a maximum of
1148.I val
1149waiters that are waiting on the futex at
1150.IR uaddr .
1151However, for
1152.BR FUTEX_CMP_REQUEUE_PI ,
1153.I val
6fbeb8f4 1154is required to be 1
939ca89f 1155(since the main point is to avoid a thundering herd).
9e54d26d
MK
1156The remaining waiters are removed from the wait queue of the source futex at
1157.I uaddr
1158and added to the wait queue of the target futex at
1159.IR uaddr2 .
f812a08b 1160
9e54d26d 1161The
768d3c23 1162.I val2
c6d8cf21
MK
1163.\" val2 is the cap on the number of requeued waiters.
1164.\" In the glibc pthread_cond_broadcast() implementation, this argument
1165.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
9e54d26d 1166and
768d3c23 1167.I val3
9e54d26d
MK
1168arguments serve the same purposes as for
1169.BR FUTEX_CMP_REQUEUE .
70b06b90 1170.\"
8297383e 1171.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/
be376673 1172.\" notes that "priority-inheritance Futex to priority-inheritance
8297383e
MK
1173.\" Futex requeues are currently unsupported". However, probably
1174.\" the page does not need to say nothing about this, since
1175.\" Thomas Gleixner commented (July 2015): "they never will be
1176.\" supported because they make no sense at all"
70b06b90
MK
1177.\"
1178.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1179.\"
d67e21f5
MK
1180.TP
1181.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
1182.\" commit 52400ba946759af28442dee6265c5c0180ac7122
70b06b90 1183.\"
c3875d1d 1184Wait on a non-PI futex at
6ff1b4c0 1185.I uaddr
c3875d1d
MK
1186and potentially be requeued (via a
1187.BR FUTEX_CMP_REQUEUE_PI
1188operation in another task) onto a PI futex at
6ff1b4c0
TG
1189.IR uaddr2 .
1190The wait operation on
1191.I uaddr
c3875d1d 1192is the same as for
6ff1b4c0 1193.BR FUTEX_WAIT .
c3875d1d 1194
6ff1b4c0
TG
1195The waiter can be removed from the wait on
1196.I uaddr
6ff1b4c0 1197without requeueing on
c3875d1d
MK
1198.IR uaddr2
1199via a
1d09c150 1200.BR FUTEX_WAKE
c3875d1d
MK
1201operation in another task.
1202In this case, the
1203.BR FUTEX_WAIT_REQUEUE_PI
3fbb1be1
MK
1204operation fails with the error
1205.BR EAGAIN .
a4e69912 1206
63bea7dc
MK
1207If
1208.I timeout
8064bfa5
MK
1209is not NULL, the structure it points to specifies
1210an absolute timeout for the wait operation.
63bea7dc
MK
1211If
1212.I timeout
1213is NULL, the operation can block indefinitely.
1214
a4e69912
MK
1215The
1216.I val3
1217argument is ignored.
abb571e8
MK
1218
1219The
1220.BR FUTEX_WAIT_REQUEUE_PI
1221and
1222.BR FUTEX_CMP_REQUEUE_PI
1223were added to support a fairly specific use case:
1224support for priority-inheritance-aware POSIX threads condition variables.
1225The idea is that these operations should always be paired,
1226in order to ensure that user space and the kernel remain in sync.
1227Thus, in the
1228.BR FUTEX_WAIT_REQUEUE_PI
1229operation, the user-space application pre-specifies the target
1230of the requeue that takes place in the
1231.BR FUTEX_CMP_REQUEUE_PI
1232operation.
1233.\"
1234.\" Darren Hart notes that a patch to allow glibc to fully support
1af427a4 1235.\" PI-aware pthreads condition variables has not yet been accepted into
abb571e8
MK
1236.\" glibc. The story is complex, and can be found at
1237.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588
1238.\" Darren notes that in the meantime, the patch is shipped with various
1af427a4 1239.\" PREEMPT_RT-enabled Linux systems.
abb571e8
MK
1240.\"
1241.\" Related to the preceding, Darren proposed that somewhere, man-pages
1242.\" should document the following point:
1af427a4 1243.\"
4c8cb0ff
MK
1244.\" While the Linux kernel, since 2.6.31, supports requeueing of
1245.\" priority-inheritance (PI) aware mutexes via the
1246.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations,
1247.\" the glibc implementation does not yet take full advantage of this.
1248.\" Specifically, the condvar internal data lock remains a non-PI aware
1249.\" mutex, regardless of the type of the pthread_mutex associated with
1250.\" the condvar. This can lead to an unbounded priority inversion on
1251.\" the internal data lock even when associating a PI aware
1252.\" pthread_mutex with a condvar during a pthread_cond*_wait
1253.\" operation. For this reason, it is not recommended to rely on
1254.\" priority inheritance when using pthread condition variables.
1af427a4
MK
1255.\"
1256.\" The problem is that the obvious location for this text is
1257.\" the pthread_cond*wait(3) man page. However, such a man page
abb571e8 1258.\" does not currently exist.
70b06b90 1259.\"
6700de24 1260.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70b06b90 1261.\"
47297adb 1262.SH RETURN VALUE
fea681da 1263.PP
a5c5a06a
MK
1264In the event of an error (and assuming that
1265.BR futex ()
1266was invoked via
1267.BR syscall (2)),
1268all operations return \-1 and set
e808bba0 1269.I errno
6f147f79 1270to indicate the cause of the error.
adc89a97 1271
e808bba0
MK
1272The return value on success depends on the operation,
1273as described in the following list:
fea681da
MK
1274.TP
1275.B FUTEX_WAIT
077981d4 1276Returns 0 if the caller was woken up.
4c8cb0ff
MK
1277Note that a wake-up can also be caused by common futex usage patterns
1278in unrelated code that happened to have previously used the futex word's
1279memory location (e.g., typical futex-based implementations of
1280Pthreads mutexes can cause this under some conditions).
1281Therefore, callers should always conservatively assume that a return
1282value of 0 can mean a spurious wake-up, and use the futex word's value
bc54ed38
MK
1283(i.e., the user-space synchronization scheme)
1284to decide whether to continue to block or not.
fea681da
MK
1285.TP
1286.B FUTEX_WAKE
bdc5957a 1287Returns the number of waiters that were woken up.
fea681da
MK
1288.TP
1289.B FUTEX_FD
1290Returns the new file descriptor associated with the futex.
1291.TP
1292.B FUTEX_REQUEUE
bdc5957a 1293Returns the number of waiters that were woken up.
fea681da
MK
1294.TP
1295.B FUTEX_CMP_REQUEUE
bdc5957a 1296Returns the total number of waiters that were woken up or
4b35dc5d 1297requeued to the futex for the futex word at
3dfcc11d
MK
1298.IR uaddr2 .
1299If this value is greater than
1300.IR val ,
fd105614 1301then the difference is the number of waiters requeued to the futex for the
4c8cb0ff 1302futex word at
3dfcc11d 1303.IR uaddr2 .
dcad19c0
MK
1304.TP
1305.B FUTEX_WAKE_OP
a8b5b324 1306Returns the total number of waiters that were woken up.
4c8cb0ff
MK
1307This is the sum of the woken waiters on the two futexes for
1308the futex words at
a8b5b324
MK
1309.I uaddr
1310and
1311.IR uaddr2 .
dcad19c0
MK
1312.TP
1313.B FUTEX_WAIT_BITSET
077981d4
MK
1314Returns 0 if the caller was woken up.
1315See
4b35dc5d
TR
1316.B FUTEX_WAIT
1317for how to interpret this correctly in practice.
dcad19c0
MK
1318.TP
1319.B FUTEX_WAKE_BITSET
bdc5957a 1320Returns the number of waiters that were woken up.
dcad19c0
MK
1321.TP
1322.B FUTEX_LOCK_PI
bf02a260 1323Returns 0 if the futex was successfully locked.
dcad19c0
MK
1324.TP
1325.B FUTEX_TRYLOCK_PI
5c716eef 1326Returns 0 if the futex was successfully locked.
dcad19c0
MK
1327.TP
1328.B FUTEX_UNLOCK_PI
52bb928f 1329Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
1330.TP
1331.B FUTEX_CMP_REQUEUE_PI
bdc5957a 1332Returns the total number of waiters that were woken up or
4b35dc5d 1333requeued to the futex for the futex word at
dddd395a
MK
1334.IR uaddr2 .
1335If this value is greater than
1336.IR val ,
4c8cb0ff
MK
1337then difference is the number of waiters requeued to the futex for
1338the futex word at
dddd395a 1339.IR uaddr2 .
dcad19c0
MK
1340.TP
1341.B FUTEX_WAIT_REQUEUE_PI
4c8cb0ff
MK
1342Returns 0 if the caller was successfully requeued to the futex for
1343the futex word at
22c15de9 1344.IR uaddr2 .
70b06b90
MK
1345.\"
1346.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1347.\"
fea681da
MK
1348.SH ERRORS
1349.TP
1350.B EACCES
4b35dc5d 1351No read access to the memory of a futex word.
fea681da
MK
1352.TP
1353.B EAGAIN
f48516d1 1354.RB ( FUTEX_WAIT ,
4b35dc5d 1355.BR FUTEX_WAIT_BITSET ,
f48516d1 1356.BR FUTEX_WAIT_REQUEUE_PI )
badbf70c
MK
1357The value pointed to by
1358.I uaddr
1359was not equal to the expected value
1360.I val
1361at the time of the call.
9732dd8b
MK
1362
1363.BR Note :
1364on Linux, the symbolic names
1365.B EAGAIN
1366and
1367.B EWOULDBLOCK
77da5feb 1368(both of which appear in different parts of the kernel futex code)
9732dd8b 1369have the same value.
badbf70c
MK
1370.TP
1371.B EAGAIN
8f2068bb
MK
1372.RB ( FUTEX_CMP_REQUEUE ,
1373.BR FUTEX_CMP_REQUEUE_PI )
ce5602fd 1374The value pointed to by
9f6c40c0
МК
1375.I uaddr
1376is not equal to the expected value
1377.IR val3 .
fea681da 1378.TP
5662f56a
MK
1379.BR EAGAIN
1380.RB ( FUTEX_LOCK_PI ,
aaec9032
MK
1381.BR FUTEX_TRYLOCK_PI ,
1382.BR FUTEX_CMP_REQUEUE_PI )
1383The futex owner thread ID of
1384.I uaddr
1385(for
1386.BR FUTEX_CMP_REQUEUE_PI :
1387.IR uaddr2 )
1388is about to exit,
5662f56a
MK
1389but has not yet handled the internal state cleanup.
1390Try again.
1391.TP
7a39e745
MK
1392.BR EDEADLK
1393.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1394.BR FUTEX_TRYLOCK_PI ,
1395.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1396The futex word at
7a39e745
MK
1397.I uaddr
1398is already locked by the caller.
1399.TP
662c0da8 1400.BR EDEADLK
c3875d1d 1401.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
d6bb5a38 1402.\" places, and EDEADLOCK in others. On almost all architectures
4c8cb0ff
MK
1403.\" these constants are synonymous. Is there a reason that both
1404.\" names are used?
8297383e
MK
1405.\"
1406.\" tglx (July 2015): "No. We should probably fix that."
1407.\"
662c0da8 1408.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1409While requeueing a waiter to the PI futex for the futex word at
662c0da8
MK
1410.IR uaddr2 ,
1411the kernel detected a deadlock.
1412.TP
fea681da 1413.B EFAULT
1ea901e8
MK
1414A required pointer argument (i.e.,
1415.IR uaddr ,
1416.IR uaddr2 ,
1417or
1418.IR timeout )
496df304 1419did not point to a valid user-space address.
fea681da 1420.TP
9f6c40c0 1421.B EINTR
e808bba0 1422A
9f6c40c0 1423.B FUTEX_WAIT
2674f781
MK
1424or
1425.B FUTEX_WAIT_BITSET
e808bba0 1426operation was interrupted by a signal (see
f529fd20
MK
1427.BR signal (7)).
1428In kernels before Linux 2.6.22, this error could also be returned for
1429on a spurious wakeup; since Linux 2.6.22, this no longer happens.
9f6c40c0 1430.TP
fea681da 1431.B EINVAL
180f97b7
MK
1432The operation in
1433.IR futex_op
1434is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1435.I timeout
1436argument was invalid
1437.RI ( tv_sec
1438was less than zero, or
1439.IR tv_nsec
cabee29d 1440was not less than 1,000,000,000).
fb2f4c27
MK
1441.TP
1442.B EINVAL
0c74df0b 1443The operation specified in
025e1374 1444.IR futex_op
0c74df0b 1445employs one or both of the pointers
51ee94be 1446.I uaddr
a1f47699 1447and
0c74df0b
MK
1448.IR uaddr2 ,
1449but one of these does not point to a valid object\(emthat is,
1450the address is not four-byte-aligned.
51ee94be
MK
1451.TP
1452.B EINVAL
55cc422d
TG
1453.RB ( FUTEX_WAIT_BITSET ,
1454.BR FUTEX_WAKE_BITSET )
5e1456d4 1455The bit mask supplied in
79c9b436
TG
1456.IR val3
1457is zero.
1458.TP
1459.B EINVAL
2abcba67 1460.RB ( FUTEX_CMP_REQUEUE_PI )
add875c0
MK
1461.I uaddr
1462equals
1463.IR uaddr2
1464(i.e., an attempt was made to requeue to the same futex).
1465.TP
ff597681
MK
1466.BR EINVAL
1467.RB ( FUTEX_FD )
1468The signal number supplied in
1469.I val
1470is invalid.
1471.TP
6bac3b85 1472.B EINVAL
476debd7
MK
1473.RB ( FUTEX_WAKE ,
1474.BR FUTEX_WAKE_OP ,
1475.BR FUTEX_WAKE_BITSET ,
1476.BR FUTEX_REQUEUE ,
1477.BR FUTEX_CMP_REQUEUE )
1478The kernel detected an inconsistency between the user-space state at
1479.I uaddr
1480and the kernel state\(emthat is, it detected a waiter which waits in
1481.BR FUTEX_LOCK_PI
1482on
1483.IR uaddr .
1484.TP
1485.B EINVAL
a218ef20 1486.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1487.BR FUTEX_TRYLOCK_PI ,
1488.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1489The kernel detected an inconsistency between the user-space state at
1490.I uaddr
1491and the kernel state.
ce022f18 1492This indicates either state corruption
ce022f18 1493or that the kernel found a waiter on
a218ef20
MK
1494.I uaddr
1495which is waiting via
1496.BR FUTEX_WAIT
1497or
1498.BR FUTEX_WAIT_BITSET .
1499.TP
1500.B EINVAL
f9250b1a
MK
1501.RB ( FUTEX_CMP_REQUEUE_PI )
1502The kernel detected an inconsistency between the user-space state at
99c0041d
MK
1503.I uaddr2
1504and the kernel state;
ee65b0e8
MK
1505.\" From a conversation with Thomas Gleixner (Aug 2015): ###
1506.\" The kernel sees: I have non PI state for a futex you tried to
1507.\" tell me was PI
99c0041d
MK
1508that is, the kernel detected a waiter which waits via
1509.BR FUTEX_WAIT
8297383e
MK
1510or
1511.BR FUTEX_WAIT_BITSET
99c0041d
MK
1512on
1513.IR uaddr2 .
1514.TP
1515.B EINVAL
1516.RB ( FUTEX_CMP_REQUEUE_PI )
1517The kernel detected an inconsistency between the user-space state at
f9250b1a
MK
1518.I uaddr
1519and the kernel state;
1520that is, the kernel detected a waiter which waits via
75299c8d 1521.BR FUTEX_WAIT
99c0041d 1522or
75299c8d 1523.BR FUTEX_WAIT_BITESET
f9250b1a
MK
1524on
1525.IR uaddr .
1526.TP
1527.B EINVAL
99c0041d 1528.RB ( FUTEX_CMP_REQUEUE_PI )
75299c8d
MK
1529The kernel detected an inconsistency between the user-space state at
1530.I uaddr
1531and the kernel state;
1532that is, the kernel detected a waiter which waits on
1533.I uaddr
1534via
1535.BR FUTEX_LOCK_PI
1536(instead of
1537.BR FUTEX_WAIT_REQUEUE_PI ).
99c0041d
MK
1538.TP
1539.B EINVAL
9786b3ca 1540.RB ( FUTEX_CMP_REQUEUE_PI )
8297383e
MK
1541.\" This deals with the case:
1542.\" wait_requeue_pi(A, B);
1543.\" requeue_pi(A, C);
9786b3ca
MK
1544An attempt was made to requeue a waiter to a futex other than that
1545specified by the matching
1546.B FUTEX_WAIT_REQUEUE_PI
1547call for that waiter.
1548.TP
1549.B EINVAL
f0c0d61c
MK
1550.RB ( FUTEX_CMP_REQUEUE_PI )
1551The
1552.I val
1553argument is not 1.
1554.TP
1555.B EINVAL
4832b48a 1556Invalid argument.
fea681da 1557.TP
a449c634
MK
1558.BR ENOMEM
1559.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1560.BR FUTEX_TRYLOCK_PI ,
1561.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1562The kernel could not allocate memory to hold state information.
1563.TP
fea681da 1564.B ENFILE
ff597681 1565.RB ( FUTEX_FD )
e258766b 1566The system-wide limit on the total number of open files has been reached.
4701fc28
MK
1567.TP
1568.B ENOSYS
1569Invalid operation specified in
d33602c4 1570.IR futex_op .
9f6c40c0 1571.TP
4a7e5b05
MK
1572.B ENOSYS
1573The
1574.BR FUTEX_CLOCK_REALTIME
1575option was specified in
1afcee7c 1576.IR futex_op ,
4a7e5b05 1577but the accompanying operation was neither
017d194b
MK
1578.BR FUTEX_WAIT ,
1579.BR FUTEX_WAIT_BITSET ,
4a7e5b05
MK
1580nor
1581.BR FUTEX_WAIT_REQUEUE_PI .
1582.TP
a9dcb4d1
MK
1583.BR ENOSYS
1584.RB ( FUTEX_LOCK_PI ,
f2424fae 1585.BR FUTEX_TRYLOCK_PI ,
4945ff19 1586.BR FUTEX_UNLOCK_PI ,
4cf92894 1587.BR FUTEX_CMP_REQUEUE_PI ,
794bb106 1588.BR FUTEX_WAIT_REQUEUE_PI )
4b35dc5d 1589A run-time check determined that the operation is not available.
f0a9e8f4 1590The PI-futex operations are not implemented on all architectures and
077981d4 1591are not supported on some CPU variants.
a9dcb4d1 1592.TP
c7589177
MK
1593.BR EPERM
1594.RB ( FUTEX_LOCK_PI ,
dc2742a8
MK
1595.BR FUTEX_TRYLOCK_PI ,
1596.BR FUTEX_CMP_REQUEUE_PI )
04331c3f 1597The caller is not allowed to attach itself to the futex at
dc2742a8
MK
1598.I uaddr
1599(for
1600.BR FUTEX_CMP_REQUEUE_PI :
1601the futex at
1602.IR uaddr2 ).
c7589177
MK
1603(This may be caused by a state corruption in user space.)
1604.TP
76f347ba 1605.BR EPERM
87276709 1606.RB ( FUTEX_UNLOCK_PI )
4b35dc5d 1607The caller does not own the lock represented by the futex word.
76f347ba 1608.TP
0b0e4934
MK
1609.BR ESRCH
1610.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1611.BR FUTEX_TRYLOCK_PI ,
1612.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1613The thread ID in the futex word at
0b0e4934
MK
1614.I uaddr
1615does not exist.
1616.TP
360f773c
MK
1617.BR ESRCH
1618.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1619The thread ID in the futex word at
360f773c
MK
1620.I uaddr2
1621does not exist.
1622.TP
9f6c40c0 1623.B ETIMEDOUT
4d85047f
MK
1624The operation in
1625.IR futex_op
1626employed the timeout specified in
1627.IR timeout ,
1628and the timeout expired before the operation completed.
70b06b90
MK
1629.\"
1630.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1631.\"
47297adb 1632.SH VERSIONS
a1d5f77c 1633.PP
81c9d87e
MK
1634Futexes were first made available in a stable kernel release
1635with Linux 2.6.0.
1636
4c8cb0ff
MK
1637Initial futex support was merged in Linux 2.5.7 but with different
1638semantics from what was described above.
52dee70e 1639A four-argument system call with the semantics
fd3fa7ef 1640described in this page was introduced in Linux 2.5.40.
d0442d14
MK
1641A fifth argument was added in Linux 2.5.70,
1642and a sixth argument was added in Linux 2.6.7.
47297adb 1643.SH CONFORMING TO
8382f16d 1644This system call is Linux-specific.
47297adb 1645.SH NOTES
baf0f1f4
MK
1646Glibc does not provide a wrapper for this system call; call it using
1647.BR syscall (2).
cf44281c 1648
02f7b623 1649Several higher-level programming abstractions are implemented via futexes,
e24fbf10 1650including POSIX semaphores and
02f7b623
MK
1651various POSIX threads synchronization mechanisms
1652(mutexes, condition variables, read-write locks, and barriers).
74f58a64
MK
1653.\" TODO FIXME(Torvald) Above, we cite this section and claim it contains
1654.\" details on the synchronization semantics; add the C11 equivalents
1655.\" here (or whatever we find consensus for).
305cc415
MK
1656.\"
1657.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1658.\"
1659.SH EXAMPLE
1660.\" FIXME Is it worth having an example program?
1661.\" FIXME Anything obviously broken in the example program?
1662.\"
bc54ed38
MK
1663The program below demonstrates use of futexes in a program where a parent
1664process and a child process use a pair of futexes located inside a
305cc415
MK
1665shared anonymous mapping to synchronize access to a shared resource:
1666the terminal.
1667The two processes each write
1668.IR nloops
1669(a command-line argument that defaults to 5 if omitted)
1670messages to the terminal and employ a synchronization protocol
1671that ensures that they alternate in writing messages.
1672Upon running this program we see output such as the following:
1673
1674.in +4n
1675.nf
1676$ \fB./futex_demo\fP
1677Parent (18534) 0
1678Child (18535) 0
1679Parent (18534) 1
1680Child (18535) 1
1681Parent (18534) 2
1682Child (18535) 2
1683Parent (18534) 3
1684Child (18535) 3
1685Parent (18534) 4
1686Child (18535) 4
1687.fi
1688.in
1689.SS Program source
1690\&
1691.nf
1692/* futex_demo.c
1693
1694 Usage: futex_demo [nloops]
1695 (Default: 5)
1696
1697 Demonstrate the use of futexes in a program where parent and child
1698 use a pair of futexes located inside a shared anonymous mapping to
1699 synchronize access to a shared resource: the terminal. The two
1700 processes each write \(aqnum\-loops\(aq messages to the terminal and employ
1701 a synchronization protocol that ensures that they alternate in
1702 writing messages.
1703*/
1704#define _GNU_SOURCE
1705#include <stdio.h>
1706#include <errno.h>
1707#include <stdlib.h>
1708#include <unistd.h>
1709#include <sys/wait.h>
1710#include <sys/mman.h>
1711#include <sys/syscall.h>
1712#include <linux/futex.h>
1713#include <sys/time.h>
1714
1715#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \\
1716 } while (0)
1717
1718static int *futex1, *futex2, *iaddr;
1719
1720static int
1721futex(int *uaddr, int futex_op, int val,
1722 const struct timespec *timeout, int *uaddr2, int val3)
1723{
1724 return syscall(SYS_futex, uaddr, futex_op, val,
1725 timeout, uaddr, val3);
1726}
1727
1728/* Acquire the futex pointed to by \(aqfutexp\(aq: wait for its value to
1729 become 1, and then set the value to 0. */
1730
1731static void
1732fwait(int *futexp)
1733{
1734 int s;
1735
1736 /* __sync_bool_compare_and_swap(ptr, oldval, newval) is a gcc
1737 built\-in function. It atomically performs the equivalent of:
1738
1739 if (*ptr == oldval)
1740 *ptr = newval;
1741
1742 It returns true if the test yielded true and *ptr was updated.
1743 The alternative here would be to employ the equivalent atomic
1744 machine\-language instructions. For further information, see
1745 the GCC Manual. */
1746
305cc415 1747 while (1) {
83e80dda 1748
63ad44cb 1749 /* Is the futex available? */
83e80dda 1750
305cc415
MK
1751 if (__sync_bool_compare_and_swap(futexp, 1, 0))
1752 break; /* Yes */
1753
63ad44cb 1754 /* Futex is not available; wait */
83e80dda 1755
63ad44cb
HS
1756 s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
1757 if (s == \-1 && errno != EAGAIN)
1758 errExit("futex\-FUTEX_WAIT");
305cc415
MK
1759 }
1760}
1761
1762/* Release the futex pointed to by \(aqfutexp\(aq: if the futex currently
1763 has the value 0, set its value to 1 and the wake any futex waiters,
1764 so that if the peer is blocked in fpost(), it can proceed. */
1765
1766static void
1767fpost(int *futexp)
1768{
1769 int s;
1770
1771 /* __sync_bool_compare_and_swap() was described in comments above */
1772
1773 if (__sync_bool_compare_and_swap(futexp, 0, 1)) {
1774
1775 s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
1776 if (s == \-1)
1777 errExit("futex\-FUTEX_WAKE");
1778 }
1779}
1780
1781int
1782main(int argc, char *argv[])
1783{
1784 pid_t childPid;
1785 int j, nloops;
1786
1787 setbuf(stdout, NULL);
1788
1789 nloops = (argc > 1) ? atoi(argv[1]) : 5;
1790
1791 /* Create a shared anonymous mapping that will hold the futexes.
1792 Since the futexes are being shared between processes, we
1793 subsequently use the "shared" futex operations (i.e., not the
1794 ones suffixed "_PRIVATE") */
1795
1796 iaddr = mmap(NULL, sizeof(int) * 2, PROT_READ | PROT_WRITE,
1797 MAP_ANONYMOUS | MAP_SHARED, \-1, 0);
1798 if (iaddr == MAP_FAILED)
1799 errExit("mmap");
1800
1801 futex1 = &iaddr[0];
1802 futex2 = &iaddr[1];
1803
1804 *futex1 = 0; /* State: unavailable */
1805 *futex2 = 1; /* State: available */
1806
1807 /* Create a child process that inherits the shared anonymous
35764662 1808 mapping */
305cc415
MK
1809
1810 childPid = fork();
92a46690 1811 if (childPid == \-1)
305cc415
MK
1812 errExit("fork");
1813
1814 if (childPid == 0) { /* Child */
1815 for (j = 0; j < nloops; j++) {
1816 fwait(futex1);
1817 printf("Child (%ld) %d\\n", (long) getpid(), j);
1818 fpost(futex2);
1819 }
1820
1821 exit(EXIT_SUCCESS);
1822 }
1823
1824 /* Parent falls through to here */
1825
1826 for (j = 0; j < nloops; j++) {
1827 fwait(futex2);
1828 printf("Parent (%ld) %d\\n", (long) getpid(), j);
1829 fpost(futex1);
1830 }
1831
1832 wait(NULL);
1833
1834 exit(EXIT_SUCCESS);
1835}
1836.fi
47297adb 1837.SH SEE ALSO
4c222281 1838.ad l
9913033c 1839.BR get_robust_list (2),
d806bc05 1840.BR restart_syscall (2),
e0074751 1841.BR pthread_mutexattr_getprotocol (3),
ac894879
MK
1842.BR futex (7),
1843.BR sched (7)
fea681da 1844.PP
f5ad572f
MK
1845The following kernel source files:
1846.IP * 2
1847.I Documentation/pi-futex.txt
1848.IP *
1849.I Documentation/futex-requeue-pi.txt
1850.IP *
1851.I Documentation/locking/rt-mutex.txt
1852.IP *
1853.I Documentation/locking/rt-mutex-design.txt
8fe019c7
MK
1854.IP *
1855.I Documentation/robust-futex-ABI.txt
43b99089 1856.PP
4c222281 1857Franke, H., Russell, R., and Kirwood, M., 2002.
52087dd3 1858\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
4c222281 1859(from proceedings of the Ottawa Linux Symposium 2002),
9b936e9e 1860.br
608bf950
SK
1861.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
1862.UE
f42eb21b 1863
4c222281 1864Hart, D., 2009. \fIA futex overview and update\fP,
2ed26199
MK
1865.UR http://lwn.net/Articles/360699/
1866.UE
1867
4c222281 1868Hart, D. and Guniguntala, D., 2009.
0483b6cc 1869\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
4c222281 1870(from proceedings of the 2009 Real-Time Linux Workshop),
0483b6cc
MK
1871.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1872.UE
1873
4c222281 1874Drepper, U., 2011. \fIFutexes Are Tricky\fP,
f42eb21b
MK
1875.UR http://www.akkadia.org/drepper/futex.pdf
1876.UE
9b936e9e
MK
1877.PP
1878Futex example library, futex-*.tar.bz2 at
1879.br
a605264d 1880.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
608bf950 1881.UE
34f14794
MK
1882.\"
1883.\" FIXME Are there any other resources that should be listed
1884.\" in the SEE ALSO section?
74f58a64 1885.\" FIXME(Torvald) We should probably refer to the glibc code here, in
4c8cb0ff
MK
1886.\" particular the glibc-internal futex wrapper functions that are
1887.\" WIP, and the generic pthread_mutex_t and perhaps condvar
1888.\" implementations.