]> git.ipfire.org Git - thirdparty/man-pages.git/blame - man2/futex.2
charmap.5: ffix
[thirdparty/man-pages.git] / man2 / futex.2
CommitLineData
8f0aff2a 1.\" Page by b.hubert
1abce893
MK
2.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
3.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
2297bf0e 4.\"
2e46a6e7 5.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
8f0aff2a 6.\" may be freely modified and distributed
8ff7380d 7.\" %%%LICENSE_END
fea681da
MK
8.\"
9.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
10.\" added ERRORS section.
11.\"
12.\" Modified 2004-06-17 mtk
13.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
14.\"
47f5c4ba
MK
15.\" FIXME Still to integrate are some points from Torvald Riegel's mail of
16.\" 2015-01-23:
17.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977
18.\"
78e85692 19.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail
02182e7c
MK
20.\" at http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242
21.\"
97986708 22.TH FUTEX 2 2016-03-15 "Linux" "Linux Programmer's Manual"
fea681da 23.SH NAME
ce154705 24futex \- fast user-space locking
fea681da 25.SH SYNOPSIS
9d9dc1e8 26.nf
fea681da
MK
27.sp
28.B "#include <linux/futex.h>"
fea681da
MK
29.B "#include <sys/time.h>"
30.sp
d33602c4 31.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
768d3c23 32.BI " const struct timespec *" timeout , \
c6dc40a2 33" \fR /* or: \fBuint32_t \fIval2\fP */
9d9dc1e8 34.BI " int *" uaddr2 ", int " val3 );
9d9dc1e8 35.fi
409f08b0 36
b939d6e4
MK
37.IR Note :
38There is no glibc wrapper for this system call; see NOTES.
47297adb 39.SH DESCRIPTION
fea681da
MK
40.PP
41The
e511ffb6 42.BR futex ()
4b35dc5d 43system call provides a method for waiting until a certain condition becomes
077981d4
MK
44true.
45It is typically used as a blocking construct in the context of
d45f244c
MK
46shared-memory synchronization.
47When using futexes, the majority of
48the synchronization operations are performed in user space.
bc54ed38 49A user-space program employs the
d45f244c 50.BR futex ()
ca4e5b2b 51system call only when it is likely that the program has to block for
4c8cb0ff 52a longer time until the condition becomes true.
bc54ed38 53Other
d45f244c 54.BR futex ()
bc54ed38
MK
55operations can be used to wake any processes or threads waiting
56for a particular condition.
4b35dc5d 57
7e8dcabc
MK
58A futex is a 32-bit value\(emreferred to below as a
59.IR "futex word" \(emwhose
60address is supplied to the
4b35dc5d 61.BR futex ()
7e8dcabc 62system call.
c3f4c019 63(Futexes are 32 bits in size on all platforms, including 64-bit systems.)
7e8dcabc
MK
64All futex operations are governed by this value.
65In order to share a futex between processes,
66the futex is placed in a region of shared memory,
67created using (for example)
68.BR mmap (2)
69or
70.BR shmat (2).
c3f4c019 71(Thus, the futex word may have different
7e8dcabc
MK
72virtual addresses in different processes,
73but these addresses all refer to the same location in physical memory.)
ca4e5b2b
MK
74In a multithreaded program, it is sufficient to place the futex word
75in a global variable shared by all threads.
809ca3ae 76
0c3ec26b
MK
77When executing a futex operation that requests to block a thread,
78the kernel will block only if the futex word has the value that the
55f9e85e
MK
79calling thread supplied (as one of the arguments of the
80.BR futex ()
81call) as the expected value of the futex word.
9d32a39b
MK
82The loading of the futex word's value,
83the comparison of that value with the expected value,
bc54ed38 84and the actual blocking will happen atomically and will be totally ordered
da894b18 85with respect to concurrent operations performed by other threads
0fb87d16 86on the same futex word.
da894b18
MK
87.\" Notes from Darren Hart (Dec 2015):
88.\" Totally ordered with respect futex operations refers to semantics
89.\" of the ACQUIRE/RELEASE operations and how they impact ordering of
90.\" memory reads and writes. The kernel futex operations are protected
f6615c42 91.\" by spinlocks, which ensure that all operations are serialized
da894b18
MK
92.\" with respect to one another.
93.\"
94.\" This is a lot to attempt to define in this document. Perhaps a
95.\" reference to linux/Documentation/memory-barriers.txt as a footnote
96.\" would be sufficient? Or perhaps for this manual, "serialized" would
97.\" be sufficient, with a footnote regarding "totally ordered" and a
98.\" pointer to the memory-barrier documentation?
b80daba2 99Thus, the futex word is used to connect the synchronization in user space
9d32a39b 100with the implementation of blocking by the kernel.
55f9e85e 101Analogously to an atomic
4b35dc5d 102compare-and-exchange operation that potentially changes shared memory,
077981d4 103blocking via a futex is an atomic compare-and-block operation.
d6bb5a38
MK
104.\" FIXME(Torvald Riegel):
105.\" Eventually we want to have some text in NOTES to satisfy
106.\" the reference in the following sentence
107.\" See NOTES for
108.\" a detailed specification of the synchronization semantics.
4b35dc5d 109
ca4e5b2b 110One use of futexes is for implementing locks.
c0dc758e
MK
111The state of the lock (i.e., acquired or not acquired)
112can be represented as an atomically accessed flag in shared memory.
4c8cb0ff 113In the uncontended case,
c3f4c019 114a thread can access or modify the lock state with atomic instructions,
4c8cb0ff
MK
115for example atomically changing it from not acquired to acquired
116using an atomic compare-and-exchange instruction.
55f9e85e
MK
117(Such instructions are performed entirely in user mode,
118and the kernel maintains no information about the lock state.)
119On the other hand, a thread may be unable to acquire a lock because
8e754e12 120it is already acquired by another thread.
55f9e85e 121It then may pass the lock's flag as a futex word and the value
0c3ec26b 122representing the acquired state as the expected value to a
8e754e12
HS
123.BR futex ()
124wait operation.
55f9e85e 125This
8e754e12 126.BR futex ()
bc54ed38 127operation will block if and only if the lock is still acquired
f6615c42 128(i.e., the value in the futex word still matches the "acquired state").
077981d4 129When releasing the lock, a thread has to first reset the
0c3ec26b 130lock state to not acquired and then execute a futex
55f9e85e 131operation that wakes threads blocked on the lock flag used as a futex word
f6615c42 132(this can be further optimized to avoid unnecessary wake-ups).
077981d4 133See
4b35dc5d
TR
134.BR futex (7)
135for more detail on how to use futexes.
136
137Besides the basic wait and wake-up futex functionality, there are further
077981d4 138futex operations aimed at supporting more complex use cases.
ca4e5b2b
MK
139
140Note that
2af84f99 141no explicit initialization or destruction is necessary to use futexes;
4c8cb0ff
MK
142the kernel maintains a futex
143(i.e., the kernel-internal implementation artifact)
4b35dc5d
TR
144only while operations such as
145.BR FUTEX_WAIT ,
146described below, are being performed on a particular futex word.
a663ca5a
MK
147.\"
148.SS Arguments
fea681da
MK
149The
150.I uaddr
077981d4
MK
151argument points to the futex word.
152On all platforms, futexes are four-byte
4b35dc5d 153integers that must be aligned on a four-byte boundary.
f388ba70
MK
154The operation to perform on the futex is specified in the
155.I futex_op
156argument;
157.IR val
158is a value whose meaning and purpose depends on
159.IR futex_op .
36ab2074
MK
160
161The remaining arguments
162.RI ( timeout ,
163.IR uaddr2 ,
164and
165.IR val3 )
166are required only for certain of the futex operations described below.
167Where one of these arguments is not required, it is ignored.
768d3c23 168
36ab2074
MK
169For several blocking operations, the
170.I timeout
171argument is a pointer to a
172.IR timespec
173structure that specifies a timeout for the operation.
174However, notwithstanding the prototype shown above, for some operations,
10022b8e 175the least significant four bytes are used as an integer whose meaning
36ab2074 176is determined by the operation.
768d3c23
MK
177For these operations, the kernel casts the
178.I timeout
10022b8e
HS
179value first to
180.IR "unsigned long",
181then to
c6dc40a2 182.IR uint32_t ,
768d3c23
MK
183and in the remainder of this page, this argument is referred to as
184.I val2
185when interpreted in this fashion.
186
de5a3bb4 187Where it is required, the
36ab2074 188.IR uaddr2
4c8cb0ff
MK
189argument is a pointer to a second futex word that is employed
190by the operation.
36ab2074
MK
191The interpretation of the final integer argument,
192.IR val3 ,
193depends on the operation.
a663ca5a
MK
194.\"
195.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
196.\"
197.SS Futex operations
6be4bad7 198The
d33602c4 199.I futex_op
6be4bad7
MK
200argument consists of two parts:
201a command that specifies the operation to be performed,
ed1819cf 202bit-wise ORed with zero or more options that
6be4bad7 203modify the behaviour of the operation.
fc30eb79 204The options that may be included in
d33602c4 205.I futex_op
fc30eb79
TG
206are as follows:
207.TP
208.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
209.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
210This option bit can be employed with all futex operations.
e45f9735 211It tells the kernel that the futex is process-private and not shared
0c3ec26b
MK
212with another process (i.e., it is being used for synchronization
213only between threads of the same process).
943ccc52
MK
214This allows the kernel to make some additional performance optimizations.
215.\" I.e., It allows the kernel choose the fast path for validating
216.\" the user-space address and avoids expensive VMA lookups,
217.\" taking reference counts on file backing store, and so on.
ae2c1774
MK
218
219As a convenience,
220.IR <linux/futex.h>
221defines a set of constants with the suffix
222.BR _PRIVATE
223that are equivalents of all of the operations listed below,
dcdfde26 224.\" except the obsolete FUTEX_FD, for which the "private" flag was
ae2c1774
MK
225.\" meaningless
226but with the
227.BR FUTEX_PRIVATE_FLAG
228ORed into the constant value.
229Thus, there are
230.BR FUTEX_WAIT_PRIVATE ,
231.BR FUTEX_WAKE_PRIVATE ,
232and so on.
2e98bbc2
TG
233.TP
234.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
235.\" commit 1acdac104668a0834cfa267de9946fac7764d486
4a7e5b05 236This option bit can be employed only with the
949ceae3
MK
237.BR FUTEX_WAIT_BITSET ,
238.BR FUTEX_WAIT_REQUEUE_PI ,
2e98bbc2 239and
949ceae3
MK
240.BR FUTEX_WAIT
241(since Linux 4.5)
242.\" commit 337f13046ff03717a9e99675284a817527440a49
c84cf68c 243operations.
2e98bbc2 244
8064bfa5 245If this option is set, the kernel measures the
f2103b26 246.I timeout
8064bfa5
MK
247against the
248.BR CLOCK_REALTIME
249clock.
2e98bbc2 250
8064bfa5 251If this option is not set, the kernel measures the
f2103b26 252.I timeout
8064bfa5 253against the
1c952cf5
MK
254.BR CLOCK_MONOTONIC
255clock.
6be4bad7
MK
256.PP
257The operation specified in
d33602c4 258.I futex_op
6be4bad7 259is one of the following:
70b06b90
MK
260.\"
261.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
262.\"
fea681da 263.TP
81c9d87e
MK
264.BR FUTEX_WAIT " (since Linux 2.6.0)"
265.\" Strictly speaking, since some time in 2.5.x
f065673c 266This operation tests that the value at the
4b35dc5d 267futex word pointed to by the address
fea681da 268.I uaddr
4b35dc5d 269still contains the expected value
fea681da 270.IR val ,
fd105614 271and if so, then sleeps waiting for a
682edefb 272.B FUTEX_WAKE
fd105614 273operation on the futex word.
077981d4 274The load of the value of the futex word is an atomic memory
4b35dc5d 275access (i.e., using atomic machine instructions of the respective
077981d4
MK
276architecture).
277This load, the comparison with the expected value, and
fd105614 278starting to sleep are performed atomically
da56650a 279.\" FIXME: Torvald, I think we may need to add some explanation of
cd175805 280.\" "totally ordered" here.
fd105614
MK
281and totally ordered
282with respect to other futex operations on the same futex word.
c0dc758e
MK
283If the thread starts to sleep,
284it is considered a waiter on this futex word.
f065673c
MK
285If the futex value does not match
286.IR val ,
4710334a 287then the call fails immediately with the error
badbf70c 288.BR EAGAIN .
4b35dc5d
TR
289
290The purpose of the comparison with the expected value is to prevent lost
fd105614
MK
291wake-ups.
292If another thread changed the value of the futex word after the
c0dc758e
MK
293calling thread decided to block based on the prior value,
294and if the other thread executed a
4b35dc5d
TR
295.BR FUTEX_WAKE
296operation (or similar wake-up) after the value change and before this
f065673c 297.BR FUTEX_WAIT
bc54ed38
MK
298operation, then the calling thread will observe the
299value change and will not start to sleep.
1909e523 300
c13182ef 301If the
fea681da 302.I timeout
8064bfa5
MK
303is not NULL, the structure it points to specifies
304a relative timeout for the wait, measured according to the
1c952cf5
MK
305.BR CLOCK_MONOTONIC
306clock.
82a6092b 307(This interval will be rounded up to the system clock granularity,
8297383e 308and is guaranteed not to expire early.)
82a6092b
MK
309If
310.I timeout
311is NULL, the call blocks indefinitely.
4798a7f3 312
c13182ef 313The arguments
fea681da
MK
314.I uaddr2
315and
316.I val3
317are ignored.
74f58a64
MK
318.\" FIXME(Torvald) I think we should remove this. Or maybe adapt to a
319.\" different example.
4b35dc5d
TR
320.\" For
321.\" .BR futex (7),
322.\" this call is executed if decrementing the count gave a negative value
323.\" (indicating contention),
324.\" and will sleep until another process or thread releases
325.\" the futex and executes the
326.\" .B FUTEX_WAKE
327.\" operation.
70b06b90
MK
328.\"
329.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
330.\"
fea681da 331.TP
81c9d87e
MK
332.BR FUTEX_WAKE " (since Linux 2.6.0)"
333.\" Strictly speaking, since Linux 2.5.x
f065673c
MK
334This operation wakes at most
335.I val
4b35dc5d 336of the waiters that are waiting (e.g., inside
f065673c 337.BR FUTEX_WAIT )
4b35dc5d 338on the futex word at the address
f065673c
MK
339.IR uaddr .
340Most commonly,
341.I val
342is specified as either 1 (wake up a single waiter) or
343.BR INT_MAX
344(wake up all waiters).
730bfbda
MK
345No guarantee is provided about which waiters are awoken
346(e.g., a waiter with a higher scheduling priority is not guaranteed
347to be awoken in preference to a waiter with a lower priority).
4798a7f3 348
fea681da
MK
349The arguments
350.IR timeout ,
c8b921bd 351.IR uaddr2 ,
fea681da
MK
352and
353.I val3
354are ignored.
74f58a64
MK
355.\" FIXME(Torvald) I think we should remove this. Or maybe adapt to
356.\" a different example.
4c8cb0ff
MK
357.\" For
358.\" .BR futex (7),
359.\" this is executed if incrementing the count showed that
360.\" there were waiters,
361.\" once the futex value has been set to 1
362.\" (indicating that it is available).
363.\"
364.\" FIXME How does "incrementing the count show that there were waiters"?
70b06b90
MK
365.\"
366.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
367.\"
a7c2bf45
MK
368.TP
369.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
370.\" Strictly speaking, from Linux 2.5.x to 2.6.25
4c8cb0ff
MK
371This operation creates a file descriptor that is associated with
372the futex at
a7c2bf45 373.IR uaddr .
bdc5957a
MK
374The caller must close the returned file descriptor after use.
375When another process or thread performs a
a7c2bf45 376.BR FUTEX_WAKE
4b35dc5d 377on the futex word, the file descriptor indicates as being readable with
a7c2bf45
MK
378.BR select (2),
379.BR poll (2),
380and
381.BR epoll (7)
382
f1d2171d 383The file descriptor can be used to obtain asynchronous notifications: if
a7c2bf45 384.I val
ca4e5b2b 385is nonzero, then, when another process or thread executes a
a7c2bf45
MK
386.BR FUTEX_WAKE ,
387the caller will receive the signal number that was passed in
388.IR val .
389
390The arguments
391.IR timeout ,
392.I uaddr2
393and
394.I val3
395are ignored.
396
a7c2bf45
MK
397Because it was inherently racy,
398.B FUTEX_FD
399has been removed
400.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
401from Linux 2.6.26 onward.
70b06b90
MK
402.\"
403.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
404.\"
a7c2bf45
MK
405.TP
406.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
a7c2bf45 407This operation performs the same task as
27dd3a6e
MK
408.BR FUTEX_CMP_REQUEUE
409(see below), except that no check is made using the value in
a7c2bf45
MK
410.IR val3 .
411(The argument
412.I val3
413is ignored.)
70b06b90
MK
414.\"
415.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
416.\"
a7c2bf45
MK
417.TP
418.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
4b35dc5d 419This operation first checks whether the location
a7c2bf45
MK
420.I uaddr
421still contains the value
422.IR val3 .
423If not, the operation fails with the error
424.BR EAGAIN .
4b35dc5d 425Otherwise, the operation wakes up a maximum of
a7c2bf45
MK
426.I val
427waiters that are waiting on the futex at
428.IR uaddr .
429If there are more than
430.I val
431waiters, then the remaining waiters are removed
432from the wait queue of the source futex at
433.I uaddr
434and added to the wait queue of the target futex at
435.IR uaddr2 .
436The
768d3c23 437.I val2
936876a9 438argument specifies an upper limit on the number of waiters
a7c2bf45 439that are requeued to the futex at
768d3c23 440.IR uaddr2 .
a7c2bf45 441
d6bb5a38
MK
442.\" FIXME(Torvald) Is the following correct? Or is just the decision
443.\" which threads to wake or requeue part of the atomic operation?
4b35dc5d
TR
444The load from
445.I uaddr
4c8cb0ff
MK
446is an atomic memory access (i.e., using atomic machine instructions of
447the respective architecture).
077981d4 448This load, the comparison with
4b35dc5d 449.IR val3 ,
4c8cb0ff
MK
450and the requeueing of any waiters are performed atomically and totally
451ordered with respect to other operations on the same futex word.
ee65b0e8
MK
452.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ###
453.\" The operation is serialized with respect to operations on both
454.\" source and target futex. No other waiter can enqueue itself
455.\" for waiting and no other waiter can dequeue itself because of
456.\" a timeout or signal.
4b35dc5d 457
a7c2bf45
MK
458Typical values to specify for
459.I val
ed1819cf 460are 0 or 1.
a7c2bf45
MK
461(Specifying
462.BR INT_MAX
463is not useful, because it would make the
464.BR FUTEX_CMP_REQUEUE
465operation equivalent to
466.BR FUTEX_WAKE .)
936876a9 467The limit value specified via
768d3c23
MK
468.I val2
469is typically either 1 or
a7c2bf45
MK
470.BR INT_MAX .
471(Specifying the argument as 0 is not useful, because it would make the
472.BR FUTEX_CMP_REQUEUE
473operation equivalent to
474.BR FUTEX_WAIT .)
627b50ce
MK
475
476The
477.B FUTEX_CMP_REQUEUE
478operation was added as a replacement for the earlier
479.BR FUTEX_REQUEUE .
480The difference is that the check of the value at
481.I uaddr
482can be used to ensure that requeueing happens only under certain
483conditions, which allows race conditions to be avoided in certain use cases.
dcb410c3 484.\" But, as Rich Felker points out, there remain valid use cases for
627b50ce
MK
485.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing
486.\" the target(s) to a lock that the calling thread owns
487.\" From: Rich Felker <dalias@libc.org>
488.\" Date: Wed, 29 Oct 2014 22:43:17 -0400
489.\" To: Darren Hart <dvhart@infradead.org>
490.\" CC: libc-alpha@sourceware.org, ...
491.\" Subject: Re: Add futex wrapper to glibc?
492
493Both
494.BR FUTEX_REQUEUE
495and
496.BR FUTEX_CMP_REQUEUE
497can be used to avoid "thundering herd" wake-ups that could occur when using
498.B FUTEX_WAKE
499in cases where all of the waiters that are woken need to acquire
500another futex.
501Consider the following scenario,
502where multiple waiter threads are waiting on B,
503a wait queue implemented using a futex:
504
505.in +4n
506.nf
507lock(A)
508while (!check_value(V)) {
509 unlock(A);
510 block_on(B);
511 lock(A);
512};
513unlock(A);
514.fi
515.in
516
517If a waker thread used
518.BR FUTEX_WAKE ,
519then all waiters waiting on B would be woken up,
67c67ff2 520and they would all try to acquire lock A.
627b50ce
MK
521However, waking all of the threads in this manner would be pointless because
522all except one of the threads would immediately block on lock A again.
523By contrast, a requeue operation wakes just one waiter and moves
524the other waiters to lock A,
525and when the woken waiter unlocks A then the next waiter can proceed.
43d16602 526.\"
70b06b90
MK
527.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
528.\"
a5956430
MK
529.\" FIXME I added a lengthy piece of text on FUTEX_WAKE_OP text,
530.\" and I'd be happy if someone checked it.
fea681da 531.TP
d67e21f5
MK
532.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
533.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
6bac3b85
MK
534.\" Author: Jakub Jelinek <jakub@redhat.com>
535.\" Date: Tue Sep 6 15:16:25 2005 -0700
4c8cb0ff
MK
536.\" FIXME(Torvald) The glibc condvar implementation is currently being
537.\" revised (e.g., to not use an internal lock anymore).
538.\" It is probably more future-proof to remove this paragraph.
d6bb5a38 539.\" [Torvald, do you have an update here?]
6bac3b85
MK
540This operation was added to support some user-space use cases
541where more than one futex must be handled at the same time.
542The most notable example is the implementation of
543.BR pthread_cond_signal (3),
544which requires operations on two futexes,
545the one used to implement the mutex and the one used in the implementation
546of the wait queue associated with the condition variable.
547.BR FUTEX_WAKE_OP
548allows such cases to be implemented without leading to
549high rates of contention and context switching.
550
551The
57f2d48b 552.BR FUTEX_WAKE_OP
e61abc20 553operation is equivalent to executing the following code atomically
4c8cb0ff
MK
554and totally ordered with respect to other futex operations on
555any of the two supplied futex words:
6bac3b85
MK
556
557.in +4n
558.nf
559int oldval = *(int *) uaddr2;
560*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
561futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
562if (oldval \fIcmp\fP \fIcmparg\fP)
768d3c23 563 futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
6bac3b85
MK
564.fi
565.in
566
567In other words,
57f2d48b 568.BR FUTEX_WAKE_OP
6bac3b85
MK
569does the following:
570.RS
571.IP * 3
4b35dc5d
TR
572saves the original value of the futex word at
573.IR uaddr2
574and performs an operation to modify the value of the futex at
6bac3b85 575.IR uaddr2 ;
4c8cb0ff
MK
576this is an atomic read-modify-write memory access (i.e., using atomic
577machine instructions of the respective architecture)
6bac3b85
MK
578.IP *
579wakes up a maximum of
580.I val
4b35dc5d 581waiters on the futex for the futex word at
6bac3b85
MK
582.IR uaddr ;
583and
584.IP *
4c8cb0ff
MK
585dependent on the results of a test of the original value of the
586futex word at
6bac3b85
MK
587.IR uaddr2 ,
588wakes up a maximum of
768d3c23 589.I val2
4b35dc5d 590waiters on the futex for the futex word at
6bac3b85
MK
591.IR uaddr2 .
592.RE
593.IP
6bac3b85
MK
594The operation and comparison that are to be performed are encoded
595in the bits of the argument
596.IR val3 .
597Pictorially, the encoding is:
598
f6af90e7 599.in +8n
6bac3b85 600.nf
f6af90e7
MK
601+---+---+-----------+-----------+
602|op |cmp| oparg | cmparg |
603+---+---+-----------+-----------+
604 4 4 12 12 <== # of bits
6bac3b85
MK
605.fi
606.in
607
608Expressed in code, the encoding is:
609
610.in +4n
611.nf
612#define FUTEX_OP(op, oparg, cmp, cmparg) \\
613 (((op & 0xf) << 28) | \\
614 ((cmp & 0xf) << 24) | \\
615 ((oparg & 0xfff) << 12) | \\
616 (cmparg & 0xfff))
617.fi
618.in
619
620In the above,
621.I op
622and
623.I cmp
624are each one of the codes listed below.
625The
626.I oparg
627and
628.I cmparg
629components are literal numeric values, except as noted below.
630
631The
632.I op
633component has one of the following values:
634
635.in +4n
636.nf
637FUTEX_OP_SET 0 /* uaddr2 = oparg; */
638FUTEX_OP_ADD 1 /* uaddr2 += oparg; */
639FUTEX_OP_OR 2 /* uaddr2 |= oparg; */
640FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */
641FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */
642.fi
643.in
644
645In addition, bit-wise ORing the following value into
646.I op
647causes
648.IR "(1\ <<\ oparg)"
649to be used as the operand:
650
651.in +4n
652.nf
653FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
654.fi
655.in
656
657The
658.I cmp
659field is one of the following:
660
661.in +4n
662.nf
663FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
664FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
665FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
666FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
667FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
668FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
669.fi
670.in
671
672The return value of
673.BR FUTEX_WAKE_OP
674is the sum of the number of waiters woken on the futex
675.IR uaddr
676plus the number of waiters woken on the futex
677.IR uaddr2 .
70b06b90
MK
678.\"
679.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
680.\"
d67e21f5 681.TP
79c9b436
TG
682.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
683.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
fd9e59d4 684This operation is like
79c9b436
TG
685.BR FUTEX_WAIT
686except that
687.I val3
5e1456d4
MK
688is used to provide a 32-bit mask to the kernel.
689This bit mask is stored in the kernel-internal state of the waiter.
79c9b436
TG
690See the description of
691.BR FUTEX_WAKE_BITSET
692for further details.
693
8064bfa5
MK
694If
695.I timeout
696is not NULL, the structure it points to specifies
697an absolute timeout for the wait operation.
698If
699.I timeout
700is NULL, the operation can block indefinitely.
701
702
79c9b436
TG
703The
704.I uaddr2
705argument is ignored.
70b06b90
MK
706.\"
707.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
708.\"
79c9b436 709.TP
d67e21f5
MK
710.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
711.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
55cc422d
TG
712This operation is the same as
713.BR FUTEX_WAKE
714except that the
e24fbf10 715.I val3
5e1456d4
MK
716argument is used to provide a 32-bit bit mask to the kernel.
717This bit mask is used to select which waiters should be woken up.
718The selection is done by a bit-wise AND of the "wake" bit mask
98d769c0
MK
719(i.e., the value in
720.IR val3 )
5e1456d4
MK
721and the bit mask which is stored in the kernel-internal
722state of the waiter (the "wait" bit mask that is set using
98d769c0
MK
723.BR FUTEX_WAIT_BITSET ).
724All of the waiters for which the result of the AND is nonzero are woken up;
725the remaining waiters are left sleeping.
726
e9d4496b
MK
727The effect of
728.BR FUTEX_WAIT_BITSET
729and
730.BR FUTEX_WAKE_BITSET
9732dd8b
MK
731is to allow selective wake-ups among multiple waiters that are blocked
732on the same futex.
ac894879 733However, note that, depending on the use case,
5e1456d4 734employing this bit-mask multiplexing feature on a
ac894879 735futex can be less efficient than simply using multiple futexes,
5e1456d4 736because employing bit-mask multiplexing requires the kernel
e9d4496b
MK
737to check all waiters on a futex,
738including those that are not interested in being woken up
5e1456d4 739(i.e., they do not have the relevant bit set in their "wait" bit mask).
e9d4496b
MK
740.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
741.\"
742.\" "The original reason for the addition of these extensions
743.\" was to improve the performance of pthread read-write locks
744.\" in glibc. However, the pthreads library no longer uses the
745.\" same locking algorithm, and these extensions are not used
746.\" without the bitset parameter being all ones.
e24fbf10 747.\"
e9d4496b 748.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
5e1456d4 749.\" is nevertheless used (with a bit mask of all ones) in order to
e9d4496b
MK
750.\" obtain the absolute timeout functionality that is useful
751.\" for efficiently implementing Pthreads APIs (which use absolute
752.\" timeouts); FUTEX_WAIT provides only relative timeouts.
753
98d769c0
MK
754The
755.I uaddr2
756and
757.I timeout
758arguments are ignored.
9732dd8b
MK
759
760The
761.BR FUTEX_WAIT
762and
763.BR FUTEX_WAKE
764operations correspond to
765.BR FUTEX_WAIT_BITSET
766and
767.BR FUTEX_WAKE_BITSET
5e1456d4 768operations where the bit masks are all ones.
bd90a5f9 769.\"
70b06b90 770.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
bd90a5f9
MK
771.\"
772.SS Priority-inheritance futexes
b52e1cd4
MK
773Linux supports priority-inheritance (PI) futexes in order to handle
774priority-inversion problems that can be encountered with
775normal futex locks.
b565548b 776Priority inversion is the problem that occurs when a high-priority
bdc5957a
MK
777task is blocked waiting to acquire a lock held by a low-priority task,
778while tasks at an intermediate priority continuously preempt
779the low-priority task from the CPU.
780Consequently, the low-priority task makes no progress toward
781releasing the lock, and the high-priority task remains blocked.
7f315ae3 782
7d20efd7
MK
783Priority inheritance is a mechanism for dealing with
784the priority-inversion problem.
bdc5957a
MK
785With this mechanism, when a high-priority task becomes blocked
786by a lock held by a low-priority task,
9cee832c
MK
787the priority of the low-priority task is temporarily raised
788to that of the high-priority task,
bdc5957a 789so that it is not preempted by any intermediate level tasks,
7d20efd7
MK
790and can thus make progress toward releasing the lock.
791To be effective, priority inheritance must be transitive,
bdc5957a 792meaning that if a high-priority task blocks on a lock
ca4e5b2b 793held by a lower-priority task that is itself blocked by a lock
bdc5957a 794held by another intermediate-priority task
7d20efd7 795(and so on, for chains of arbitrary length),
b0f35fbb 796then both of those tasks
bdc5957a
MK
797(or more generally, all of the tasks in a lock chain)
798have their priorities raised to be the same as the high-priority task.
7d20efd7 799
9e2b90ee 800From a user-space perspective,
39e9b2e1
MK
801what makes a futex PI-aware is a policy agreement (described below)
802between user space and the kernel about the value of the futex word,
601399f3
MK
803coupled with the use of the PI-futex operations described below.
804(Unlike the other futex operations described above,
805the PI-futex operations are designed
806for the implementation of very specific IPC mechanisms.)
807.\"
9e2b90ee
MK
808.\" Quoting Darren Hart:
809.\" These opcodes paired with the PI futex value policy (described below)
810.\" defines a "futex" as PI aware. These were created very specifically
811.\" in support of PI pthread_mutexes, so it makes a lot more sense to
812.\" talk about a PI aware pthread_mutex, than a PI aware futex, since
813.\" there is a lot of policy and scaffolding that has to be built up
814.\" around it to use it properly (this is what a PI pthread_mutex is).
815
ac894879 816.\" mtk: The following text is drawn from the Hart/Guniguntala paper
1af427a4 817.\" (listed in SEE ALSO), but I have reworded some pieces
f0a9e8f4 818.\" significntly.
79d918c7 819.\"
f0a9e8f4 820The PI-futex operations described below differ from the other
4b35dc5d
TR
821futex operations in that they impose policy on the use of the value of the
822futex word:
79d918c7 823.IP * 3
4b35dc5d 824If the lock is not acquired, the futex word's value shall be 0.
79d918c7 825.IP *
4c8cb0ff
MK
826If the lock is acquired, the futex word's value shall
827be the thread ID (TID;
4b35dc5d 828see
79d918c7
MK
829.BR gettid (2))
830of the owning thread.
831.IP *
79d918c7
MK
832If the lock is owned and there are threads contending for the lock,
833then the
834.B FUTEX_WAITERS
4b35dc5d 835bit shall be set in the futex word's value; in other words, this value is:
79d918c7
MK
836
837 FUTEX_WAITERS | TID
601399f3
MK
838.IP
839(Note that is invalid for a PI futex word to have no owner and
840.BR FUTEX_WAITERS
841set.)
79d918c7
MK
842.PP
843With this policy in place,
fd105614 844a user-space application can acquire an unacquired
601399f3 845lock or release a lock using atomic instructions executed in user mode
fd105614 846(e.g., a compare-and-swap operation such as
b52e1cd4
MK
847.I cmpxchg
848on the x86 architecture).
4c8cb0ff
MK
849Acquiring a lock simply consists of using compare-and-swap to atomically
850set the futex word's value to the caller's TID if its previous value was 0.
4b35dc5d
TR
851Releasing a lock requires using compare-and-swap to set the futex word's
852value to 0 if the previous value was the expected TID.
b52e1cd4 853
4b35dc5d 854If a futex is already acquired (i.e., has a nonzero value),
b52e1cd4 855waiters must employ the
79d918c7
MK
856.B FUTEX_LOCK_PI
857operation to acquire the lock.
4b35dc5d 858If other threads are waiting for the lock, then the
79d918c7 859.B FUTEX_WAITERS
4c8cb0ff
MK
860bit is set in the futex value;
861in this case, the lock owner must employ the
79d918c7 862.B FUTEX_UNLOCK_PI
b52e1cd4
MK
863operation to release the lock.
864
79d918c7
MK
865In the cases where callers are forced into the kernel
866(i.e., required to perform a
867.BR futex ()
0c3ec26b 868call),
79d918c7
MK
869they then deal directly with a so-called RT-mutex,
870a kernel locking mechanism which implements the required
871priority-inheritance semantics.
872After the RT-mutex is acquired, the futex value is updated accordingly,
873before the calling thread returns to user space.
79d918c7 874
a59fca75 875It is important to note
ac894879 876.\" tglx (July 2015):
30239c10
MK
877.\" If there are multiple waiters on a pi futex then a wake pi operation
878.\" will wake the first waiter and hand over the lock to this waiter. This
879.\" includes handing over the rtmutex which represents the futex in the
880.\" kernel. The strict requirement is that the futex owner and the rtmutex
881.\" owner must be the same, except for the update period which is
882.\" serialized by the futex internal locking. That means the kernel must
1d09c150 883.\" update the user-space value prior to returning to user space
4b35dc5d 884that the kernel will update the futex word's value prior
79d918c7 885to returning to user space.
601399f3
MK
886(This prevents the possibility of the futex word's value ending
887up in an invalid state, such as having an owner but the value being 0,
888or having waiters but not having the
889.B FUTEX_WAITERS
890bit set.)
30239c10
MK
891
892If a futex has an associated RT-mutex in the kernel
893(i.e., there are blocked waiters)
894and the owner of the futex/RT-mutex dies unexpectedly,
895then the kernel cleans up the RT-mutex and hands it over to the next waiter.
896This in turn requires that the user-space value is updated accordingly.
897To indicate that this is required, the kernel sets the
898.B FUTEX_OWNER_DIED
899bit in the futex word along with the thread ID of the new owner.
8adaf0a7
MK
900User space can detect this situation via the presence of the
901.B FUTEX_OWNER_DIED
902bit and is then responsible for cleaning up the stale state left over by
1d09c150 903the dead owner.
30239c10
MK
904.\" tglx (July 2015):
905.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where
906.\" the kernel has no state associated. This happens via the robust futex
907.\" mechanism. In that case the futex value will be set to
908.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non
909.\" PI futexes.
bd90a5f9 910
601399f3
MK
911PI futexes are operated on by specifying one of the values listed below in
912.IR futex_op .
913Note that the PI futex operations must be used as paired operations
914and are subject to some additional requirements:
915.IP * 3
916.B FUTEX_LOCK_PI
917and
918.B FUTEX_TRYLOCK_PI
919pair with
920.BR FUTEX_UNLOCK_PI.
921.B FUTEX_UNLOCK_PI
922must be called only on a futex owned by the calling thread,
923as defined by the value policy, otherwise the error
924.B EPERM
925results.
926.IP *
927.B FUTEX_WAIT_REQUEUE_PI
928pairs with
929.BR FUTEX_CMP_REQUEUE_PI .
930This must be performed from a non-PI futex to a distinct PI futex
931(or the error
932.B EINVAL
933results).
934Additionally,
935.I val
936(the number of waiters to be woken) must be 1
937(or the error
938.B EINVAL
939results).
940.P
941The PI futex operations are as follows:
70b06b90
MK
942.\"
943.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
944.\"
d67e21f5
MK
945.TP
946.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
947.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
bc54ed38 948This operation is used after an attempt to acquire
fd105614 949the lock via an atomic user-mode instruction failed
4b35dc5d 950because the futex word has a nonzero value\(emspecifically,
8297383e 951because it contained the (PID-namespace-specific) TID of the lock owner.
67833bec 952
4b35dc5d 953The operation checks the value of the futex word at the address
67833bec 954.IR uaddr .
70b06b90
MK
955If the value is 0, then the kernel tries to atomically set
956the futex value to the caller's TID.
c3875d1d 957If the futex word's value is nonzero,
67833bec 958the kernel atomically sets the
e0547e70 959.B FUTEX_WAITERS
67833bec
MK
960bit, which signals the futex owner that it cannot unlock the futex in
961user space atomically by setting the futex value to 0.
c3875d1d
MK
962.\" tglx (July 2015):
963.\" The operation here is similar to the FUTEX_WAIT logic. When the user
964.\" space atomic acquire does not succeed because the futex value was non
965.\" zero, then the waiter goes into the kernel, takes the kernel internal
966.\" lock and retries the acquisition under the lock. If the acquisition
967.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
968.\" the lock owner that it needs to go into the kernel. Here is the pseudo
969.\" code:
970.\"
971.\" lock(kernel_lock);
972.\" retry:
973.\"
974.\" /*
975.\" * Owner might have unlocked in userspace before we
976.\" * were able to set the waiter bit.
977.\" */
978.\" if (atomic_acquire(futex) == SUCCESS) {
979.\" unlock(kernel_lock());
980.\" return 0;
981.\" }
982.\"
983.\" /*
984.\" * Owner might have unlocked after the above atomic_acquire()
985.\" * attempt.
986.\" */
987.\" if (atomic_set_waiters_bit(futex) != SUCCESS)
988.\" goto retry;
989.\"
990.\" queue_waiter();
991.\" unlock(kernel_lock);
992.\" block();
993.\"
994After that, the kernel:
995.RS
996.IP 1. 3
997Tries to find the thread which is associated with the owner TID.
998.IP 2.
999Creates or reuses kernel state on behalf of the owner.
1000(If this is the first waiter, there is no kernel state for this
1001futex, so kernel state is created by locking the RT-mutex
1002and the futex owner is made the owner of the RT-mutex.
1003If there are existing waiters, then the existing state is reused.)
1004.IP 3.
ca4e5b2b 1005Attaches the waiter to the futex
c3875d1d
MK
1006(i.e., the waiter is enqueued on the RT-mutex waiter list).
1007.RE
1008.IP
ac894879
MK
1009If more than one waiter exists,
1010the enqueueing of the waiter is in descending priority order.
1011(For information on priority ordering, see the discussion of the
1012.BR SCHED_DEADLINE ,
1013.BR SCHED_FIFO ,
1014and
1015.BR SCHED_RR
1016scheduling policies in
1017.BR sched (7).)
1018The owner inherits either the waiter's CPU bandwidth
1019(if the waiter is scheduled under the
1020.BR SCHED_DEADLINE
1021policy) or the waiter's priority (if the waiter is scheduled under the
1022.BR SCHED_RR
1023or
1024.BR SCHED_FIFO
1025policy).
1d09c150
MK
1026.\" August 2015:
1027.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
1028.\" processes, does the nice value come into play also?
1029.\"
1030.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
c3875d1d 1031This inheritance follows the lock chain in the case of nested locking
ca4e5b2b
MK
1032.\" (i.e., task 1 blocks on lock A, held by task 2,
1033.\" while task 2 blocks on lock B, held by task 3)
c3875d1d
MK
1034and performs deadlock detection.
1035
e0547e70
TG
1036The
1037.I timeout
9ce19cf1 1038argument provides a timeout for the lock attempt.
8064bfa5
MK
1039If
1040.I timeout
1041is not NULL, the structure it points to specifies
1042an absolute timeout, measured against the
9ce19cf1
MK
1043.BR CLOCK_REALTIME
1044clock.
1045If
1046.I timeout
1047is NULL, the operation will block indefinitely.
e0547e70 1048
a449c634 1049The
e0547e70
TG
1050.IR uaddr2 ,
1051.IR val ,
1052and
1053.IR val3
a449c634 1054arguments are ignored.
67833bec 1055.\"
70b06b90
MK
1056.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1057.\"
d67e21f5 1058.TP
12fdbe23 1059.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1060.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
3fbb1be1 1061This operation tries to acquire the lock at
12fdbe23 1062.IR uaddr .
c3875d1d
MK
1063It is invoked when a user-space atomic acquire did not
1064succeed because the futex word was not 0.
1065
8adaf0a7
MK
1066Because the kernel has access to more state information than user space,
1067acquisition of the lock might succeed if performed by the
1068kernel in cases where the futex word
1069(i.e., the state information accessible to use-space) contains stale state
c3875d1d
MK
1070.RB ( FUTEX_WAITERS
1071and/or
1072.BR FUTEX_OWNER_DIED ).
1073This can happen when the owner of the futex died.
1d09c150
MK
1074User space cannot handle this condition in a race-free manner,
1075but the kernel can fix this up and acquire the futex.
ee65b0e8
MK
1076.\" Paraphrasing a f2f conversation with Thomas Gleixner about the
1077.\" above point (Aug 2015): ###
1078.\" There is a rare possibility of a race condition involving an
1079.\" uncontended futex with no owner, but with waiters. The
1080.\" kernel-user-space contract is that if a futex is nonzero, you must
1081.\" go into kernel. The futex was owned by a task, and that task dies
1082.\" but there are no waiters, so the futex value is non zero.
1083.\" Therefore, the next locker has to go into the kernel,
1084.\" so that the kernel has a chance to clean up. (CMXCH on zero
1085.\" in user space would fail, so kernel has to clean up.)
8adaf0a7
MK
1086.\" Darren Hart (Oct 2015):
1087.\" The trylock in the kernel has more state, so it can independently
1088.\" verify the flags that userspace must trust implicitly.
084744ef
MK
1089
1090The
1091.IR uaddr2 ,
1092.IR val ,
1093.IR timeout ,
1094and
1095.IR val3
1096arguments are ignored.
70b06b90
MK
1097.\"
1098.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1099.\"
d67e21f5 1100.TP
12fdbe23 1101.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1102.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
d4ba4328 1103This operation wakes the top priority waiter that is waiting in
ecae2099
TG
1104.B FUTEX_LOCK_PI
1105on the futex address provided by the
1106.I uaddr
1107argument.
1108
1d09c150 1109This is called when the user-space value at
ecae2099
TG
1110.I uaddr
1111cannot be changed atomically from a TID (of the owner) to 0.
1112
1113The
1114.IR uaddr2 ,
1115.IR val ,
1116.IR timeout ,
1117and
1118.IR val3
11a194bf 1119arguments are ignored.
70b06b90
MK
1120.\"
1121.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1122.\"
d67e21f5 1123.TP
d67e21f5
MK
1124.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
1125.\" commit 52400ba946759af28442dee6265c5c0180ac7122
f812a08b
DH
1126This operation is a PI-aware variant of
1127.BR FUTEX_CMP_REQUEUE .
1128It requeues waiters that are blocked via
1129.B FUTEX_WAIT_REQUEUE_PI
1130on
1131.I uaddr
1132from a non-PI source futex
1133.RI ( uaddr )
1134to a PI target futex
1135.RI ( uaddr2 ).
1136
9e54d26d
MK
1137As with
1138.BR FUTEX_CMP_REQUEUE ,
1139this operation wakes up a maximum of
1140.I val
1141waiters that are waiting on the futex at
1142.IR uaddr .
1143However, for
1144.BR FUTEX_CMP_REQUEUE_PI ,
1145.I val
6fbeb8f4 1146is required to be 1
939ca89f 1147(since the main point is to avoid a thundering herd).
9e54d26d
MK
1148The remaining waiters are removed from the wait queue of the source futex at
1149.I uaddr
1150and added to the wait queue of the target futex at
1151.IR uaddr2 .
f812a08b 1152
9e54d26d 1153The
768d3c23 1154.I val2
c6d8cf21
MK
1155.\" val2 is the cap on the number of requeued waiters.
1156.\" In the glibc pthread_cond_broadcast() implementation, this argument
1157.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
9e54d26d 1158and
768d3c23 1159.I val3
9e54d26d
MK
1160arguments serve the same purposes as for
1161.BR FUTEX_CMP_REQUEUE .
70b06b90 1162.\"
8297383e 1163.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/
be376673 1164.\" notes that "priority-inheritance Futex to priority-inheritance
8297383e
MK
1165.\" Futex requeues are currently unsupported". However, probably
1166.\" the page does not need to say nothing about this, since
1167.\" Thomas Gleixner commented (July 2015): "they never will be
1168.\" supported because they make no sense at all"
70b06b90
MK
1169.\"
1170.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1171.\"
d67e21f5
MK
1172.TP
1173.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
1174.\" commit 52400ba946759af28442dee6265c5c0180ac7122
70b06b90 1175.\"
c3875d1d 1176Wait on a non-PI futex at
6ff1b4c0 1177.I uaddr
c3875d1d
MK
1178and potentially be requeued (via a
1179.BR FUTEX_CMP_REQUEUE_PI
1180operation in another task) onto a PI futex at
6ff1b4c0
TG
1181.IR uaddr2 .
1182The wait operation on
1183.I uaddr
c3875d1d 1184is the same as for
6ff1b4c0 1185.BR FUTEX_WAIT .
c3875d1d 1186
6ff1b4c0
TG
1187The waiter can be removed from the wait on
1188.I uaddr
6ff1b4c0 1189without requeueing on
c3875d1d
MK
1190.IR uaddr2
1191via a
1d09c150 1192.BR FUTEX_WAKE
c3875d1d
MK
1193operation in another task.
1194In this case, the
1195.BR FUTEX_WAIT_REQUEUE_PI
3fbb1be1
MK
1196operation fails with the error
1197.BR EAGAIN .
a4e69912 1198
63bea7dc
MK
1199If
1200.I timeout
8064bfa5
MK
1201is not NULL, the structure it points to specifies
1202an absolute timeout for the wait operation.
63bea7dc
MK
1203If
1204.I timeout
1205is NULL, the operation can block indefinitely.
1206
a4e69912
MK
1207The
1208.I val3
1209argument is ignored.
abb571e8
MK
1210
1211The
1212.BR FUTEX_WAIT_REQUEUE_PI
1213and
1214.BR FUTEX_CMP_REQUEUE_PI
1215were added to support a fairly specific use case:
1216support for priority-inheritance-aware POSIX threads condition variables.
1217The idea is that these operations should always be paired,
1218in order to ensure that user space and the kernel remain in sync.
1219Thus, in the
1220.BR FUTEX_WAIT_REQUEUE_PI
1221operation, the user-space application pre-specifies the target
1222of the requeue that takes place in the
1223.BR FUTEX_CMP_REQUEUE_PI
1224operation.
1225.\"
1226.\" Darren Hart notes that a patch to allow glibc to fully support
1af427a4 1227.\" PI-aware pthreads condition variables has not yet been accepted into
abb571e8
MK
1228.\" glibc. The story is complex, and can be found at
1229.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588
1230.\" Darren notes that in the meantime, the patch is shipped with various
1af427a4 1231.\" PREEMPT_RT-enabled Linux systems.
abb571e8
MK
1232.\"
1233.\" Related to the preceding, Darren proposed that somewhere, man-pages
1234.\" should document the following point:
1af427a4 1235.\"
4c8cb0ff
MK
1236.\" While the Linux kernel, since 2.6.31, supports requeueing of
1237.\" priority-inheritance (PI) aware mutexes via the
1238.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations,
1239.\" the glibc implementation does not yet take full advantage of this.
1240.\" Specifically, the condvar internal data lock remains a non-PI aware
1241.\" mutex, regardless of the type of the pthread_mutex associated with
1242.\" the condvar. This can lead to an unbounded priority inversion on
1243.\" the internal data lock even when associating a PI aware
1244.\" pthread_mutex with a condvar during a pthread_cond*_wait
1245.\" operation. For this reason, it is not recommended to rely on
1246.\" priority inheritance when using pthread condition variables.
1af427a4
MK
1247.\"
1248.\" The problem is that the obvious location for this text is
1249.\" the pthread_cond*wait(3) man page. However, such a man page
abb571e8 1250.\" does not currently exist.
70b06b90 1251.\"
6700de24 1252.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70b06b90 1253.\"
47297adb 1254.SH RETURN VALUE
fea681da 1255.PP
a5c5a06a
MK
1256In the event of an error (and assuming that
1257.BR futex ()
1258was invoked via
1259.BR syscall (2)),
1260all operations return \-1 and set
e808bba0 1261.I errno
6f147f79 1262to indicate the cause of the error.
adc89a97 1263
e808bba0
MK
1264The return value on success depends on the operation,
1265as described in the following list:
fea681da
MK
1266.TP
1267.B FUTEX_WAIT
077981d4 1268Returns 0 if the caller was woken up.
4c8cb0ff
MK
1269Note that a wake-up can also be caused by common futex usage patterns
1270in unrelated code that happened to have previously used the futex word's
1271memory location (e.g., typical futex-based implementations of
1272Pthreads mutexes can cause this under some conditions).
1273Therefore, callers should always conservatively assume that a return
1274value of 0 can mean a spurious wake-up, and use the futex word's value
bc54ed38
MK
1275(i.e., the user-space synchronization scheme)
1276to decide whether to continue to block or not.
fea681da
MK
1277.TP
1278.B FUTEX_WAKE
bdc5957a 1279Returns the number of waiters that were woken up.
fea681da
MK
1280.TP
1281.B FUTEX_FD
1282Returns the new file descriptor associated with the futex.
1283.TP
1284.B FUTEX_REQUEUE
bdc5957a 1285Returns the number of waiters that were woken up.
fea681da
MK
1286.TP
1287.B FUTEX_CMP_REQUEUE
bdc5957a 1288Returns the total number of waiters that were woken up or
4b35dc5d 1289requeued to the futex for the futex word at
3dfcc11d
MK
1290.IR uaddr2 .
1291If this value is greater than
1292.IR val ,
fd105614 1293then the difference is the number of waiters requeued to the futex for the
4c8cb0ff 1294futex word at
3dfcc11d 1295.IR uaddr2 .
dcad19c0
MK
1296.TP
1297.B FUTEX_WAKE_OP
a8b5b324 1298Returns the total number of waiters that were woken up.
4c8cb0ff
MK
1299This is the sum of the woken waiters on the two futexes for
1300the futex words at
a8b5b324
MK
1301.I uaddr
1302and
1303.IR uaddr2 .
dcad19c0
MK
1304.TP
1305.B FUTEX_WAIT_BITSET
077981d4
MK
1306Returns 0 if the caller was woken up.
1307See
4b35dc5d
TR
1308.B FUTEX_WAIT
1309for how to interpret this correctly in practice.
dcad19c0
MK
1310.TP
1311.B FUTEX_WAKE_BITSET
bdc5957a 1312Returns the number of waiters that were woken up.
dcad19c0
MK
1313.TP
1314.B FUTEX_LOCK_PI
bf02a260 1315Returns 0 if the futex was successfully locked.
dcad19c0
MK
1316.TP
1317.B FUTEX_TRYLOCK_PI
5c716eef 1318Returns 0 if the futex was successfully locked.
dcad19c0
MK
1319.TP
1320.B FUTEX_UNLOCK_PI
52bb928f 1321Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
1322.TP
1323.B FUTEX_CMP_REQUEUE_PI
bdc5957a 1324Returns the total number of waiters that were woken up or
4b35dc5d 1325requeued to the futex for the futex word at
dddd395a
MK
1326.IR uaddr2 .
1327If this value is greater than
1328.IR val ,
4c8cb0ff
MK
1329then difference is the number of waiters requeued to the futex for
1330the futex word at
dddd395a 1331.IR uaddr2 .
dcad19c0
MK
1332.TP
1333.B FUTEX_WAIT_REQUEUE_PI
4c8cb0ff
MK
1334Returns 0 if the caller was successfully requeued to the futex for
1335the futex word at
22c15de9 1336.IR uaddr2 .
70b06b90
MK
1337.\"
1338.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1339.\"
fea681da
MK
1340.SH ERRORS
1341.TP
1342.B EACCES
4b35dc5d 1343No read access to the memory of a futex word.
fea681da
MK
1344.TP
1345.B EAGAIN
f48516d1 1346.RB ( FUTEX_WAIT ,
4b35dc5d 1347.BR FUTEX_WAIT_BITSET ,
f48516d1 1348.BR FUTEX_WAIT_REQUEUE_PI )
badbf70c
MK
1349The value pointed to by
1350.I uaddr
1351was not equal to the expected value
1352.I val
1353at the time of the call.
9732dd8b
MK
1354
1355.BR Note :
1356on Linux, the symbolic names
1357.B EAGAIN
1358and
1359.B EWOULDBLOCK
77da5feb 1360(both of which appear in different parts of the kernel futex code)
9732dd8b 1361have the same value.
badbf70c
MK
1362.TP
1363.B EAGAIN
8f2068bb
MK
1364.RB ( FUTEX_CMP_REQUEUE ,
1365.BR FUTEX_CMP_REQUEUE_PI )
ce5602fd 1366The value pointed to by
9f6c40c0
МК
1367.I uaddr
1368is not equal to the expected value
1369.IR val3 .
fea681da 1370.TP
5662f56a
MK
1371.BR EAGAIN
1372.RB ( FUTEX_LOCK_PI ,
aaec9032
MK
1373.BR FUTEX_TRYLOCK_PI ,
1374.BR FUTEX_CMP_REQUEUE_PI )
1375The futex owner thread ID of
1376.I uaddr
1377(for
1378.BR FUTEX_CMP_REQUEUE_PI :
1379.IR uaddr2 )
1380is about to exit,
5662f56a
MK
1381but has not yet handled the internal state cleanup.
1382Try again.
1383.TP
7a39e745
MK
1384.BR EDEADLK
1385.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1386.BR FUTEX_TRYLOCK_PI ,
1387.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1388The futex word at
7a39e745
MK
1389.I uaddr
1390is already locked by the caller.
1391.TP
662c0da8 1392.BR EDEADLK
c3875d1d 1393.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
d6bb5a38 1394.\" places, and EDEADLOCK in others. On almost all architectures
4c8cb0ff
MK
1395.\" these constants are synonymous. Is there a reason that both
1396.\" names are used?
8297383e
MK
1397.\"
1398.\" tglx (July 2015): "No. We should probably fix that."
1399.\"
662c0da8 1400.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1401While requeueing a waiter to the PI futex for the futex word at
662c0da8
MK
1402.IR uaddr2 ,
1403the kernel detected a deadlock.
1404.TP
fea681da 1405.B EFAULT
1ea901e8
MK
1406A required pointer argument (i.e.,
1407.IR uaddr ,
1408.IR uaddr2 ,
1409or
1410.IR timeout )
496df304 1411did not point to a valid user-space address.
fea681da 1412.TP
9f6c40c0 1413.B EINTR
e808bba0 1414A
9f6c40c0 1415.B FUTEX_WAIT
2674f781
MK
1416or
1417.B FUTEX_WAIT_BITSET
e808bba0 1418operation was interrupted by a signal (see
f529fd20
MK
1419.BR signal (7)).
1420In kernels before Linux 2.6.22, this error could also be returned for
1421on a spurious wakeup; since Linux 2.6.22, this no longer happens.
9f6c40c0 1422.TP
fea681da 1423.B EINVAL
180f97b7
MK
1424The operation in
1425.IR futex_op
1426is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1427.I timeout
1428argument was invalid
1429.RI ( tv_sec
1430was less than zero, or
1431.IR tv_nsec
cabee29d 1432was not less than 1,000,000,000).
fb2f4c27
MK
1433.TP
1434.B EINVAL
0c74df0b 1435The operation specified in
025e1374 1436.IR futex_op
0c74df0b 1437employs one or both of the pointers
51ee94be 1438.I uaddr
a1f47699 1439and
0c74df0b
MK
1440.IR uaddr2 ,
1441but one of these does not point to a valid object\(emthat is,
1442the address is not four-byte-aligned.
51ee94be
MK
1443.TP
1444.B EINVAL
55cc422d
TG
1445.RB ( FUTEX_WAIT_BITSET ,
1446.BR FUTEX_WAKE_BITSET )
5e1456d4 1447The bit mask supplied in
79c9b436
TG
1448.IR val3
1449is zero.
1450.TP
1451.B EINVAL
2abcba67 1452.RB ( FUTEX_CMP_REQUEUE_PI )
add875c0
MK
1453.I uaddr
1454equals
1455.IR uaddr2
1456(i.e., an attempt was made to requeue to the same futex).
1457.TP
ff597681
MK
1458.BR EINVAL
1459.RB ( FUTEX_FD )
1460The signal number supplied in
1461.I val
1462is invalid.
1463.TP
6bac3b85 1464.B EINVAL
476debd7
MK
1465.RB ( FUTEX_WAKE ,
1466.BR FUTEX_WAKE_OP ,
1467.BR FUTEX_WAKE_BITSET ,
1468.BR FUTEX_REQUEUE ,
1469.BR FUTEX_CMP_REQUEUE )
1470The kernel detected an inconsistency between the user-space state at
1471.I uaddr
1472and the kernel state\(emthat is, it detected a waiter which waits in
1473.BR FUTEX_LOCK_PI
1474on
1475.IR uaddr .
1476.TP
1477.B EINVAL
a218ef20 1478.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1479.BR FUTEX_TRYLOCK_PI ,
1480.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1481The kernel detected an inconsistency between the user-space state at
1482.I uaddr
1483and the kernel state.
ce022f18 1484This indicates either state corruption
ce022f18 1485or that the kernel found a waiter on
a218ef20
MK
1486.I uaddr
1487which is waiting via
1488.BR FUTEX_WAIT
1489or
1490.BR FUTEX_WAIT_BITSET .
1491.TP
1492.B EINVAL
f9250b1a
MK
1493.RB ( FUTEX_CMP_REQUEUE_PI )
1494The kernel detected an inconsistency between the user-space state at
99c0041d
MK
1495.I uaddr2
1496and the kernel state;
ee65b0e8
MK
1497.\" From a conversation with Thomas Gleixner (Aug 2015): ###
1498.\" The kernel sees: I have non PI state for a futex you tried to
1499.\" tell me was PI
99c0041d
MK
1500that is, the kernel detected a waiter which waits via
1501.BR FUTEX_WAIT
8297383e
MK
1502or
1503.BR FUTEX_WAIT_BITSET
99c0041d
MK
1504on
1505.IR uaddr2 .
1506.TP
1507.B EINVAL
1508.RB ( FUTEX_CMP_REQUEUE_PI )
1509The kernel detected an inconsistency between the user-space state at
f9250b1a
MK
1510.I uaddr
1511and the kernel state;
1512that is, the kernel detected a waiter which waits via
75299c8d 1513.BR FUTEX_WAIT
99c0041d 1514or
75299c8d 1515.BR FUTEX_WAIT_BITESET
f9250b1a
MK
1516on
1517.IR uaddr .
1518.TP
1519.B EINVAL
99c0041d 1520.RB ( FUTEX_CMP_REQUEUE_PI )
75299c8d
MK
1521The kernel detected an inconsistency between the user-space state at
1522.I uaddr
1523and the kernel state;
1524that is, the kernel detected a waiter which waits on
1525.I uaddr
1526via
1527.BR FUTEX_LOCK_PI
1528(instead of
1529.BR FUTEX_WAIT_REQUEUE_PI ).
99c0041d
MK
1530.TP
1531.B EINVAL
9786b3ca 1532.RB ( FUTEX_CMP_REQUEUE_PI )
8297383e
MK
1533.\" This deals with the case:
1534.\" wait_requeue_pi(A, B);
1535.\" requeue_pi(A, C);
9786b3ca
MK
1536An attempt was made to requeue a waiter to a futex other than that
1537specified by the matching
1538.B FUTEX_WAIT_REQUEUE_PI
1539call for that waiter.
1540.TP
1541.B EINVAL
f0c0d61c
MK
1542.RB ( FUTEX_CMP_REQUEUE_PI )
1543The
1544.I val
1545argument is not 1.
1546.TP
1547.B EINVAL
4832b48a 1548Invalid argument.
fea681da 1549.TP
a449c634
MK
1550.BR ENOMEM
1551.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1552.BR FUTEX_TRYLOCK_PI ,
1553.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1554The kernel could not allocate memory to hold state information.
1555.TP
fea681da 1556.B ENFILE
ff597681 1557.RB ( FUTEX_FD )
e258766b 1558The system-wide limit on the total number of open files has been reached.
4701fc28
MK
1559.TP
1560.B ENOSYS
1561Invalid operation specified in
d33602c4 1562.IR futex_op .
9f6c40c0 1563.TP
4a7e5b05
MK
1564.B ENOSYS
1565The
1566.BR FUTEX_CLOCK_REALTIME
1567option was specified in
1afcee7c 1568.IR futex_op ,
4a7e5b05 1569but the accompanying operation was neither
017d194b
MK
1570.BR FUTEX_WAIT ,
1571.BR FUTEX_WAIT_BITSET ,
4a7e5b05
MK
1572nor
1573.BR FUTEX_WAIT_REQUEUE_PI .
1574.TP
a9dcb4d1
MK
1575.BR ENOSYS
1576.RB ( FUTEX_LOCK_PI ,
f2424fae 1577.BR FUTEX_TRYLOCK_PI ,
4945ff19 1578.BR FUTEX_UNLOCK_PI ,
4cf92894 1579.BR FUTEX_CMP_REQUEUE_PI ,
794bb106 1580.BR FUTEX_WAIT_REQUEUE_PI )
4b35dc5d 1581A run-time check determined that the operation is not available.
f0a9e8f4 1582The PI-futex operations are not implemented on all architectures and
077981d4 1583are not supported on some CPU variants.
a9dcb4d1 1584.TP
c7589177
MK
1585.BR EPERM
1586.RB ( FUTEX_LOCK_PI ,
dc2742a8
MK
1587.BR FUTEX_TRYLOCK_PI ,
1588.BR FUTEX_CMP_REQUEUE_PI )
04331c3f 1589The caller is not allowed to attach itself to the futex at
dc2742a8
MK
1590.I uaddr
1591(for
1592.BR FUTEX_CMP_REQUEUE_PI :
1593the futex at
1594.IR uaddr2 ).
c7589177
MK
1595(This may be caused by a state corruption in user space.)
1596.TP
76f347ba 1597.BR EPERM
87276709 1598.RB ( FUTEX_UNLOCK_PI )
4b35dc5d 1599The caller does not own the lock represented by the futex word.
76f347ba 1600.TP
0b0e4934
MK
1601.BR ESRCH
1602.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1603.BR FUTEX_TRYLOCK_PI ,
1604.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1605The thread ID in the futex word at
0b0e4934
MK
1606.I uaddr
1607does not exist.
1608.TP
360f773c
MK
1609.BR ESRCH
1610.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1611The thread ID in the futex word at
360f773c
MK
1612.I uaddr2
1613does not exist.
1614.TP
9f6c40c0 1615.B ETIMEDOUT
4d85047f
MK
1616The operation in
1617.IR futex_op
1618employed the timeout specified in
1619.IR timeout ,
1620and the timeout expired before the operation completed.
70b06b90
MK
1621.\"
1622.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1623.\"
47297adb 1624.SH VERSIONS
a1d5f77c 1625.PP
81c9d87e
MK
1626Futexes were first made available in a stable kernel release
1627with Linux 2.6.0.
1628
4c8cb0ff
MK
1629Initial futex support was merged in Linux 2.5.7 but with different
1630semantics from what was described above.
52dee70e 1631A four-argument system call with the semantics
fd3fa7ef 1632described in this page was introduced in Linux 2.5.40.
d0442d14
MK
1633A fifth argument was added in Linux 2.5.70,
1634and a sixth argument was added in Linux 2.6.7.
47297adb 1635.SH CONFORMING TO
8382f16d 1636This system call is Linux-specific.
47297adb 1637.SH NOTES
baf0f1f4
MK
1638Glibc does not provide a wrapper for this system call; call it using
1639.BR syscall (2).
cf44281c 1640
02f7b623 1641Several higher-level programming abstractions are implemented via futexes,
e24fbf10 1642including POSIX semaphores and
02f7b623
MK
1643various POSIX threads synchronization mechanisms
1644(mutexes, condition variables, read-write locks, and barriers).
74f58a64
MK
1645.\" TODO FIXME(Torvald) Above, we cite this section and claim it contains
1646.\" details on the synchronization semantics; add the C11 equivalents
1647.\" here (or whatever we find consensus for).
305cc415
MK
1648.\"
1649.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1650.\"
1651.SH EXAMPLE
1652.\" FIXME Is it worth having an example program?
1653.\" FIXME Anything obviously broken in the example program?
1654.\"
bc54ed38
MK
1655The program below demonstrates use of futexes in a program where a parent
1656process and a child process use a pair of futexes located inside a
305cc415
MK
1657shared anonymous mapping to synchronize access to a shared resource:
1658the terminal.
1659The two processes each write
1660.IR nloops
1661(a command-line argument that defaults to 5 if omitted)
1662messages to the terminal and employ a synchronization protocol
1663that ensures that they alternate in writing messages.
1664Upon running this program we see output such as the following:
1665
1666.in +4n
1667.nf
1668$ \fB./futex_demo\fP
1669Parent (18534) 0
1670Child (18535) 0
1671Parent (18534) 1
1672Child (18535) 1
1673Parent (18534) 2
1674Child (18535) 2
1675Parent (18534) 3
1676Child (18535) 3
1677Parent (18534) 4
1678Child (18535) 4
1679.fi
1680.in
1681.SS Program source
1682\&
1683.nf
1684/* futex_demo.c
1685
1686 Usage: futex_demo [nloops]
1687 (Default: 5)
1688
1689 Demonstrate the use of futexes in a program where parent and child
1690 use a pair of futexes located inside a shared anonymous mapping to
1691 synchronize access to a shared resource: the terminal. The two
1692 processes each write \(aqnum\-loops\(aq messages to the terminal and employ
1693 a synchronization protocol that ensures that they alternate in
1694 writing messages.
1695*/
1696#define _GNU_SOURCE
1697#include <stdio.h>
1698#include <errno.h>
1699#include <stdlib.h>
1700#include <unistd.h>
1701#include <sys/wait.h>
1702#include <sys/mman.h>
1703#include <sys/syscall.h>
1704#include <linux/futex.h>
1705#include <sys/time.h>
1706
1707#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \\
1708 } while (0)
1709
1710static int *futex1, *futex2, *iaddr;
1711
1712static int
1713futex(int *uaddr, int futex_op, int val,
1714 const struct timespec *timeout, int *uaddr2, int val3)
1715{
1716 return syscall(SYS_futex, uaddr, futex_op, val,
1717 timeout, uaddr, val3);
1718}
1719
1720/* Acquire the futex pointed to by \(aqfutexp\(aq: wait for its value to
1721 become 1, and then set the value to 0. */
1722
1723static void
1724fwait(int *futexp)
1725{
1726 int s;
1727
1728 /* __sync_bool_compare_and_swap(ptr, oldval, newval) is a gcc
1729 built\-in function. It atomically performs the equivalent of:
1730
1731 if (*ptr == oldval)
1732 *ptr = newval;
1733
1734 It returns true if the test yielded true and *ptr was updated.
1735 The alternative here would be to employ the equivalent atomic
1736 machine\-language instructions. For further information, see
1737 the GCC Manual. */
1738
305cc415 1739 while (1) {
83e80dda 1740
63ad44cb 1741 /* Is the futex available? */
83e80dda 1742
305cc415
MK
1743 if (__sync_bool_compare_and_swap(futexp, 1, 0))
1744 break; /* Yes */
1745
63ad44cb 1746 /* Futex is not available; wait */
83e80dda 1747
63ad44cb
HS
1748 s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
1749 if (s == \-1 && errno != EAGAIN)
1750 errExit("futex\-FUTEX_WAIT");
305cc415
MK
1751 }
1752}
1753
1754/* Release the futex pointed to by \(aqfutexp\(aq: if the futex currently
1755 has the value 0, set its value to 1 and the wake any futex waiters,
1756 so that if the peer is blocked in fpost(), it can proceed. */
1757
1758static void
1759fpost(int *futexp)
1760{
1761 int s;
1762
1763 /* __sync_bool_compare_and_swap() was described in comments above */
1764
1765 if (__sync_bool_compare_and_swap(futexp, 0, 1)) {
1766
1767 s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
1768 if (s == \-1)
1769 errExit("futex\-FUTEX_WAKE");
1770 }
1771}
1772
1773int
1774main(int argc, char *argv[])
1775{
1776 pid_t childPid;
1777 int j, nloops;
1778
1779 setbuf(stdout, NULL);
1780
1781 nloops = (argc > 1) ? atoi(argv[1]) : 5;
1782
1783 /* Create a shared anonymous mapping that will hold the futexes.
1784 Since the futexes are being shared between processes, we
1785 subsequently use the "shared" futex operations (i.e., not the
1786 ones suffixed "_PRIVATE") */
1787
1788 iaddr = mmap(NULL, sizeof(int) * 2, PROT_READ | PROT_WRITE,
1789 MAP_ANONYMOUS | MAP_SHARED, \-1, 0);
1790 if (iaddr == MAP_FAILED)
1791 errExit("mmap");
1792
1793 futex1 = &iaddr[0];
1794 futex2 = &iaddr[1];
1795
1796 *futex1 = 0; /* State: unavailable */
1797 *futex2 = 1; /* State: available */
1798
1799 /* Create a child process that inherits the shared anonymous
35764662 1800 mapping */
305cc415
MK
1801
1802 childPid = fork();
92a46690 1803 if (childPid == \-1)
305cc415
MK
1804 errExit("fork");
1805
1806 if (childPid == 0) { /* Child */
1807 for (j = 0; j < nloops; j++) {
1808 fwait(futex1);
1809 printf("Child (%ld) %d\\n", (long) getpid(), j);
1810 fpost(futex2);
1811 }
1812
1813 exit(EXIT_SUCCESS);
1814 }
1815
1816 /* Parent falls through to here */
1817
1818 for (j = 0; j < nloops; j++) {
1819 fwait(futex2);
1820 printf("Parent (%ld) %d\\n", (long) getpid(), j);
1821 fpost(futex1);
1822 }
1823
1824 wait(NULL);
1825
1826 exit(EXIT_SUCCESS);
1827}
1828.fi
47297adb 1829.SH SEE ALSO
4c222281 1830.ad l
9913033c 1831.BR get_robust_list (2),
d806bc05 1832.BR restart_syscall (2),
e0074751 1833.BR pthread_mutexattr_getprotocol (3),
ac894879
MK
1834.BR futex (7),
1835.BR sched (7)
fea681da 1836.PP
f5ad572f
MK
1837The following kernel source files:
1838.IP * 2
1839.I Documentation/pi-futex.txt
1840.IP *
1841.I Documentation/futex-requeue-pi.txt
1842.IP *
1843.I Documentation/locking/rt-mutex.txt
1844.IP *
1845.I Documentation/locking/rt-mutex-design.txt
8fe019c7
MK
1846.IP *
1847.I Documentation/robust-futex-ABI.txt
43b99089 1848.PP
4c222281 1849Franke, H., Russell, R., and Kirwood, M., 2002.
52087dd3 1850\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
4c222281 1851(from proceedings of the Ottawa Linux Symposium 2002),
9b936e9e 1852.br
608bf950
SK
1853.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
1854.UE
f42eb21b 1855
4c222281 1856Hart, D., 2009. \fIA futex overview and update\fP,
2ed26199
MK
1857.UR http://lwn.net/Articles/360699/
1858.UE
1859
4c222281 1860Hart, D. and Guniguntala, D., 2009.
0483b6cc 1861\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
4c222281 1862(from proceedings of the 2009 Real-Time Linux Workshop),
0483b6cc
MK
1863.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1864.UE
1865
4c222281 1866Drepper, U., 2011. \fIFutexes Are Tricky\fP,
f42eb21b
MK
1867.UR http://www.akkadia.org/drepper/futex.pdf
1868.UE
9b936e9e
MK
1869.PP
1870Futex example library, futex-*.tar.bz2 at
1871.br
a605264d 1872.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
608bf950 1873.UE
34f14794
MK
1874.\"
1875.\" FIXME Are there any other resources that should be listed
1876.\" in the SEE ALSO section?
74f58a64 1877.\" FIXME(Torvald) We should probably refer to the glibc code here, in
4c8cb0ff
MK
1878.\" particular the glibc-internal futex wrapper functions that are
1879.\" WIP, and the generic pthread_mutex_t and perhaps condvar
1880.\" implementations.