.\" Page by b.hubert
.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
.\" may be freely modified and distributed
.\" %%%LICENSE_END
.\"
.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
.\" added ERRORS section.
.\"
.\" Modified 2004-06-17 mtk
.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
.\"
.\" FIXME Still to integrate are some points from Torvald Riegel's mail of
.\" 2015-01-23:
.\"       http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977
.\"
.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail
.\"       http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242
.\"
.TH FUTEX 2 2020-11-01 "Linux" "Linux Programmer's Manual"
.SH NAME
futex \- fast user-space locking
.SH SYNOPSIS
.nf
.PP
.B #include <linux/futex.h>
.B #include <stdint.h>
.B #include <sys/time.h>
.PP
.BI "long futex(uint32_t *" uaddr ", int " futex_op ", uint32_t " val ,
.BI "           const struct timespec *" timeout , \
"   \fR  /* or: \fBuint32_t \fIval2\fP */"
.BI "           uint32_t *" uaddr2 ", uint32_t " val3 );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
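.PP
Because no wrapper is provided, callers typically invoke the system call
through
.BR syscall (2).
The following is a minimal sketch under that assumption; the helper name
.IR futex_call ()
is hypothetical and is not defined by any header:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: forwards its arguments unchanged to the
   futex system call via syscall(2).  On failure it returns -1
   and sets errno, as syscall(2) does. */
static long
futex_call(uint32_t *uaddr, int futex_op, uint32_t val,
           const struct timespec *timeout, uint32_t *uaddr2,
           uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr2, val3);
}
```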
.SH DESCRIPTION
The
.BR futex ()
system call provides a method for waiting until a certain condition becomes
true.
It is typically used as a blocking construct in the context of
shared-memory synchronization.
When using futexes, the majority of
the synchronization operations are performed in user space.
A user-space program employs the
.BR futex ()
system call only when it is likely that the program has to block for
a longer time until the condition becomes true.
Other
.BR futex ()
operations can be used to wake any processes or threads waiting
for a particular condition.
.PP
A futex is a 32-bit value\(emreferred to below as a
.IR "futex word" \(emwhose
address is supplied to the
.BR futex ()
system call.
(Futexes are 32 bits in size on all platforms, including 64-bit systems.)
All futex operations are governed by this value.
In order to share a futex between processes,
the futex is placed in a region of shared memory,
created using (for example)
.BR mmap (2)
or
.BR shmat (2).
(Thus, the futex word may have different
virtual addresses in different processes,
but these addresses all refer to the same location in physical memory.)
In a multithreaded program, it is sufficient to place the futex word
in a global variable shared by all threads.
.PP
When executing a futex operation that requests to block a thread,
the kernel will block only if the futex word has the value that the
calling thread supplied (as one of the arguments of the
.BR futex ()
call) as the expected value of the futex word.
The loading of the futex word's value,
the comparison of that value with the expected value,
and the actual blocking will happen atomically and will be totally ordered
with respect to concurrent operations performed by other threads
on the same futex word.
.\" Notes from Darren Hart (Dec 2015):
.\" Totally ordered with respect to futex operations refers to semantics
.\" of the ACQUIRE/RELEASE operations and how they impact ordering of
.\" memory reads and writes. The kernel futex operations are protected
.\" by spinlocks, which ensure that all operations are serialized
.\" with respect to one another.
.\"
.\" This is a lot to attempt to define in this document. Perhaps a
.\" reference to linux/Documentation/memory-barriers.txt as a footnote
.\" would be sufficient? Or perhaps for this manual, "serialized" would
.\" be sufficient, with a footnote regarding "totally ordered" and a
.\" pointer to the memory-barrier documentation?
Thus, the futex word is used to connect the synchronization in user space
with the implementation of blocking by the kernel.
Analogously to an atomic
compare-and-exchange operation that potentially changes shared memory,
blocking via a futex is an atomic compare-and-block operation.
.\" FIXME(Torvald Riegel):
.\" Eventually we want to have some text in NOTES to satisfy
.\" the reference in the following sentence
.\"       See NOTES for a detailed specification of
.\"       the synchronization semantics.
.PP
One use of futexes is for implementing locks.
The state of the lock (i.e., acquired or not acquired)
can be represented as an atomically accessed flag in shared memory.
In the uncontended case,
a thread can access or modify the lock state with atomic instructions,
for example atomically changing it from not acquired to acquired
using an atomic compare-and-exchange instruction.
(Such instructions are performed entirely in user mode,
and the kernel maintains no information about the lock state.)
On the other hand, a thread may be unable to acquire a lock because
it is already acquired by another thread.
It then may pass the lock's flag as a futex word and the value
representing the acquired state as the expected value to a
.BR futex ()
wait operation.
This
.BR futex ()
operation will block if and only if the lock is still acquired
(i.e., the value in the futex word still matches the "acquired state").
When releasing the lock, a thread has to first reset the
lock state to not acquired and then execute a futex
operation that wakes threads blocked on the lock flag used as a futex word
(this can be further optimized to avoid unnecessary wake-ups).
See
.BR futex (7)
for more detail on how to use futexes.
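.PP
The lock protocol described above might be sketched as follows.
This is an illustration under stated assumptions, not a production lock:
the state encoding (0 for not acquired, 1 for acquired) and the helper names
.IR sketch_lock ()
and
.IR sketch_unlock ()
are hypothetical,
and the unlock path always issues a wake-up rather than applying
the optimization mentioned above.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed encoding of the lock state in the futex word. */
enum { STATE_NOT_ACQUIRED = 0, STATE_ACQUIRED = 1 };

/* Hypothetical helper: acquire the lock, sleeping in the kernel
   only while the futex word still shows the acquired state. */
static void
sketch_lock(_Atomic uint32_t *word)
{
    for (;;) {
        uint32_t expected = STATE_NOT_ACQUIRED;

        /* Uncontended path: performed entirely in user space. */
        if (atomic_compare_exchange_strong(word, &expected,
                                           STATE_ACQUIRED))
            return;

        /* Contended path: the kernel blocks the caller only if the
           word still holds STATE_ACQUIRED.  A value mismatch, an
           interrupting signal, or a wake-up all lead to a retry. */
        syscall(SYS_futex, (uint32_t *) word, FUTEX_WAIT,
                STATE_ACQUIRED, NULL, NULL, 0);
    }
}

/* Hypothetical helper: reset the lock state first, then wake at
   most one thread blocked on the futex word. */
static void
sketch_unlock(_Atomic uint32_t *word)
{
    atomic_store(word, STATE_NOT_ACQUIRED);
    syscall(SYS_futex, (uint32_t *) word, FUTEX_WAKE, 1,
            NULL, NULL, 0);
}
```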
.PP
Besides the basic wait and wake-up futex functionality, there are further
futex operations aimed at supporting more complex use cases.
.PP
Note that
no explicit initialization or destruction is necessary to use futexes;
the kernel maintains a futex
(i.e., the kernel-internal implementation artifact)
only while operations such as
.BR FUTEX_WAIT ,
described below, are being performed on a particular futex word.
.\"
.SS Arguments
The
.I uaddr
argument points to the futex word.
On all platforms, futexes are four-byte
integers that must be aligned on a four-byte boundary.
The operation to perform on the futex is specified in the
.I futex_op
argument;
.IR val
is a value whose meaning and purpose depend on
.IR futex_op .
.PP
The remaining arguments
.RI ( timeout ,
.IR uaddr2 ,
and
.IR val3 )
are required only for certain of the futex operations described below.
Where one of these arguments is not required, it is ignored.
.PP
For several blocking operations, the
.I timeout
argument is a pointer to a
.IR timespec
structure that specifies a timeout for the operation.
However, notwithstanding the prototype shown above, for some operations,
the least significant four bytes of this argument are instead
used as an integer whose meaning is determined by the operation.
For these operations, the kernel casts the
.I timeout
value first to
.IR "unsigned long" ,
then to
.IR uint32_t ,
and in the remainder of this page, this argument is referred to as
.I val2
when interpreted in this fashion.
.PP
Where it is required, the
.IR uaddr2
argument is a pointer to a second futex word that is employed
by the operation.
.PP
The interpretation of the final integer argument,
.IR val3 ,
depends on the operation.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SS Futex operations
The
.I futex_op
argument consists of two parts:
a command that specifies the operation to be performed,
bitwise ORed with zero or more options that
modify the behavior of the operation.
The options that may be included in
.I futex_op
are as follows:
.TP
.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
This option bit can be employed with all futex operations.
It tells the kernel that the futex is process-private and not shared
with another process (i.e., it is being used for synchronization
only between threads of the same process).
This allows the kernel to make some additional performance optimizations.
.\" I.e., It allows the kernel to choose the fast path for validating
.\" the user-space address and avoids expensive VMA lookups,
.\" taking reference counts on file backing store, and so on.
.IP
As a convenience,
.IR <linux/futex.h>
defines a set of constants with the suffix
.BR _PRIVATE
that are equivalents of all of the operations listed below,
.\" except the obsolete FUTEX_FD, for which the "private" flag was
.\" meaningless
but with the
.BR FUTEX_PRIVATE_FLAG
ORed into the constant value.
Thus, there are
.BR FUTEX_WAIT_PRIVATE ,
.BR FUTEX_WAKE_PRIVATE ,
and so on.
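.IP
As an illustration of how the command and option parts of
.I futex_op
combine, the following sketch separates them again.
The macro
.B SKETCH_OPTION_BITS
and the helpers
.IR sketch_command ()
and
.IR sketch_is_private ()
are hypothetical names introduced only for this example:

```c
#include <linux/futex.h>

/* The two option bits described on this page; the remaining bits
   of futex_op hold the command. */
#define SKETCH_OPTION_BITS (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)

/* Hypothetical helper: return the command part of futex_op. */
static int
sketch_command(int futex_op)
{
    return futex_op & ~SKETCH_OPTION_BITS;
}

/* Hypothetical helper: nonzero if the process-private option is set. */
static int
sketch_is_private(int futex_op)
{
    return (futex_op & FUTEX_PRIVATE_FLAG) != 0;
}
```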
.TP
.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
.\" commit 1acdac104668a0834cfa267de9946fac7764d486
This option bit can be employed only with the
.BR FUTEX_WAIT_BITSET ,
.BR FUTEX_WAIT_REQUEUE_PI ,
and
(since Linux 4.5)
.\" commit 337f13046ff03717a9e99675284a817527440a49
.BR FUTEX_WAIT
operations.
.IP
If this option is set, the kernel measures the
.I timeout
against the
.BR CLOCK_REALTIME
clock.
.IP
If this option is not set, the kernel measures the
.I timeout
against the
.BR CLOCK_MONOTONIC
clock.
.PP
The operation specified in
.I futex_op
is one of the following:
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAIT " (since Linux 2.6.0)"
.\" Strictly speaking, since some time in 2.5.x
This operation tests that the value at the
futex word pointed to by the address
.I uaddr
still contains the expected value
.IR val ,
and if so, then sleeps waiting for a
.B FUTEX_WAKE
operation on the futex word.
The load of the value of the futex word is an atomic memory
access (i.e., using atomic machine instructions of the respective
architecture).
This load, the comparison with the expected value, and
starting to sleep are performed atomically
.\" FIXME: Torvald, I think we may need to add some explanation of
.\" "totally ordered" here.
and totally ordered
with respect to other futex operations on the same futex word.
If the thread starts to sleep,
it is considered a waiter on this futex word.
If the futex value does not match
.IR val ,
then the call fails immediately with the error
.BR EAGAIN .
.IP
The purpose of the comparison with the expected value is to prevent lost
wake-ups.
If another thread changed the value of the futex word after the
calling thread decided to block based on the prior value,
and if the other thread executed a
.BR FUTEX_WAKE
operation (or similar wake-up) after the value change and before this
.BR FUTEX_WAIT
operation, then the calling thread will observe the
value change and will not start to sleep.
.IP
If
.I timeout
is not NULL, the structure it points to specifies a
timeout for the wait.
(This interval will be rounded up to the system clock granularity,
and is guaranteed not to expire early.)
The timeout is by default measured according to the
.BR CLOCK_MONOTONIC
clock, but, since Linux 4.5, the
.BR CLOCK_REALTIME
clock can be selected by specifying
.BR FUTEX_CLOCK_REALTIME
in
.IR futex_op .
If
.I timeout
is NULL, the call blocks indefinitely.
.IP
.IR Note :
for
.BR FUTEX_WAIT ,
.IR timeout
is interpreted as a
.IR relative
value.
This differs from other futex operations, where
.I timeout
is interpreted as an absolute value.
To obtain the equivalent of
.BR FUTEX_WAIT
with an absolute timeout, employ
.BR FUTEX_WAIT_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY .
.IP
The arguments
.I uaddr2
and
.I val3
are ignored.
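.IP
A minimal sketch of a
.B FUTEX_WAIT
call with a relative timeout, assuming the call is made through
.BR syscall (2);
the helper name
.IR wait_while_equal ()
and its millisecond parameter are hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep on *uaddr while it holds "expected",
   for at most timeout_ms milliseconds.  The timeout is relative
   and, by default, measured against CLOCK_MONOTONIC.
   Returns 0 if woken, or -1 with errno set (EAGAIN if the value
   did not match, ETIMEDOUT if the timeout expired). */
static int
wait_while_equal(uint32_t *uaddr, uint32_t expected, long timeout_ms)
{
    struct timespec rel;

    rel.tv_sec = timeout_ms / 1000;
    rel.tv_nsec = (timeout_ms % 1000) * 1000000L;

    /* uaddr2 and val3 are ignored by FUTEX_WAIT. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                         &rel, NULL, 0);
}
```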
.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to a
.\" different example.
.\"
.\" For
.\" .BR futex (7),
.\" this call is executed if decrementing the count gave a negative value
.\" (indicating contention),
.\" and will sleep until another process or thread releases
.\" the futex and executes the
.\" .B FUTEX_WAKE
.\" operation.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE " (since Linux 2.6.0)"
.\" Strictly speaking, since Linux 2.5.x
This operation wakes at most
.I val
of the waiters that are waiting (e.g., inside
.BR FUTEX_WAIT )
on the futex word at the address
.IR uaddr .
Most commonly,
.I val
is specified as either 1 (wake up a single waiter) or
.BR INT_MAX
(wake up all waiters).
No guarantee is provided about which waiters are awoken
(e.g., a waiter with a higher scheduling priority is not guaranteed
to be awoken in preference to a waiter with a lower priority).
.IP
The arguments
.IR timeout ,
.IR uaddr2 ,
and
.I val3
are ignored.
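.IP
The two common choices for
.I val
might be wrapped as follows.
This is a sketch assuming the call is made through
.BR syscall (2);
the helper names
.IR wake_one ()
and
.IR wake_all ()
are hypothetical:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake at most one waiter on *uaddr.
   Returns the number of waiters woken, or -1 with errno set. */
static long
wake_one(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Hypothetical helper: wake every waiter on *uaddr.
   Returns the number of waiters woken, or -1 with errno set. */
static long
wake_all(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, INT_MAX,
                   NULL, NULL, 0);
}
```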
.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to
.\" a different example.
.\"
.\" For
.\" .BR futex (7),
.\" this is executed if incrementing the count showed that
.\" there were waiters,
.\" once the futex value has been set to 1
.\" (indicating that it is available).
.\"
.\" How does "incrementing the count show that there were waiters"?
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
.\" Strictly speaking, from Linux 2.5.x to 2.6.25
This operation creates a file descriptor that is associated with
the futex at
.IR uaddr .
The caller must close the returned file descriptor after use.
When another process or thread performs a
.BR FUTEX_WAKE
on the futex word, the file descriptor is indicated as being readable by
.BR select (2),
.BR poll (2),
and
.BR epoll (7).
.IP
The file descriptor can be used to obtain asynchronous notifications: if
.I val
is nonzero, then, when another process or thread executes a
.BR FUTEX_WAKE ,
the caller will receive the signal whose number was passed in
.IR val .
.IP
The arguments
.IR timeout ,
.IR uaddr2 ,
and
.I val3
are ignored.
.IP
Because it was inherently racy,
.B FUTEX_FD
has been removed
.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
from Linux 2.6.26 onward.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
This operation performs the same task as
.BR FUTEX_CMP_REQUEUE
(see below), except that no check is made using the value in
.IR val3 .
(The argument
.I val3
is ignored.)
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
This operation first checks whether the location
.I uaddr
still contains the value
.IR val3 .
If not, the operation fails with the error
.BR EAGAIN .
Otherwise, the operation wakes up a maximum of
.I val
waiters that are waiting on the futex at
.IR uaddr .
If there are more than
.I val
waiters, then the remaining waiters are removed
from the wait queue of the source futex at
.I uaddr
and added to the wait queue of the target futex at
.IR uaddr2 .
The
.I val2
argument specifies an upper limit on the number of waiters
that are requeued to the futex at
.IR uaddr2 .
.IP
.\" FIXME(Torvald) Is the following correct? Or is just the decision
.\" which threads to wake or requeue part of the atomic operation?
The load from
.I uaddr
is an atomic memory access (i.e., using atomic machine instructions of
the respective architecture).
This load, the comparison with
.IR val3 ,
and the requeueing of any waiters are performed atomically and totally
ordered with respect to other operations on the same futex word.
.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ###
.\" The operation is serialized with respect to operations on both
.\" source and target futex. No other waiter can enqueue itself
.\" for waiting and no other waiter can dequeue itself because of
.\" a timeout or signal.
.IP
Typical values to specify for
.I val
are 0 or 1.
(Specifying
.BR INT_MAX
is not useful, because it would make the
.BR FUTEX_CMP_REQUEUE
operation equivalent to
.BR FUTEX_WAKE .)
The limit value specified via
.I val2
is typically either 1 or
.BR INT_MAX .
(Specifying the argument as 0 is not useful, because no waiters
would then be requeued, which would make the
.BR FUTEX_CMP_REQUEUE
operation equivalent to
.BR FUTEX_WAKE .)
.IP
The
.B FUTEX_CMP_REQUEUE
operation was added as a replacement for the earlier
.BR FUTEX_REQUEUE .
The difference is that the check of the value at
.I uaddr
can be used to ensure that requeueing happens only under certain
conditions, which allows race conditions to be avoided in certain use cases.
.\" But, as Rich Felker points out, there remain valid use cases for
.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing
.\" the target(s) to a lock that the calling thread owns
.\"     From: Rich Felker <dalias@libc.org>
.\"     Date: Wed, 29 Oct 2014 22:43:17 -0400
.\"     To: Darren Hart <dvhart@infradead.org>
.\"     CC: libc-alpha@sourceware.org, ...
.\"     Subject: Re: Add futex wrapper to glibc?
.IP
Both
.BR FUTEX_REQUEUE
and
.BR FUTEX_CMP_REQUEUE
can be used to avoid "thundering herd" wake-ups that could occur when using
.B FUTEX_WAKE
in cases where all of the waiters that are woken need to acquire
another futex.
Consider the following scenario,
where multiple waiter threads are waiting on B,
a wait queue implemented using a futex:
.IP
.in +4n
.EX
lock(A);
while (!check_value(V)) {
    unlock(A);
    block_on(B);
    lock(A);
}
unlock(A);
.EE
.in
.IP
If a waker thread used
.BR FUTEX_WAKE ,
then all waiters waiting on B would be woken up,
and they would all try to acquire lock A.
However, waking all of the threads in this manner would be pointless because
all except one of the threads would immediately block on lock A again.
By contrast, a requeue operation wakes just one waiter and moves
the other waiters to lock A,
and when the woken waiter unlocks A then the next waiter can proceed.
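.IP
The waker's side of this scenario might be sketched as follows,
assuming the call is made through
.BR syscall (2).
The helper name
.IR wake_one_requeue_rest ()
and its parameter names are hypothetical;
.I queue_word
stands for the futex word of B and
.I lock_word
for the futex word of A:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *queue_word still equals "expected",
   wake at most one waiter on queue_word and requeue up to INT_MAX
   of the remaining waiters onto lock_word.  If the value check
   fails, the call returns -1 with errno set to EAGAIN. */
static long
wake_one_requeue_rest(uint32_t *queue_word, uint32_t *lock_word,
                      uint32_t expected)
{
    /* val2 is passed in the position of the timeout argument. */
    return syscall(SYS_futex, queue_word, FUTEX_CMP_REQUEUE,
                   1,                            /* val */
                   (void *) (uintptr_t) INT_MAX, /* val2 */
                   lock_word,                    /* uaddr2 */
                   expected);                    /* val3 */
}
```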
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
.\"     Author: Jakub Jelinek <jakub@redhat.com>
.\"     Date:   Tue Sep 6 15:16:25 2005 -0700
.\" FIXME. (Torvald) The glibc condvar implementation is currently being
.\"     revised (e.g., to not use an internal lock anymore).
.\"     It is probably more future-proof to remove this paragraph.
.\" [Torvald, do you have an update here?]
This operation was added to support some user-space use cases
where more than one futex must be handled at the same time.
The most notable example is the implementation of
.BR pthread_cond_signal (3),
which requires operations on two futexes,
the one used to implement the mutex and the one used in the implementation
of the wait queue associated with the condition variable.
.BR FUTEX_WAKE_OP
allows such cases to be implemented without leading to
high rates of contention and context switching.
.IP
The
.BR FUTEX_WAKE_OP
operation is equivalent to executing the following code atomically
and totally ordered with respect to other futex operations on
either of the two supplied futex words:
.IP
.in +4n
.EX
uint32_t oldval = *(uint32_t *) uaddr2;
*(uint32_t *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
if (oldval \fIcmp\fP \fIcmparg\fP)
    futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
.EE
.in
.IP
In other words,
.BR FUTEX_WAKE_OP
does the following:
.RS
.IP * 3
saves the original value of the futex word at
.IR uaddr2
and performs an operation to modify the value of the futex at
.IR uaddr2 ;
this is an atomic read-modify-write memory access (i.e., using atomic
machine instructions of the respective architecture)
.IP *
wakes up a maximum of
.I val
waiters on the futex for the futex word at
.IR uaddr ;
and
.IP *
dependent on the results of a test of the original value of the
futex word at
.IR uaddr2 ,
wakes up a maximum of
.I val2
waiters on the futex for the futex word at
.IR uaddr2 .
.RE
.IP
The operation and comparison that are to be performed are encoded
in the bits of the argument
.IR val3 .
Pictorially, the encoding is:
.IP
.in +4n
.EX
+---+---+-----------+-----------+
|op |cmp|   oparg   |  cmparg   |
+---+---+-----------+-----------+
  4   4       12          12    <== # of bits
.EE
.in
.IP
Expressed in code, the encoding is:
.IP
.in +4n
.EX
#define FUTEX_OP(op, oparg, cmp, cmparg) \e
                (((op & 0xf) << 28) | \e
                ((cmp & 0xf) << 24) | \e
                ((oparg & 0xfff) << 12) | \e
                (cmparg & 0xfff))
.EE
.in
.IP
In the above,
.I op
and
.I cmp
are each one of the codes listed below.
The
.I oparg
and
.I cmparg
components are literal numeric values, except as noted below.
.IP
The
.I op
component has one of the following values:
.IP
.in +4n
.EX
FUTEX_OP_SET        0  /* *uaddr2 = oparg; */
FUTEX_OP_ADD        1  /* *uaddr2 += oparg; */
FUTEX_OP_OR         2  /* *uaddr2 |= oparg; */
FUTEX_OP_ANDN       3  /* *uaddr2 &= \(tioparg; */
FUTEX_OP_XOR        4  /* *uaddr2 \(ha= oparg; */
.EE
.in
.IP
In addition, bitwise ORing the following value into
.I op
causes
.IR "(1\ <<\ oparg)"
to be used as the operand:
.IP
.in +4n
.EX
FUTEX_OP_OPARG_SHIFT  8  /* Use (1 << oparg) as operand */
.EE
.in
.IP
The
.I cmp
field is one of the following:
.IP
.in +4n
.EX
FUTEX_OP_CMP_EQ     0  /* if (oldval == cmparg) wake */
FUTEX_OP_CMP_NE     1  /* if (oldval != cmparg) wake */
FUTEX_OP_CMP_LT     2  /* if (oldval < cmparg) wake */
FUTEX_OP_CMP_LE     3  /* if (oldval <= cmparg) wake */
FUTEX_OP_CMP_GT     4  /* if (oldval > cmparg) wake */
FUTEX_OP_CMP_GE     5  /* if (oldval >= cmparg) wake */
.EE
.in
.IP
The return value of
.BR FUTEX_WAKE_OP
is the sum of the number of waiters woken on the futex
.IR uaddr
plus the number of waiters woken on the futex
.IR uaddr2 .
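.IP
As a worked instance of the encoding, the request
"set the futex word at
.I uaddr2
to 0, and wake waiters on it if its old value was greater than 1"
is encoded as (0 << 28) | (4 << 24) | (0 << 12) | 1, that is, 0x04000001.
The following sketch builds that value with the
.B FUTEX_OP
macro from
.IR <linux/futex.h> ;
the helper name
.IR clear_and_wake ()
and the choice of 1 for both
.I val
and
.I val2
are assumptions made for this example:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: atomically set *uaddr2 to 0, wake at most
   one waiter on uaddr, and wake at most one waiter on uaddr2 if
   the old value of *uaddr2 was greater than 1.
   Returns the total number of waiters woken, or -1 with errno set. */
static long
clear_and_wake(uint32_t *uaddr, uint32_t *uaddr2)
{
    uint32_t val3 = FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1);

    /* val2 is passed in the position of the timeout argument. */
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP,
                   1,                      /* val */
                   (void *) (uintptr_t) 1, /* val2 */
                   uaddr2, val3);
}
```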
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
This operation is like
.BR FUTEX_WAIT
except that
.I val3
is used to provide a 32-bit bit mask to the kernel.
This bit mask, in which at least one bit must be set,
is stored in the kernel-internal state of the waiter.
See the description of
.BR FUTEX_WAKE_BITSET
for further details.
.IP
If
.I timeout
is not NULL, the structure it points to specifies
an absolute timeout for the wait operation.
If
.I timeout
is NULL, the operation can block indefinitely.
.IP
The
.I uaddr2
argument is ignored.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
This operation is the same as
.BR FUTEX_WAKE
except that the
.I val3
argument is used to provide a 32-bit bit mask to the kernel.
This bit mask, in which at least one bit must be set,
is used to select which waiters should be woken up.
The selection is done by a bitwise AND of the "wake" bit mask
(i.e., the value in
.IR val3 )
and the bit mask which is stored in the kernel-internal
state of the waiter (the "wait" bit mask that is set using
.BR FUTEX_WAIT_BITSET ).
All of the waiters for which the result of the AND is nonzero are woken up;
the remaining waiters are left sleeping.
.IP
The effect of
.BR FUTEX_WAIT_BITSET
and
.BR FUTEX_WAKE_BITSET
is to allow selective wake-ups among multiple waiters that are blocked
on the same futex.
However, note that, depending on the use case,
employing this bit-mask multiplexing feature on a
futex can be less efficient than simply using multiple futexes,
because employing bit-mask multiplexing requires the kernel
to check all waiters on a futex,
including those that are not interested in being woken up
(i.e., they do not have the relevant bit set in their "wait" bit mask).
.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
.\"
.\"    "The original reason for the addition of these extensions
.\"     was to improve the performance of pthread read-write locks
.\"     in glibc. However, the pthreads library no longer uses the
.\"     same locking algorithm, and these extensions are not used
.\"     without the bitset parameter being all ones.
.\"
.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
.\" is nevertheless used (with a bit mask of all ones) in order to
.\" obtain the absolute timeout functionality that is useful
.\" for efficiently implementing Pthreads APIs (which use absolute
.\" timeouts); FUTEX_WAIT provides only relative timeouts.
.IP
The constant
.BR FUTEX_BITSET_MATCH_ANY ,
which corresponds to all 32 bits set in the bit mask, can be used as the
.I val3
argument for
.BR FUTEX_WAIT_BITSET
and
.BR FUTEX_WAKE_BITSET .
Other than differences in the handling of the
.I timeout
argument, the
.BR FUTEX_WAIT
operation is equivalent to
.BR FUTEX_WAIT_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY ;
that is, allow a wake-up by any waker.
The
.BR FUTEX_WAKE
operation is equivalent to
.BR FUTEX_WAKE_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY ;
that is, wake up any waiter(s).
.IP
The
.I uaddr2
and
.I timeout
arguments are ignored.
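.IP
The pairing of the two bit-set operations might be sketched as follows,
assuming the calls are made through
.BR syscall (2).
The helper names
.IR wait_bitset_until ()
and
.IR wake_bitset ()
are hypothetical, and the deadline is assumed to be an absolute
.B CLOCK_MONOTONIC
time, since
.B FUTEX_CLOCK_REALTIME
is not specified:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep on *uaddr while it holds "expected",
   recording "mask" as the wait bit mask, until the absolute time
   *deadline (or indefinitely if deadline is NULL). */
static long
wait_bitset_until(uint32_t *uaddr, uint32_t expected, uint32_t mask,
                  const struct timespec *deadline)
{
    /* uaddr2 is ignored by FUTEX_WAIT_BITSET. */
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                   deadline, NULL, mask);
}

/* Hypothetical helper: wake at most "count" waiters whose wait bit
   mask shares at least one set bit with "mask". */
static long
wake_bitset(uint32_t *uaddr, int count, uint32_t mask)
{
    /* timeout and uaddr2 are ignored by FUTEX_WAKE_BITSET. */
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, count,
                   NULL, NULL, mask);
}
```

Passing
.B FUTEX_BITSET_MATCH_ANY
as the mask to the first helper gives the absolute-timeout
equivalent of
.BR FUTEX_WAIT .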
bd90a5f9 815.\"
70b06b90 816.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
bd90a5f9
MK
817.\"
818.SS Priority-inheritance futexes
b52e1cd4
MK
819Linux supports priority-inheritance (PI) futexes in order to handle
820priority-inversion problems that can be encountered with
821normal futex locks.
b565548b 822Priority inversion is the problem that occurs when a high-priority
bdc5957a
MK
823task is blocked waiting to acquire a lock held by a low-priority task,
824while tasks at an intermediate priority continuously preempt
825the low-priority task from the CPU.
826Consequently, the low-priority task makes no progress toward
827releasing the lock, and the high-priority task remains blocked.
efeece04 828.PP
7d20efd7
MK
829Priority inheritance is a mechanism for dealing with
830the priority-inversion problem.
bdc5957a
MK
831With this mechanism, when a high-priority task becomes blocked
832by a lock held by a low-priority task,
9cee832c
MK
833the priority of the low-priority task is temporarily raised
834to that of the high-priority task,
bdc5957a 835so that it is not preempted by any intermediate-priority tasks,
7d20efd7
MK
836and can thus make progress toward releasing the lock.
837To be effective, priority inheritance must be transitive,
bdc5957a 838meaning that if a high-priority task blocks on a lock
ca4e5b2b 839held by a lower-priority task that is itself blocked by a lock
bdc5957a 840held by another intermediate-priority task
7d20efd7 841(and so on, for chains of arbitrary length),
b0f35fbb 842then both of those tasks
bdc5957a
MK
843(or more generally, all of the tasks in a lock chain)
844have their priorities raised to be the same as the high-priority task.
efeece04 845.PP
9e2b90ee 846From a user-space perspective,
39e9b2e1
MK
847what makes a futex PI-aware is a policy agreement (described below)
848between user space and the kernel about the value of the futex word,
601399f3
MK
849coupled with the use of the PI-futex operations described below.
850(Unlike the other futex operations described above,
851the PI-futex operations are designed
852for the implementation of very specific IPC mechanisms.)
853.\"
9e2b90ee
MK
854.\" Quoting Darren Hart:
855.\" These opcodes paired with the PI futex value policy (described below)
856.\" defines a "futex" as PI aware. These were created very specifically
857.\" in support of PI pthread_mutexes, so it makes a lot more sense to
858.\" talk about a PI aware pthread_mutex, than a PI aware futex, since
859.\" there is a lot of policy and scaffolding that has to be built up
860.\" around it to use it properly (this is what a PI pthread_mutex is).
efeece04 861.PP
ac894879 862.\" mtk: The following text is drawn from the Hart/Guniguntala paper
1af427a4 863.\" (listed in SEE ALSO), but I have reworded some pieces
8d825152 864.\" significantly.
79d918c7 865.\"
f0a9e8f4 866The PI-futex operations described below differ from the other
4b35dc5d
TR
867futex operations in that they impose policy on the use of the value of the
868futex word:
79d918c7 869.IP * 3
4b35dc5d 870If the lock is not acquired, the futex word's value shall be 0.
79d918c7 871.IP *
4c8cb0ff
MK
872If the lock is acquired, the futex word's value shall
873be the thread ID (TID;
4b35dc5d 874see
79d918c7
MK
875.BR gettid (2))
876of the owning thread.
877.IP *
79d918c7
MK
878If the lock is owned and there are threads contending for the lock,
879then the
880.B FUTEX_WAITERS
4b35dc5d 881bit shall be set in the futex word's value; in other words, this value is:
efeece04 882.IP
79d918c7 883 FUTEX_WAITERS | TID
601399f3
MK
884.IP
885(Note that it is invalid for a PI futex word to have no owner and
886.BR FUTEX_WAITERS
887set.)
79d918c7
MK
888.PP
889With this policy in place,
fd105614 890a user-space application can acquire an unacquired
601399f3 891lock or release a lock using atomic instructions executed in user mode
fd105614 892(e.g., a compare-and-swap operation such as
b52e1cd4
MK
893.I cmpxchg
894on the x86 architecture).
4c8cb0ff
MK
895Acquiring a lock simply consists of using compare-and-swap to atomically
896set the futex word's value to the caller's TID if its previous value was 0.
4b35dc5d
TR
897Releasing a lock requires using compare-and-swap to set the futex word's
898value to 0 if the previous value was the expected TID.
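The user-mode fast paths described above might look like the following sketch, which uses C11 atomics in place of a raw cmpxchg instruction. The function names are hypothetical, and the caller is assumed to pass its own TID as returned by gettid(2).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: acquire an unowned PI lock by changing the
   futex word from 0 to the caller's TID.  Returns false if the word
   was nonzero, in which case the kernel must be involved. */
static bool
pi_fast_lock(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(word, &expected, tid);
}

/* Hypothetical helper: release an uncontended PI lock by changing the
   futex word from the caller's TID back to 0.  Returns false if the
   word held anything else (for example, TID with FUTEX_WAITERS set). */
static bool
pi_fast_unlock(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;

    return atomic_compare_exchange_strong(word, &expected, 0);
}
```

When either function returns false, no state has been changed, and the caller falls back to the kernel operations described next.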
efeece04 899.PP
4b35dc5d 900If a futex is already acquired (i.e., has a nonzero value),
b52e1cd4 901waiters must employ the
79d918c7
MK
902.B FUTEX_LOCK_PI
903operation to acquire the lock.
4b35dc5d 904If other threads are waiting for the lock, then the
79d918c7 905.B FUTEX_WAITERS
4c8cb0ff
MK
906bit is set in the futex value;
907in this case, the lock owner must employ the
79d918c7 908.B FUTEX_UNLOCK_PI
b52e1cd4 909operation to release the lock.
efeece04 910.PP
79d918c7
MK
911In the cases where callers are forced into the kernel
912(i.e., required to perform a
913.BR futex ()
0c3ec26b 914call),
79d918c7
MK
915they then deal directly with a so-called RT-mutex,
916a kernel locking mechanism which implements the required
917priority-inheritance semantics.
918After the RT-mutex is acquired, the futex value is updated accordingly,
919before the calling thread returns to user space.
efeece04 920.PP
a59fca75 921It is important to note
ac894879 922.\" tglx (July 2015):
30239c10
MK
923.\" If there are multiple waiters on a pi futex then a wake pi operation
924.\" will wake the first waiter and hand over the lock to this waiter. This
925.\" includes handing over the rtmutex which represents the futex in the
926.\" kernel. The strict requirement is that the futex owner and the rtmutex
927.\" owner must be the same, except for the update period which is
928.\" serialized by the futex internal locking. That means the kernel must
1d09c150 929.\" update the user-space value prior to returning to user space
4b35dc5d 930that the kernel will update the futex word's value prior
79d918c7 931to returning to user space.
601399f3
MK
932(This prevents the possibility of the futex word's value ending
933up in an invalid state, such as having an owner but the value being 0,
934or having waiters but not having the
935.B FUTEX_WAITERS
936bit set.)
efeece04 937.PP
30239c10
MK
938If a futex has an associated RT-mutex in the kernel
939(i.e., there are blocked waiters)
940and the owner of the futex/RT-mutex dies unexpectedly,
941then the kernel cleans up the RT-mutex and hands it over to the next waiter.
942This in turn requires that the user-space value is updated accordingly.
943To indicate that this is required, the kernel sets the
944.B FUTEX_OWNER_DIED
945bit in the futex word along with the thread ID of the new owner.
8adaf0a7
MK
946User space can detect this situation via the presence of the
947.B FUTEX_OWNER_DIED
948bit and is then responsible for cleaning up the stale state left over by
1d09c150 949the dead owner.
30239c10
MK
950.\" tglx (July 2015):
951.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where
952.\" the kernel has no state associated. This happens via the robust futex
953.\" mechanism. In that case the futex value will be set to
954.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non
955.\" PI futexes.
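As an illustration of the value policy, the following sketch splits a PI futex word into its owner TID and flag bits. It assumes the FUTEX_TID_MASK, FUTEX_WAITERS, and FUTEX_OWNER_DIED constants from <linux/futex.h>; the structure and function names are hypothetical.

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a PI futex word */
struct pi_word_info {
    uint32_t owner_tid;     /* 0 means the lock is not acquired */
    bool     waiters;       /* FUTEX_WAITERS bit is set */
    bool     owner_died;    /* FUTEX_OWNER_DIED bit is set */
};

static struct pi_word_info
pi_decode(uint32_t word)
{
    struct pi_word_info info = {
        .owner_tid  = word & FUTEX_TID_MASK,
        .waiters    = (word & FUTEX_WAITERS) != 0,
        .owner_died = (word & FUTEX_OWNER_DIED) != 0,
    };

    return info;
}
```

A thread that acquires the lock and then finds owner_died set would be expected to repair whatever shared state the dead owner left behind before proceeding.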
efeece04 956.PP
601399f3
MK
957PI futexes are operated on by specifying one of the values listed below in
958.IR futex_op .
959Note that the PI futex operations must be used as paired operations
960and are subject to some additional requirements:
961.IP * 3
962.B FUTEX_LOCK_PI
963and
964.B FUTEX_TRYLOCK_PI
965pair with
d8012462 966.BR FUTEX_UNLOCK_PI .
601399f3
MK
967.B FUTEX_UNLOCK_PI
968must be called only on a futex owned by the calling thread,
969as defined by the value policy, otherwise the error
970.B EPERM
971results.
972.IP *
973.B FUTEX_WAIT_REQUEUE_PI
974pairs with
975.BR FUTEX_CMP_REQUEUE_PI .
976This must be performed from a non-PI futex to a distinct PI futex
977(or the error
978.B EINVAL
979results).
980Additionally,
981.I val
982(the number of waiters to be woken) must be 1
983(or the error
984.B EINVAL
985results).
11ac5b51 986.PP
601399f3 987The PI futex operations are as follows:
70b06b90
MK
988.\"
989.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
990.\"
d67e21f5
MK
991.TP
992.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
993.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
bc54ed38 994This operation is used after an attempt to acquire
fd105614 995the lock via an atomic user-mode instruction failed
4b35dc5d 996because the futex word has a nonzero value\(emspecifically,
8297383e 997because it contained the (PID-namespace-specific) TID of the lock owner.
efeece04 998.IP
4b35dc5d 999The operation checks the value of the futex word at the address
67833bec 1000.IR uaddr .
70b06b90
MK
1001If the value is 0, then the kernel tries to atomically set
1002the futex value to the caller's TID.
c3875d1d 1003If the futex word's value is nonzero,
67833bec 1004the kernel atomically sets the
e0547e70 1005.B FUTEX_WAITERS
67833bec
MK
1006bit, which signals the futex owner that it cannot unlock the futex in
1007user space atomically by setting the futex value to 0.
c3875d1d
MK
1008.\" tglx (July 2015):
1009.\" The operation here is similar to the FUTEX_WAIT logic. When the user
1010.\" space atomic acquire does not succeed because the futex value was non
1011.\" zero, then the waiter goes into the kernel, takes the kernel internal
1012.\" lock and retries the acquisition under the lock. If the acquisition
1013.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
1014.\" the lock owner that it needs to go into the kernel. Here is the pseudo
1015.\" code:
1016.\"
1017.\" lock(kernel_lock);
1018.\" retry:
9bfc9cb1 1019.\"
c3875d1d
MK
1020.\" /*
1021.\" * Owner might have unlocked in userspace before we
1022.\" * were able to set the waiter bit.
1023.\" */
1024.\" if (atomic_acquire(futex) == SUCCESS) {
1025.\" unlock(kernel_lock);
1026.\" return 0;
1027.\" }
1028.\"
1029.\" /*
1030.\" * Owner might have unlocked after the above atomic_acquire()
1031.\" * attempt.
1032.\" */
1033.\" if (atomic_set_waiters_bit(futex) != SUCCESS)
1034.\" goto retry;
1035.\"
1036.\" queue_waiter();
1037.\" unlock(kernel_lock);
1038.\" block();
1039.\"
1040After that, the kernel:
1041.RS
1042.IP 1. 3
1043Tries to find the thread which is associated with the owner TID.
1044.IP 2.
1045Creates or reuses kernel state on behalf of the owner.
1046(If this is the first waiter, there is no kernel state for this
1047futex, so kernel state is created by locking the RT-mutex
1048and the futex owner is made the owner of the RT-mutex.
1049If there are existing waiters, then the existing state is reused.)
1050.IP 3.
ca4e5b2b 1051Attaches the waiter to the futex
c3875d1d
MK
1052(i.e., the waiter is enqueued on the RT-mutex waiter list).
1053.RE
1054.IP
ac894879
MK
1055If more than one waiter exists,
1056the enqueueing of the waiter is in descending priority order.
1057(For information on priority ordering, see the discussion of the
1058.BR SCHED_DEADLINE ,
1059.BR SCHED_FIFO ,
1060and
1061.BR SCHED_RR
1062scheduling policies in
1063.BR sched (7).)
1064The owner inherits either the waiter's CPU bandwidth
1065(if the waiter is scheduled under the
1066.BR SCHED_DEADLINE
1067policy) or the waiter's priority (if the waiter is scheduled under the
1068.BR SCHED_RR
1069or
1070.BR SCHED_FIFO
1071policy).
1d09c150
MK
1072.\" August 2015:
1073.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
1074.\" processes, does the nice value come into play also?
1075.\"
1076.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
c3875d1d 1077This inheritance follows the lock chain in the case of nested locking
ca4e5b2b
MK
1078.\" (i.e., task 1 blocks on lock A, held by task 2,
1079.\" while task 2 blocks on lock B, held by task 3)
c3875d1d 1080and performs deadlock detection.
efeece04 1081.IP
e0547e70
TG
1082The
1083.I timeout
9ce19cf1 1084argument provides a timeout for the lock attempt.
8064bfa5
MK
1085If
1086.I timeout
1087is not NULL, the structure it points to specifies
1088an absolute timeout, measured against the
9ce19cf1
MK
1089.BR CLOCK_REALTIME
1090clock.
c082f385
MK
1091.\" 2016-07-07 response from Thomas Gleixner on LKML:
1092.\" From: Thomas Gleixner <tglx@linutronix.de>
1093.\" Date: 6 July 2016 at 20:57
1094.\" Subject: Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
2ae96e8a 1095.\"
c082f385
MK
1096.\" On Thu, 23 Jun 2016, Michael Kerrisk (man-pages) wrote:
1097.\" > On 06/23/2016 08:28 PM, Darren Hart wrote:
1098.\" > > And as a follow-on, what is the reason for FUTEX_LOCK_PI only using
1099.\" > > CLOCK_REALTIME? It seems reasonable to me that a user may want to wait a
1100.\" > > specific amount of time, regardless of wall time.
1101.\" >
1102.\" > Yes, that's another weird inconsistency.
2ae96e8a 1103.\"
c082f385
MK
1104.\" The reason is that pthread_mutex_timedlock() uses absolute timeouts based on
1105.\" CLOCK_REALTIME. glibc folks asked to make that the default behaviour back
1106.\" then when we added LOCK_PI.
9ce19cf1
MK
1107If
1108.I timeout
1109is NULL, the operation will block indefinitely.
efeece04 1110.IP
a449c634 1111The
e0547e70
TG
1112.IR uaddr2 ,
1113.IR val ,
1114and
1115.IR val3
a449c634 1116arguments are ignored.
67833bec 1117.\"
70b06b90
MK
1118.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1119.\"
d67e21f5 1120.TP
12fdbe23 1121.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1122.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
3fbb1be1 1123This operation tries to acquire the lock at
12fdbe23 1124.IR uaddr .
c3875d1d
MK
1125It is invoked when a user-space atomic acquire did not
1126succeed because the futex word was not 0.
efeece04 1127.IP
8adaf0a7
MK
1128Because the kernel has access to more state information than user space,
1129acquisition of the lock might succeed if performed by the
1130kernel in cases where the futex word
1131(i.e., the state information accessible to user space) contains stale state
c3875d1d
MK
1132.RB ( FUTEX_WAITERS
1133and/or
1134.BR FUTEX_OWNER_DIED ).
1135This can happen when the owner of the futex died.
1d09c150
MK
1136User space cannot handle this condition in a race-free manner,
1137but the kernel can fix this up and acquire the futex.
ee65b0e8
MK
1138.\" Paraphrasing a f2f conversation with Thomas Gleixner about the
1139.\" above point (Aug 2015): ###
1140.\" There is a rare possibility of a race condition involving an
1141.\" uncontended futex with no owner, but with waiters. The
1142.\" kernel-user-space contract is that if a futex is nonzero, you must
1143.\" go into kernel. The futex was owned by a task, and that task dies
1144.\" but there are no waiters, so the futex value is non zero.
1145.\" Therefore, the next locker has to go into the kernel,
1146.\" so that the kernel has a chance to clean up. (CMXCH on zero
1147.\" in user space would fail, so kernel has to clean up.)
8adaf0a7
MK
1148.\" Darren Hart (Oct 2015):
1149.\" The trylock in the kernel has more state, so it can independently
1150.\" verify the flags that userspace must trust implicitly.
efeece04 1151.IP
084744ef
MK
1152The
1153.IR uaddr2 ,
1154.IR val ,
1155.IR timeout ,
1156and
1157.IR val3
1158arguments are ignored.
70b06b90
MK
1159.\"
1160.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1161.\"
d67e21f5 1162.TP
12fdbe23 1163.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1164.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
d4ba4328 1165This operation wakes the highest-priority waiter that is waiting in
ecae2099
TG
1166.B FUTEX_LOCK_PI
1167on the futex address provided by the
1168.I uaddr
1169argument.
efeece04 1170.IP
1d09c150 1171This is called when the user-space value at
ecae2099
TG
1172.I uaddr
1173cannot be changed atomically from a TID (of the owner) to 0.
efeece04 1174.IP
ecae2099
TG
1175The
1176.IR uaddr2 ,
1177.IR val ,
1178.IR timeout ,
1179and
1180.IR val3
11a194bf 1181arguments are ignored.
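Putting the fast and slow paths together, a PI lock and unlock pair might be sketched as below. This is an illustrative outline only, assuming no timeout and no handling of FUTEX_OWNER_DIED; the function names are hypothetical, and a real implementation (such as a PI-aware Pthreads mutex) carries considerably more policy.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 with errno set on error */
static int
pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: the lock is free, so claim it in user space */
    if (atomic_compare_exchange_strong(word, &expected, tid))
        return 0;

    /* Slow path: the word is nonzero; block in the kernel (no timeout) */
    return (int) syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on error */
static int
pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: the word holds exactly our TID, so there are no waiters */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;

    /* Slow path: FUTEX_WAITERS (or other state) is set; let the kernel
       hand the lock over to the top waiter */
    return (int) syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case, neither function enters the kernel for the futex operation itself.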
70b06b90
MK
1182.\"
1183.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1184.\"
d67e21f5 1185.TP
d67e21f5
MK
1186.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
1187.\" commit 52400ba946759af28442dee6265c5c0180ac7122
f812a08b
DH
1188This operation is a PI-aware variant of
1189.BR FUTEX_CMP_REQUEUE .
1190It requeues waiters that are blocked via
1191.B FUTEX_WAIT_REQUEUE_PI
1192on
1193.I uaddr
1194from a non-PI source futex
1195.RI ( uaddr )
1196to a PI target futex
1197.RI ( uaddr2 ).
efeece04 1198.IP
9e54d26d
MK
1199As with
1200.BR FUTEX_CMP_REQUEUE ,
1201this operation wakes up a maximum of
1202.I val
1203waiters that are waiting on the futex at
1204.IR uaddr .
1205However, for
1206.BR FUTEX_CMP_REQUEUE_PI ,
1207.I val
6fbeb8f4 1208is required to be 1
939ca89f 1209(since the main point is to avoid a thundering herd).
9e54d26d
MK
1210The remaining waiters are removed from the wait queue of the source futex at
1211.I uaddr
1212and added to the wait queue of the target futex at
1213.IR uaddr2 .
efeece04 1214.IP
9e54d26d 1215The
768d3c23 1216.I val2
c6d8cf21
MK
1217.\" val2 is the cap on the number of requeued waiters.
1218.\" In the glibc pthread_cond_broadcast() implementation, this argument
1219.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
9e54d26d 1220and
768d3c23 1221.I val3
9e54d26d
MK
1222arguments serve the same purposes as for
1223.BR FUTEX_CMP_REQUEUE .
70b06b90 1224.\"
8297383e 1225.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/
be376673 1226.\" notes that "priority-inheritance Futex to priority-inheritance
8297383e
MK
1227.\" Futex requeues are currently unsupported". However, probably
1228.\" the page does not need to say anything about this, since
1229.\" Thomas Gleixner commented (July 2015): "they never will be
1230.\" supported because they make no sense at all"
70b06b90
MK
1231.\"
1232.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1233.\"
d67e21f5
MK
1234.TP
1235.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
1236.\" commit 52400ba946759af28442dee6265c5c0180ac7122
70b06b90 1237.\"
c3875d1d 1238Wait on a non-PI futex at
6ff1b4c0 1239.I uaddr
c3875d1d
MK
1240and potentially be requeued (via a
1241.BR FUTEX_CMP_REQUEUE_PI
1242operation in another task) onto a PI futex at
6ff1b4c0
TG
1243.IR uaddr2 .
1244The wait operation on
1245.I uaddr
c3875d1d 1246is the same as for
6ff1b4c0 1247.BR FUTEX_WAIT .
efeece04 1248.IP
6ff1b4c0
TG
1249The waiter can be removed from the wait on
1250.I uaddr
6ff1b4c0 1251without requeueing on
c3875d1d
MK
1252.IR uaddr2
1253via a
1d09c150 1254.BR FUTEX_WAKE
c3875d1d
MK
1255operation in another task.
1256In this case, the
1257.BR FUTEX_WAIT_REQUEUE_PI
3fbb1be1
MK
1258operation fails with the error
1259.BR EAGAIN .
efeece04 1260.IP
63bea7dc
MK
1261If
1262.I timeout
8064bfa5
MK
1263is not NULL, the structure it points to specifies
1264an absolute timeout for the wait operation.
63bea7dc
MK
1265If
1266.I timeout
1267is NULL, the operation can block indefinitely.
efeece04 1268.IP
a4e69912
MK
1269The
1270.I val3
1271argument is ignored.
efeece04 1272.IP
abb571e8
MK
1273The
1274.BR FUTEX_WAIT_REQUEUE_PI
1275and
1276.BR FUTEX_CMP_REQUEUE_PI
1277operations were added to support a fairly specific use case:
1278support for priority-inheritance-aware POSIX threads condition variables.
1279The idea is that these operations should always be paired,
1280in order to ensure that user space and the kernel remain in sync.
1281Thus, in the
1282.BR FUTEX_WAIT_REQUEUE_PI
1283operation, the user-space application pre-specifies the target
1284of the requeue that takes place in the
1285.BR FUTEX_CMP_REQUEUE_PI
1286operation.
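The pairing of the two requeue operations can be sketched as a pair of thin wrappers. The names wait_requeue_pi() and cmp_requeue_pi() are hypothetical, as are the parameter names; the argument positions follow the futex() signature shown in the SYNOPSIS, with val2 passed in the timeout slot.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Waiter side (hypothetical helper): block on the non-PI word 'cond'
   while it still equals 'expected', naming the PI word 'mutex' in
   advance as the requeue target. */
static long
wait_requeue_pi(uint32_t *cond, uint32_t expected,
                const struct timespec *abs_timeout, uint32_t *mutex)
{
    return syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, expected,
                   abs_timeout, mutex, 0);
}

/* Waker side (hypothetical helper): val is fixed at 1, as required;
   up to 'nr_requeue' further waiters are moved onto 'mutex', provided
   that *cond still equals 'cond_val'. */
static long
cmp_requeue_pi(uint32_t *cond, uint32_t nr_requeue, uint32_t *mutex,
               uint32_t cond_val)
{
    return syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                   (long) nr_requeue, mutex, cond_val);
}
```

Both calls must name the same target word; a mismatch is reported to the requeuing caller with EINVAL, as described under ERRORS.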
1287.\"
1288.\" Darren Hart notes that a patch to allow glibc to fully support
1af427a4 1289.\" PI-aware pthreads condition variables has not yet been accepted into
abb571e8
MK
1290.\" glibc. The story is complex, and can be found at
1291.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588
1292.\" Darren notes that in the meantime, the patch is shipped with various
1af427a4 1293.\" PREEMPT_RT-enabled Linux systems.
abb571e8
MK
1294.\"
1295.\" Related to the preceding, Darren proposed that somewhere, man-pages
1296.\" should document the following point:
1af427a4 1297.\"
4c8cb0ff
MK
1298.\" While the Linux kernel, since 2.6.31, supports requeueing of
1299.\" priority-inheritance (PI) aware mutexes via the
1300.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations,
1301.\" the glibc implementation does not yet take full advantage of this.
1302.\" Specifically, the condvar internal data lock remains a non-PI aware
1303.\" mutex, regardless of the type of the pthread_mutex associated with
1304.\" the condvar. This can lead to an unbounded priority inversion on
1305.\" the internal data lock even when associating a PI aware
1306.\" pthread_mutex with a condvar during a pthread_cond*_wait
1307.\" operation. For this reason, it is not recommended to rely on
1308.\" priority inheritance when using pthread condition variables.
1af427a4
MK
1309.\"
1310.\" The problem is that the obvious location for this text is
1311.\" the pthread_cond*wait(3) man page. However, such a man page
abb571e8 1312.\" does not currently exist.
70b06b90 1313.\"
6700de24 1314.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70b06b90 1315.\"
47297adb 1316.SH RETURN VALUE
a5c5a06a
MK
1317In the event of an error (and assuming that
1318.BR futex ()
1319was invoked via
1320.BR syscall (2)),
1321all operations return \-1 and set
e808bba0 1322.I errno
6f147f79 1323to indicate the cause of the error.
efeece04 1324.PP
e808bba0
MK
1325The return value on success depends on the operation,
1326as described in the following list:
fea681da
MK
1327.TP
1328.B FUTEX_WAIT
077981d4 1329Returns 0 if the caller was woken up.
4c8cb0ff
MK
1330Note that a wake-up can also be caused by common futex usage patterns
1331in unrelated code that happened to have previously used the futex word's
1332memory location (e.g., typical futex-based implementations of
1333Pthreads mutexes can cause this under some conditions).
1334Therefore, callers should always conservatively assume that a return
1335value of 0 can mean a spurious wake-up, and use the futex word's value
bc54ed38
MK
1336(i.e., the user-space synchronization scheme)
1337to decide whether to continue to block or not.
fea681da
MK
1338.TP
1339.B FUTEX_WAKE
bdc5957a 1340Returns the number of waiters that were woken up.
fea681da
MK
1341.TP
1342.B FUTEX_FD
1343Returns the new file descriptor associated with the futex.
1344.TP
1345.B FUTEX_REQUEUE
bdc5957a 1346Returns the number of waiters that were woken up.
fea681da
MK
1347.TP
1348.B FUTEX_CMP_REQUEUE
bdc5957a 1349Returns the total number of waiters that were woken up or
4b35dc5d 1350requeued to the futex for the futex word at
3dfcc11d
MK
1351.IR uaddr2 .
1352If this value is greater than
1353.IR val ,
fd105614 1354then the difference is the number of waiters requeued to the futex for the
4c8cb0ff 1355futex word at
3dfcc11d 1356.IR uaddr2 .
dcad19c0
MK
1357.TP
1358.B FUTEX_WAKE_OP
a8b5b324 1359Returns the total number of waiters that were woken up.
4c8cb0ff
MK
1360This is the sum of the woken waiters on the two futexes for
1361the futex words at
a8b5b324
MK
1362.I uaddr
1363and
1364.IR uaddr2 .
dcad19c0
MK
1365.TP
1366.B FUTEX_WAIT_BITSET
077981d4
MK
1367Returns 0 if the caller was woken up.
1368See
4b35dc5d
TR
1369.B FUTEX_WAIT
1370for how to interpret this correctly in practice.
dcad19c0
MK
1371.TP
1372.B FUTEX_WAKE_BITSET
bdc5957a 1373Returns the number of waiters that were woken up.
dcad19c0
MK
1374.TP
1375.B FUTEX_LOCK_PI
bf02a260 1376Returns 0 if the futex was successfully locked.
dcad19c0
MK
1377.TP
1378.B FUTEX_TRYLOCK_PI
5c716eef 1379Returns 0 if the futex was successfully locked.
dcad19c0
MK
1380.TP
1381.B FUTEX_UNLOCK_PI
52bb928f 1382Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
1383.TP
1384.B FUTEX_CMP_REQUEUE_PI
bdc5957a 1385Returns the total number of waiters that were woken up or
4b35dc5d 1386requeued to the futex for the futex word at
dddd395a
MK
1387.IR uaddr2 .
1388If this value is greater than
1389.IR val ,
4c8cb0ff
MK
1390then the difference is the number of waiters requeued to the futex for
1391the futex word at
dddd395a 1392.IR uaddr2 .
dcad19c0
MK
1393.TP
1394.B FUTEX_WAIT_REQUEUE_PI
4c8cb0ff
MK
1395Returns 0 if the caller was successfully requeued to the futex for
1396the futex word at
22c15de9 1397.IR uaddr2 .
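Since a return value of 0 from FUTEX_WAIT may indicate a spurious wake-up, callers typically wrap the call in a loop that rechecks the futex word. The following is a minimal sketch of that pattern; the function name is hypothetical, and the treatment of EAGAIN and EINTR as "check again" is one reasonable policy rather than the only one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: block until *word no longer equals 'unwanted'.
   Returns 0 on success, or -1 with errno set on an unexpected error. */
static int
wait_while_equal(_Atomic uint32_t *word, uint32_t unwanted)
{
    while (atomic_load(word) == unwanted) {
        long rc = syscall(SYS_futex, word, FUTEX_WAIT, unwanted,
                          NULL, NULL, 0);

        /* EAGAIN: the word changed before we slept.
           EINTR: a signal handler interrupted the wait.
           In both cases, and after any return of 0, the loop
           condition decides whether to block again. */
        if (rc == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```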
70b06b90
MK
1398.\"
1399.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1400.\"
fea681da
MK
1401.SH ERRORS
1402.TP
1403.B EACCES
4b35dc5d 1404No read access to the memory of a futex word.
fea681da
MK
1405.TP
1406.B EAGAIN
f48516d1 1407.RB ( FUTEX_WAIT ,
4b35dc5d 1408.BR FUTEX_WAIT_BITSET ,
f48516d1 1409.BR FUTEX_WAIT_REQUEUE_PI )
badbf70c
MK
1410The value pointed to by
1411.I uaddr
1412was not equal to the expected value
1413.I val
1414at the time of the call.
efeece04 1415.IP
9732dd8b
MK
1416.BR Note :
1417on Linux, the symbolic names
1418.B EAGAIN
1419and
1420.B EWOULDBLOCK
77da5feb 1421(both of which appear in different parts of the kernel futex code)
9732dd8b 1422have the same value.
badbf70c
MK
1423.TP
1424.B EAGAIN
8f2068bb
MK
1425.RB ( FUTEX_CMP_REQUEUE ,
1426.BR FUTEX_CMP_REQUEUE_PI )
ce5602fd 1427The value pointed to by
9f6c40c0
MK
1428.I uaddr
1429is not equal to the expected value
1430.IR val3 .
fea681da 1431.TP
5662f56a
MK
1432.BR EAGAIN
1433.RB ( FUTEX_LOCK_PI ,
aaec9032
MK
1434.BR FUTEX_TRYLOCK_PI ,
1435.BR FUTEX_CMP_REQUEUE_PI )
1436The futex owner thread ID of
1437.I uaddr
1438(for
1439.BR FUTEX_CMP_REQUEUE_PI :
1440.IR uaddr2 )
1441is about to exit,
5662f56a
MK
1442but has not yet handled the internal state cleanup.
1443Try again.
1444.TP
7a39e745
MK
1445.BR EDEADLK
1446.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1447.BR FUTEX_TRYLOCK_PI ,
1448.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1449The futex word at
7a39e745
MK
1450.I uaddr
1451is already locked by the caller.
1452.TP
662c0da8 1453.BR EDEADLK
c3875d1d 1454.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
d6bb5a38 1455.\" places, and EDEADLOCK in others. On almost all architectures
4c8cb0ff
MK
1456.\" these constants are synonymous. Is there a reason that both
1457.\" names are used?
8297383e
MK
1458.\"
1459.\" tglx (July 2015): "No. We should probably fix that."
1460.\"
662c0da8 1461.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1462While requeueing a waiter to the PI futex for the futex word at
662c0da8
MK
1463.IR uaddr2 ,
1464the kernel detected a deadlock.
1465.TP
fea681da 1466.B EFAULT
1ea901e8
MK
1467A required pointer argument (i.e.,
1468.IR uaddr ,
1469.IR uaddr2 ,
1470or
1471.IR timeout )
496df304 1472did not point to a valid user-space address.
fea681da 1473.TP
9f6c40c0 1474.B EINTR
e808bba0 1475A
9f6c40c0 1476.B FUTEX_WAIT
2674f781
MK
1477or
1478.B FUTEX_WAIT_BITSET
e808bba0 1479operation was interrupted by a signal (see
f529fd20
MK
1480.BR signal (7)).
1481In kernels before Linux 2.6.22, this error could also be returned for
b5fff4ea 1482a spurious wakeup; since Linux 2.6.22, this no longer happens.
9f6c40c0 1483.TP
fea681da 1484.B EINVAL
180f97b7
MK
1485The operation in
1486.IR futex_op
1487is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1488.I timeout
1489argument was invalid
1490.RI ( tv_sec
1491was less than zero, or
1492.IR tv_nsec
cabee29d 1493was not less than 1,000,000,000).
fb2f4c27
MK
1494.TP
1495.B EINVAL
0c74df0b 1496The operation specified in
025e1374 1497.IR futex_op
0c74df0b 1498employs one or both of the pointers
51ee94be 1499.I uaddr
a1f47699 1500and
0c74df0b
MK
1501.IR uaddr2 ,
1502but one of these does not point to a valid object\(emthat is,
1503the address is not four-byte-aligned.
51ee94be
MK
1504.TP
1505.B EINVAL
55cc422d
TG
1506.RB ( FUTEX_WAIT_BITSET ,
1507.BR FUTEX_WAKE_BITSET )
5e1456d4 1508The bit mask supplied in
79c9b436
TG
1509.IR val3
1510is zero.
1511.TP
1512.B EINVAL
2abcba67 1513.RB ( FUTEX_CMP_REQUEUE_PI )
add875c0
MK
1514.I uaddr
1515equals
1516.IR uaddr2
1517(i.e., an attempt was made to requeue to the same futex).
1518.TP
ff597681
MK
1519.BR EINVAL
1520.RB ( FUTEX_FD )
1521The signal number supplied in
1522.I val
1523is invalid.
1524.TP
6bac3b85 1525.B EINVAL
476debd7
MK
1526.RB ( FUTEX_WAKE ,
1527.BR FUTEX_WAKE_OP ,
1528.BR FUTEX_WAKE_BITSET ,
1529.BR FUTEX_REQUEUE ,
1530.BR FUTEX_CMP_REQUEUE )
1531The kernel detected an inconsistency between the user-space state at
1532.I uaddr
1533and the kernel state\(emthat is, it detected a waiter which waits in
1534.BR FUTEX_LOCK_PI
1535on
1536.IR uaddr .
1537.TP
1538.B EINVAL
a218ef20 1539.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1540.BR FUTEX_TRYLOCK_PI ,
1541.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1542The kernel detected an inconsistency between the user-space state at
1543.I uaddr
1544and the kernel state.
ce022f18 1545This indicates either state corruption
ce022f18 1546or that the kernel found a waiter on
a218ef20
MK
1547.I uaddr
1548which is waiting via
1549.BR FUTEX_WAIT
1550or
1551.BR FUTEX_WAIT_BITSET .
1552.TP
1553.B EINVAL
f9250b1a
MK
1554.RB ( FUTEX_CMP_REQUEUE_PI )
1555The kernel detected an inconsistency between the user-space state at
99c0041d
MK
1556.I uaddr2
1557and the kernel state;
ee65b0e8
MK
1558.\" From a conversation with Thomas Gleixner (Aug 2015): ###
1559.\" The kernel sees: I have non PI state for a futex you tried to
1560.\" tell me was PI
99c0041d
MK
1561that is, the kernel detected a waiter which waits via
1562.BR FUTEX_WAIT
8297383e
MK
1563or
1564.BR FUTEX_WAIT_BITSET
99c0041d
MK
1565on
1566.IR uaddr2 .
1567.TP
1568.B EINVAL
1569.RB ( FUTEX_CMP_REQUEUE_PI )
1570The kernel detected an inconsistency between the user-space state at
f9250b1a
MK
1571.I uaddr
1572and the kernel state;
1573that is, the kernel detected a waiter which waits via
75299c8d 1574.BR FUTEX_WAIT
99c0041d 1575or
75299c8d 1576.BR FUTEX_WAIT_BITSET
f9250b1a
MK
1577on
1578.IR uaddr .
1579.TP
1580.B EINVAL
99c0041d 1581.RB ( FUTEX_CMP_REQUEUE_PI )
75299c8d
MK
1582The kernel detected an inconsistency between the user-space state at
1583.I uaddr
1584and the kernel state;
1585that is, the kernel detected a waiter which waits on
1586.I uaddr
1587via
1588.BR FUTEX_LOCK_PI
1589(instead of
1590.BR FUTEX_WAIT_REQUEUE_PI ).
99c0041d
MK
1591.TP
1592.B EINVAL
9786b3ca 1593.RB ( FUTEX_CMP_REQUEUE_PI )
8297383e
MK
1594.\" This deals with the case:
1595.\" wait_requeue_pi(A, B);
1596.\" requeue_pi(A, C);
9786b3ca
MK
1597An attempt was made to requeue a waiter to a futex other than that
1598specified by the matching
1599.B FUTEX_WAIT_REQUEUE_PI
1600call for that waiter.
1601.TP
1602.B EINVAL
f0c0d61c
MK
1603.RB ( FUTEX_CMP_REQUEUE_PI )
1604The
1605.I val
1606argument is not 1.
1607.TP
1608.B EINVAL
4832b48a 1609Invalid argument.
fea681da 1610.TP
d07d4ef3
MK
1611.B ENFILE
1612.RB ( FUTEX_FD )
1613The system-wide limit on the total number of open files has been reached.
1614.TP
a449c634
MK
1615.BR ENOMEM
1616.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1617.BR FUTEX_TRYLOCK_PI ,
1618.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1619The kernel could not allocate memory to hold state information.
1620.TP
4701fc28
MK
1621.B ENOSYS
1622Invalid operation specified in
d33602c4 1623.IR futex_op .
9f6c40c0 1624.TP
4a7e5b05
MK
1625.B ENOSYS
1626The
1627.BR FUTEX_CLOCK_REALTIME
1628option was specified in
1afcee7c 1629.IR futex_op ,
4a7e5b05 1630but the accompanying operation was neither
017d194b
MK
1631.BR FUTEX_WAIT ,
1632.BR FUTEX_WAIT_BITSET ,
4a7e5b05
MK
1633nor
1634.BR FUTEX_WAIT_REQUEUE_PI .
1635.TP
a9dcb4d1
MK
1636.BR ENOSYS
1637.RB ( FUTEX_LOCK_PI ,
f2424fae 1638.BR FUTEX_TRYLOCK_PI ,
4945ff19 1639.BR FUTEX_UNLOCK_PI ,
4cf92894 1640.BR FUTEX_CMP_REQUEUE_PI ,
794bb106 1641.BR FUTEX_WAIT_REQUEUE_PI )
4b35dc5d 1642A run-time check determined that the operation is not available.
f0a9e8f4 1643The PI-futex operations are not implemented on all architectures and
077981d4 1644are not supported on some CPU variants.
a9dcb4d1 1645.TP
c7589177
MK
1646.BR EPERM
1647.RB ( FUTEX_LOCK_PI ,
dc2742a8
MK
1648.BR FUTEX_TRYLOCK_PI ,
1649.BR FUTEX_CMP_REQUEUE_PI )
04331c3f 1650The caller is not allowed to attach itself to the futex at
dc2742a8
MK
1651.I uaddr
1652(for
1653.BR FUTEX_CMP_REQUEUE_PI :
1654the futex at
1655.IR uaddr2 ).
c7589177
MK
1656(This may be caused by a state corruption in user space.)
1657.TP
76f347ba 1658.BR EPERM
87276709 1659.RB ( FUTEX_UNLOCK_PI )
4b35dc5d 1660The caller does not own the lock represented by the futex word.
76f347ba 1661.TP
0b0e4934
MK
1662.BR ESRCH
1663.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1664.BR FUTEX_TRYLOCK_PI ,
1665.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1666The thread ID in the futex word at
0b0e4934
MK
1667.I uaddr
1668does not exist.
1669.TP
360f773c
MK
1670.BR ESRCH
1671.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1672The thread ID in the futex word at
360f773c
MK
1673.I uaddr2
1674does not exist.
1675.TP
9f6c40c0 1676.B ETIMEDOUT
4d85047f
MK
1677The operation in
1678.IR futex_op
1679employed the timeout specified in
1680.IR timeout ,
1681and the timeout expired before the operation completed.
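.IP
As an illustration of this error, the following is a minimal sketch of a timed wait, assuming a 64-bit Linux system where the raw system call accepts the C library's
.IR "struct timespec" .
The helper name
.IR futex_wait_ms ()
is hypothetical and is not part of any library; for
.BR FUTEX_WAIT
the timeout is interpreted as a relative interval.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical helper (an assumption for this sketch): sleep while
   *uaddr still contains 'expected', for at most 'ms' milliseconds.

   Returns 0 if the caller was woken, or if the futex word no longer
   contained 'expected' when the kernel checked it (EAGAIN).
   Returns -1 with errno set otherwise; errno is ETIMEDOUT if the
   interval elapsed before a wake-up arrived. */

static int
futex_wait_ms(uint32_t *uaddr, uint32_t expected, long ms)
{
    struct timespec ts;
    long s;

    assert(uaddr != NULL);

    /* Relative timeout: split milliseconds into seconds + nanoseconds */
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;

    s = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, &ts, NULL, 0);
    if (s == 0)
        return 0;               /* Woken by FUTEX_WAKE (or spuriously) */
    if (errno == EAGAIN)
        return 0;               /* Value had already changed */

    return -1;                  /* ETIMEDOUT, EINTR, or another error */
}
```

A caller would typically recheck its condition after a 0 return, since wake-ups can be spurious, and treat a \-1 return with
.B ETIMEDOUT
as "condition not met in time".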
70b06b90
MK
1682.\"
1683.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1684.\"
47297adb 1685.SH VERSIONS
81c9d87e
MK
1686Futexes were first made available in a stable kernel release
1687with Linux 2.6.0.
efeece04 1688.PP
4c8cb0ff
MK
1689Initial futex support was merged in Linux 2.5.7 but with different
1690semantics from those described above.
52dee70e 1691A four-argument system call with the semantics
fd3fa7ef 1692described in this page was introduced in Linux 2.5.40.
d0442d14
MK
1693A fifth argument was added in Linux 2.5.70,
1694and a sixth argument was added in Linux 2.6.7.
47297adb 1695.SH CONFORMING TO
8382f16d 1696This system call is Linux-specific.
47297adb 1697.SH NOTES
baf0f1f4
MK
1698Glibc does not provide a wrapper for this system call; call it using
1699.BR syscall (2).
efeece04 1700.PP
02f7b623 1701Several higher-level programming abstractions are implemented via futexes,
e24fbf10 1702including POSIX semaphores and
02f7b623
MK
1703various POSIX threads synchronization mechanisms
1704(mutexes, condition variables, read-write locks, and barriers).
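.PP
To give a feel for how such an abstraction can be layered on futexes, here is a minimal sketch of a three-state mutex.
It is not the glibc implementation; the names
.IR fmutex ,
.IR fmutex_lock (),
and
.IR fmutex_unlock ()
are hypothetical, and the sketch assumes that
.I _Atomic uint32_t
has the same size and representation as
.IR uint32_t ,
as is the case with GCC on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical mutex type for this sketch. States of the futex word:
   0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible */

typedef struct {
    _Atomic uint32_t state;
} fmutex;

static long
sys_futex(_Atomic uint32_t *uaddr, int futex_op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, futex_op, val, NULL, NULL, 0);
}

static void
fmutex_lock(fmutex *m)
{
    uint32_t c = 0;

    /* Fast path: uncontended 0 -> 1 transition, no system call */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: advertise contention by forcing the state to 2,
       then sleep for as long as the word still contains 2 */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);

    while (c != 0) {
        /* May return because of a wake-up, EAGAIN, or EINTR;
           in every case the state is simply rechecked */
        sys_futex(&m->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void
fmutex_unlock(fmutex *m)
{
    uint32_t prev = atomic_fetch_sub(&m->state, 1);

    assert(prev != 0);          /* Unlocking an unlocked mutex is a bug */

    if (prev == 2) {
        /* Waiters may exist: release fully, then wake one of them */
        atomic_store(&m->state, 0);
        sys_futex(&m->state, FUTEX_WAKE, 1);
    }
}
```

The design point is that the kernel is entered only under contention: an uncontended lock and unlock pair consists of two atomic operations in user space.
The sketch uses the shared futex operations, so the
.I fmutex
could live in a shared mapping; a thread-only variant might use the "_PRIVATE" operations instead.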
74f58a64
MK
1705.\" TODO FIXME(Torvald) Above, we cite this section and claim it contains
1706.\" details on the synchronization semantics; add the C11 equivalents
1707.\" here (or whatever we find consensus for).
305cc415
MK
1708.\"
1709.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1710.\"
a14af333 1711.SH EXAMPLES
bc54ed38
MK
1712The program below demonstrates use of futexes in a program where a parent
1713process and a child process use a pair of futexes located inside a
305cc415
MK
1714shared anonymous mapping to synchronize access to a shared resource:
1715the terminal.
1716The two processes each write
1717.IR nloops
1718(a command-line argument that defaults to 5 if omitted)
1719messages to the terminal and employ a synchronization protocol
1720that ensures that they alternate in writing messages.
1721Upon running this program we see output such as the following:
efeece04 1722.PP
305cc415 1723.in +4n
b76974c1 1724.EX
305cc415
MK
1725$ \fB./futex_demo\fP
1726Parent (18534) 0
1727Child (18535) 0
1728Parent (18534) 1
1729Child (18535) 1
1730Parent (18534) 2
1731Child (18535) 2
1732Parent (18534) 3
1733Child (18535) 3
1734Parent (18534) 4
1735Child (18535) 4
b76974c1 1736.EE
305cc415
MK
1737.in
1738.SS Program source
1739\&
e7d0bb47 1740.EX
305cc415
MK
1741/* futex_demo.c
1742
1743 Usage: futex_demo [nloops]
1744 (Default: 5)
1745
1746 Demonstrate the use of futexes in a program where parent and child
1747 use a pair of futexes located inside a shared anonymous mapping to
1748 synchronize access to a shared resource: the terminal. The two
1749 processes each write \(aqnloops\(aq messages to the terminal and employ
1750 a synchronization protocol that ensures that they alternate in
1751 writing messages.
1752*/
1753#define _GNU_SOURCE
1754#include <stdio.h>
1755#include <errno.h>
915c4ba3 1756#include <stdatomic.h>
8eb90116 1757#include <stdint.h>
305cc415
MK
1758#include <stdlib.h>
1759#include <unistd.h>
1760#include <sys/wait.h>
1761#include <sys/mman.h>
1762#include <sys/syscall.h>
1763#include <linux/futex.h>
1764#include <sys/time.h>
1765
d1a71985 1766#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \e
305cc415
MK
1767 } while (0)
1768
2253ecf0 1769static uint32_t *futex1, *futex2, *iaddr;
305cc415
MK
1770
1771static int
2253ecf0
AC
1772futex(uint32_t *uaddr, int futex_op, uint32_t val,
1773 const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
305cc415
MK
1774{
1775 return syscall(SYS_futex, uaddr, futex_op, val,
c1e04f01 1776 timeout, uaddr2, val3);
305cc415
MK
1777}
1778
1779/* Acquire the futex pointed to by \(aqfutexp\(aq: wait for its value to
1780 become 1, and then set the value to 0. */
1781
1782static void
2253ecf0 1783fwait(uint32_t *futexp)
305cc415 1784{
2253ecf0 1785 long s;
305cc415 1786
915c4ba3
BP
1787 /* atomic_compare_exchange_strong(ptr, oldval, newval)
1788 atomically performs the equivalent of:
305cc415 1789
915c4ba3 1790 if (*ptr == *oldval)
305cc415
MK
1791 *ptr = newval;
1792
915c4ba3 1793 It returns true if the test yielded true and *ptr was updated. */
305cc415 1794
305cc415 1795 while (1) {
83e80dda 1796
63ad44cb 1797 /* Is the futex available? */
2253ecf0 1798 uint32_t one = 1; /* Not const: overwritten if the exchange fails */
09e456c2 1799 if (atomic_compare_exchange_strong(futexp, &one, 0))
305cc415
MK
1800 break; /* Yes */
1801
63ad44cb 1802 /* Futex is not available; wait */
83e80dda 1803
63ad44cb
HS
1804 s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
1805 if (s == \-1 && errno != EAGAIN)
1806 errExit("futex\-FUTEX_WAIT");
305cc415
MK
1807 }
1808}
1809
1810/* Release the futex pointed to by \(aqfutexp\(aq: if the futex currently
1811 has the value 0, set its value to 1 and then wake any futex waiters,
1812 so that if the peer is blocked in fwait(), it can proceed. */
1813
1814static void
2253ecf0 1815fpost(uint32_t *futexp)
305cc415 1816{
2253ecf0 1817 long s;
305cc415 1818
68219aba
AC
1819 /* atomic_compare_exchange_strong() was described
1820 in comments above */
305cc415 1821
2253ecf0 1822 uint32_t zero = 0; /* Not const: overwritten if the exchange fails */
09e456c2 1823 if (atomic_compare_exchange_strong(futexp, &zero, 1)) {
305cc415
MK
1824 s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
1825 if (s == \-1)
1826 errExit("futex\-FUTEX_WAKE");
1827 }
1828}
1829
1830int
1831main(int argc, char *argv[])
1832{
1833 pid_t childPid;
88893a77 1834 int nloops;
305cc415
MK
1835
1836 setbuf(stdout, NULL);
1837
1838 nloops = (argc > 1) ? atoi(argv[1]) : 5;
1839
1840 /* Create a shared anonymous mapping that will hold the futexes.
1841 Since the futexes are being shared between processes, we
1842 subsequently use the "shared" futex operations (i.e., not the
1843 ones suffixed "_PRIVATE") */
1844
d60a7a9a 1845 iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE,
305cc415
MK
1846 MAP_ANONYMOUS | MAP_SHARED, \-1, 0);
1847 if (iaddr == MAP_FAILED)
1848 errExit("mmap");
1849
1850 futex1 = &iaddr[0];
1851 futex2 = &iaddr[1];
1852
1853 *futex1 = 0; /* State: unavailable */
1854 *futex2 = 1; /* State: available */
1855
1856 /* Create a child process that inherits the shared anonymous
35764662 1857 mapping */
305cc415
MK
1858
1859 childPid = fork();
92a46690 1860 if (childPid == \-1)
305cc415
MK
1861 errExit("fork");
1862
1863 if (childPid == 0) { /* Child */
88893a77 1864 for (int j = 0; j < nloops; j++) {
305cc415 1865 fwait(futex1);
8eb90116 1866 printf("Child (%jd) %d\en", (intmax_t) getpid(), j);
305cc415
MK
1867 fpost(futex2);
1868 }
1869
1870 exit(EXIT_SUCCESS);
1871 }
1872
1873 /* Parent falls through to here */
1874
88893a77 1875 for (int j = 0; j < nloops; j++) {
305cc415 1876 fwait(futex2);
8eb90116 1877 printf("Parent (%jd) %d\en", (intmax_t) getpid(), j);
305cc415
MK
1878 fpost(futex1);
1879 }
1880
1881 wait(NULL);
1882
1883 exit(EXIT_SUCCESS);
1884}
e7d0bb47 1885.EE
47297adb 1886.SH SEE ALSO
4c222281 1887.ad l
9913033c 1888.BR get_robust_list (2),
d806bc05 1889.BR restart_syscall (2),
e0074751 1890.BR pthread_mutexattr_getprotocol (3),
ac894879
MK
1891.BR futex (7),
1892.BR sched (7)
fea681da 1893.PP
f5ad572f
MK
1894The following kernel source files:
1895.IP * 2
1896.I Documentation/pi-futex.txt
1897.IP *
1898.I Documentation/futex-requeue-pi.txt
1899.IP *
1900.I Documentation/locking/rt-mutex.txt
1901.IP *
1902.I Documentation/locking/rt-mutex-design.txt
8fe019c7
MK
1903.IP *
1904.I Documentation/robust-futex-ABI.txt
43b99089 1905.PP
4c222281 1906Franke, H., Russell, R., and Kirkwood, M., 2002.
52087dd3 1907\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
4c222281 1908(from proceedings of the Ottawa Linux Symposium 2002),
9b936e9e 1909.br
5465ae95 1910.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002\-pages\-479\-495.pdf
608bf950 1911.UE
efeece04 1912.PP
4c222281 1913Hart, D., 2009. \fIA futex overview and update\fP,
2ed26199
MK
1914.UR http://lwn.net/Articles/360699/
1915.UE
efeece04 1916.PP
8fb01fde 1917Hart, D.\& and Guniguntala, D., 2009.
0483b6cc 1918\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
4c222281 1919(from proceedings of the 2009 Real-Time Linux Workshop),
0483b6cc
MK
1920.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1921.UE
efeece04 1922.PP
4c222281 1923Drepper, U., 2011. \fIFutexes Are Tricky\fP,
f42eb21b
MK
1924.UR http://www.akkadia.org/drepper/futex.pdf
1925.UE
9b936e9e
MK
1926.PP
1927Futex example library, futex-*.tar.bz2 at
1928.br
a605264d 1929.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
608bf950 1930.UE
34f14794 1931.\"
74f58a64 1932.\" FIXME(Torvald) We should probably refer to the glibc code here, in
9915ea23
MK
1933.\" particular the glibc-internal futex wrapper functions that are
1934.\" WIP, and the generic pthread_mutex_t and perhaps condvar
1935.\" implementations.