.\" Page by b.hubert
.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
.\" may be freely modified and distributed
.\" %%%LICENSE_END
.\"
.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
.\" added ERRORS section.
.\"
.\" Modified 2004-06-17 mtk
.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
.\"
.\" FIXME Still to integrate are some points from Torvald Riegel's mail of
.\" 2015-01-23:
.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977
.\"
.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail
.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242
.\"
.TH FUTEX 2 2021-08-27 "Linux" "Linux Programmer's Manual"
.SH NAME
futex \- fast user-space locking
.SH SYNOPSIS
.nf
.PP
.BR "#include <linux/futex.h>" " /* Definition of " FUTEX_* " constants */"
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "long syscall(SYS_futex, uint32_t *" uaddr ", int " futex_op \
", uint32_t " val ,
.BI "             const struct timespec *" timeout , \
" \fR /* or: \fBuint32_t \fIval2\fP */"
.BI "             uint32_t *" uaddr2 ", uint32_t " val3 );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR futex (),
necessitating the use of
.BR syscall (2).
47297adb 44.SH DESCRIPTION
fea681da 45The
e511ffb6 46.BR futex ()
4b35dc5d 47system call provides a method for waiting until a certain condition becomes
077981d4
MK
48true.
49It is typically used as a blocking construct in the context of
d45f244c
MK
50shared-memory synchronization.
51When using futexes, the majority of
52the synchronization operations are performed in user space.
bc54ed38 53A user-space program employs the
d45f244c 54.BR futex ()
ca4e5b2b 55system call only when it is likely that the program has to block for
4c8cb0ff 56a longer time until the condition becomes true.
bc54ed38 57Other
d45f244c 58.BR futex ()
bc54ed38
MK
59operations can be used to wake any processes or threads waiting
60for a particular condition.
efeece04 61.PP
7e8dcabc
MK
62A futex is a 32-bit value\(emreferred to below as a
63.IR "futex word" \(emwhose
64address is supplied to the
4b35dc5d 65.BR futex ()
7e8dcabc 66system call.
c3f4c019 67(Futexes are 32 bits in size on all platforms, including 64-bit systems.)
7e8dcabc
MK
68All futex operations are governed by this value.
69In order to share a futex between processes,
70the futex is placed in a region of shared memory,
71created using (for example)
72.BR mmap (2)
73or
74.BR shmat (2).
c3f4c019 75(Thus, the futex word may have different
7e8dcabc
MK
76virtual addresses in different processes,
77but these addresses all refer to the same location in physical memory.)
ca4e5b2b
MK
78In a multithreaded program, it is sufficient to place the futex word
79in a global variable shared by all threads.
efeece04 80.PP
0c3ec26b
MK
81When executing a futex operation that requests to block a thread,
82the kernel will block only if the futex word has the value that the
55f9e85e
MK
83calling thread supplied (as one of the arguments of the
84.BR futex ()
85call) as the expected value of the futex word.
9d32a39b
MK
86The loading of the futex word's value,
87the comparison of that value with the expected value,
bc54ed38 88and the actual blocking will happen atomically and will be totally ordered
da894b18 89with respect to concurrent operations performed by other threads
0fb87d16 90on the same futex word.
da894b18
MK
91.\" Notes from Darren Hart (Dec 2015):
92.\" Totally ordered with respect futex operations refers to semantics
93.\" of the ACQUIRE/RELEASE operations and how they impact ordering of
94.\" memory reads and writes. The kernel futex operations are protected
f6615c42 95.\" by spinlocks, which ensure that all operations are serialized
da894b18
MK
96.\" with respect to one another.
97.\"
98.\" This is a lot to attempt to define in this document. Perhaps a
99.\" reference to linux/Documentation/memory-barriers.txt as a footnote
100.\" would be sufficient? Or perhaps for this manual, "serialized" would
101.\" be sufficient, with a footnote regarding "totally ordered" and a
102.\" pointer to the memory-barrier documentation?
b80daba2 103Thus, the futex word is used to connect the synchronization in user space
9d32a39b 104with the implementation of blocking by the kernel.
55f9e85e 105Analogously to an atomic
4b35dc5d 106compare-and-exchange operation that potentially changes shared memory,
077981d4 107blocking via a futex is an atomic compare-and-block operation.
d6bb5a38 108.\" FIXME(Torvald Riegel):
61066e14
MK
109.\" Eventually we want to have some text in NOTES to satisfy
110.\" the reference in the following sentence
111.\" See NOTES for a detailed specification of
112.\" the synchronization semantics.
efeece04 113.PP
ca4e5b2b 114One use of futexes is for implementing locks.
c0dc758e
MK
115The state of the lock (i.e., acquired or not acquired)
116can be represented as an atomically accessed flag in shared memory.
4c8cb0ff 117In the uncontended case,
c3f4c019 118a thread can access or modify the lock state with atomic instructions,
4c8cb0ff
MK
119for example atomically changing it from not acquired to acquired
120using an atomic compare-and-exchange instruction.
55f9e85e
MK
121(Such instructions are performed entirely in user mode,
122and the kernel maintains no information about the lock state.)
123On the other hand, a thread may be unable to acquire a lock because
8e754e12 124it is already acquired by another thread.
55f9e85e 125It then may pass the lock's flag as a futex word and the value
0c3ec26b 126representing the acquired state as the expected value to a
8e754e12
HS
127.BR futex ()
128wait operation.
55f9e85e 129This
8e754e12 130.BR futex ()
bc54ed38 131operation will block if and only if the lock is still acquired
f6615c42 132(i.e., the value in the futex word still matches the "acquired state").
077981d4 133When releasing the lock, a thread has to first reset the
0c3ec26b 134lock state to not acquired and then execute a futex
55f9e85e 135operation that wakes threads blocked on the lock flag used as a futex word
f6615c42 136(this can be further optimized to avoid unnecessary wake-ups).
077981d4 137See
4b35dc5d
TR
138.BR futex (7)
139for more detail on how to use futexes.
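.PP
The locking scheme described above can be sketched in C roughly as follows.
This is a minimal illustration under stated assumptions, not the
implementation used by any particular library: the names sketch_lock(),
sketch_unlock(), UNLOCKED, and LOCKED are invented for this example, a
64-bit Linux target is assumed, and the unlock path always issues a wake-up
instead of applying the optimization mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* A futex word is exactly 32 bits wide on every platform. */
static_assert(sizeof(atomic_uint) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* Hypothetical state encoding chosen for this sketch. */
enum { UNLOCKED = 0, LOCKED = 1 };

/* Thin wrapper: glibc provides no futex() function. */
static long futex(atomic_uint *uaddr, int futex_op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, futex_op, val, NULL, NULL, 0);
}

static void sketch_lock(atomic_uint *word)
{
    for (;;) {
        unsigned int expected = UNLOCKED;

        /* Uncontended case: one atomic instruction, no system call. */
        if (atomic_compare_exchange_strong(word, &expected, LOCKED))
            return;

        /* Contended case: sleep only if the word still reads LOCKED.
           A failure with EAGAIN or EINTR simply leads to a retry. */
        futex(word, FUTEX_WAIT, LOCKED);
    }
}

static void sketch_unlock(atomic_uint *word)
{
    /* First reset the state, then wake one thread blocked on the word. */
    atomic_store(word, UNLOCKED);
    futex(word, FUTEX_WAKE, 1);
}
```

Because the kernel rechecks the futex word before blocking, a thread that
loses the race with sketch_unlock() returns from the wait immediately and
retries the compare-and-exchange, so no wake-up is lost.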
efeece04 140.PP
4b35dc5d 141Besides the basic wait and wake-up futex functionality, there are further
077981d4 142futex operations aimed at supporting more complex use cases.
efeece04 143.PP
ca4e5b2b 144Note that
2af84f99 145no explicit initialization or destruction is necessary to use futexes;
4c8cb0ff
MK
146the kernel maintains a futex
147(i.e., the kernel-internal implementation artifact)
4b35dc5d
TR
148only while operations such as
149.BR FUTEX_WAIT ,
150described below, are being performed on a particular futex word.
a663ca5a
MK
151.\"
152.SS Arguments
fea681da
MK
153The
154.I uaddr
077981d4
MK
155argument points to the futex word.
156On all platforms, futexes are four-byte
4b35dc5d 157integers that must be aligned on a four-byte boundary.
f388ba70
MK
158The operation to perform on the futex is specified in the
159.I futex_op
160argument;
161.IR val
is a value whose meaning and purpose depend on
163.IR futex_op .
efeece04 164.PP
36ab2074
MK
165The remaining arguments
166.RI ( timeout ,
167.IR uaddr2 ,
168and
169.IR val3 )
170are required only for certain of the futex operations described below.
171Where one of these arguments is not required, it is ignored.
efeece04 172.PP
36ab2074
MK
173For several blocking operations, the
174.I timeout
175argument is a pointer to a
176.IR timespec
177structure that specifies a timeout for the operation.
178However, notwithstanding the prototype shown above, for some operations,
eb4aa521
MK
179the least significant four bytes of this argument are instead
180used as an integer whose meaning is determined by the operation.
768d3c23
MK
181For these operations, the kernel casts the
182.I timeout
10022b8e
HS
183value first to
184.IR "unsigned long",
185then to
c6dc40a2 186.IR uint32_t ,
768d3c23
MK
187and in the remainder of this page, this argument is referred to as
188.I val2
189when interpreted in this fashion.
efeece04 190.PP
de5a3bb4 191Where it is required, the
36ab2074 192.IR uaddr2
4c8cb0ff
MK
193argument is a pointer to a second futex word that is employed
194by the operation.
efeece04 195.PP
36ab2074
MK
196The interpretation of the final integer argument,
197.IR val3 ,
198depends on the operation.
a663ca5a
MK
199.\"
200.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
201.\"
202.SS Futex operations
6be4bad7 203The
d33602c4 204.I futex_op
6be4bad7
MK
205argument consists of two parts:
206a command that specifies the operation to be performed,
bitwise ORed with zero or more options that
modify the behavior of the operation.
fc30eb79 209The options that may be included in
d33602c4 210.I futex_op
fc30eb79
TG
211are as follows:
212.TP
213.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
214.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
215This option bit can be employed with all futex operations.
e45f9735 216It tells the kernel that the futex is process-private and not shared
0c3ec26b
MK
217with another process (i.e., it is being used for synchronization
218only between threads of the same process).
943ccc52
MK
219This allows the kernel to make some additional performance optimizations.
220.\" I.e., It allows the kernel choose the fast path for validating
221.\" the user-space address and avoids expensive VMA lookups,
222.\" taking reference counts on file backing store, and so on.
efeece04 223.IP
ae2c1774 224As a convenience,
eeeee811 225.I <linux/futex.h>
ae2c1774 226defines a set of constants with the suffix
eeeee811 227.B _PRIVATE
ae2c1774 228that are equivalents of all of the operations listed below,
dcdfde26 229.\" except the obsolete FUTEX_FD, for which the "private" flag was
ae2c1774
MK
230.\" meaningless
231but with the
232.BR FUTEX_PRIVATE_FLAG
233ORed into the constant value.
234Thus, there are
235.BR FUTEX_WAIT_PRIVATE ,
236.BR FUTEX_WAKE_PRIVATE ,
237and so on.
2e98bbc2
TG
238.TP
239.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
240.\" commit 1acdac104668a0834cfa267de9946fac7764d486
4a7e5b05 241This option bit can be employed only with the
949ceae3
MK
242.BR FUTEX_WAIT_BITSET ,
243.BR FUTEX_WAIT_REQUEUE_PI ,
949ceae3
MK
244(since Linux 4.5)
245.\" commit 337f13046ff03717a9e99675284a817527440a49
e79977ae
KK
246.BR FUTEX_WAIT ,
247and
eeeee811 248(since Linux 5.14)
e79977ae 249.\" commit bf22a6976897977b0a3f1aeba6823c959fc4fdae
eeeee811 250.B FUTEX_LOCK_PI2
c84cf68c 251operations.
efeece04 252.IP
8064bfa5 253If this option is set, the kernel measures the
f2103b26 254.I timeout
8064bfa5 255against the
eeeee811 256.B CLOCK_REALTIME
8064bfa5 257clock.
efeece04 258.IP
8064bfa5 259If this option is not set, the kernel measures the
f2103b26 260.I timeout
8064bfa5 261against the
eeeee811 262.B CLOCK_MONOTONIC
1c952cf5 263clock.
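.PP
As an illustration of how a command and its options share the
.I futex_op
argument, the following sketch splits a value back into its two parts.
The helper names futex_op_command() and futex_op_options() are invented
here; only the constants come from
.IR <linux/futex.h> .

```c
#include <assert.h>
#include <linux/futex.h>

/* The two option bits described above. */
#define SKETCH_OPTION_BITS (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)

/* Return the command part of futex_op, with the option bits cleared. */
static int futex_op_command(int futex_op)
{
    return futex_op & ~SKETCH_OPTION_BITS;
}

/* Return only the option bits of futex_op. */
static int futex_op_options(int futex_op)
{
    return futex_op & SKETCH_OPTION_BITS;
}

/* The _PRIVATE constants are the plain commands with the flag ORed in. */
static_assert(FUTEX_WAIT_PRIVATE == (FUTEX_WAIT | FUTEX_PRIVATE_FLAG),
              "FUTEX_WAIT_PRIVATE is FUTEX_WAIT plus the private flag");
static_assert(FUTEX_WAKE_PRIVATE == (FUTEX_WAKE | FUTEX_PRIVATE_FLAG),
              "FUTEX_WAKE_PRIVATE is FUTEX_WAKE plus the private flag");
```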
6be4bad7
MK
264.PP
265The operation specified in
d33602c4 266.I futex_op
6be4bad7 267is one of the following:
70b06b90
MK
268.\"
269.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
270.\"
fea681da 271.TP
81c9d87e
MK
272.BR FUTEX_WAIT " (since Linux 2.6.0)"
273.\" Strictly speaking, since some time in 2.5.x
f065673c 274This operation tests that the value at the
4b35dc5d 275futex word pointed to by the address
fea681da 276.I uaddr
4b35dc5d 277still contains the expected value
fea681da 278.IR val ,
fd105614 279and if so, then sleeps waiting for a
682edefb 280.B FUTEX_WAKE
fd105614 281operation on the futex word.
077981d4 282The load of the value of the futex word is an atomic memory
4b35dc5d 283access (i.e., using atomic machine instructions of the respective
077981d4
MK
284architecture).
285This load, the comparison with the expected value, and
fd105614 286starting to sleep are performed atomically
da56650a 287.\" FIXME: Torvald, I think we may need to add some explanation of
61066e14 288.\" "totally ordered" here.
fd105614
MK
289and totally ordered
290with respect to other futex operations on the same futex word.
c0dc758e
MK
291If the thread starts to sleep,
292it is considered a waiter on this futex word.
f065673c
MK
293If the futex value does not match
294.IR val ,
4710334a 295then the call fails immediately with the error
badbf70c 296.BR EAGAIN .
efeece04 297.IP
4b35dc5d 298The purpose of the comparison with the expected value is to prevent lost
fd105614
MK
299wake-ups.
300If another thread changed the value of the futex word after the
c0dc758e
MK
301calling thread decided to block based on the prior value,
302and if the other thread executed a
4b35dc5d
TR
303.BR FUTEX_WAKE
304operation (or similar wake-up) after the value change and before this
f065673c 305.BR FUTEX_WAIT
bc54ed38
MK
306operation, then the calling thread will observe the
307value change and will not start to sleep.
efeece04 308.IP
c13182ef 309If the
fea681da 310.I timeout
40d2dab9 311is not NULL, the structure it points to specifies a
40d2dab9 312timeout for the wait.
ac991a11
MK
313(This interval will be rounded up to the system clock granularity,
314and is guaranteed not to expire early.)
a6918f1d 315The timeout is by default measured according to the
1c952cf5 316.BR CLOCK_MONOTONIC
a01c3098
MK
317clock, but, since Linux 4.5, the
318.BR CLOCK_REALTIME
319clock can be selected by specifying
320.BR FUTEX_CLOCK_REALTIME
321in
322.IR futex_op .
82a6092b
MK
323If
324.I timeout
325is NULL, the call blocks indefinitely.
efeece04 326.IP
4100abc5
MK
327.IR Note :
328for
329.BR FUTEX_WAIT ,
330.IR timeout
331is interpreted as a
332.IR relative
333value.
334This differs from other futex operations, where
335.I timeout
336is interpreted as an absolute value.
337To obtain the equivalent of
338.BR FUTEX_WAIT
339with an absolute timeout, employ
340.BR FUTEX_WAIT_BITSET
341with
342.IR val3
343specified as
344.BR FUTEX_BITSET_MATCH_ANY .
efeece04 345.IP
c13182ef 346The arguments
fea681da
MK
347.I uaddr2
348and
349.I val3
350are ignored.
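.IP
The two failure modes described above (value mismatch and timeout expiry)
can be observed with a small helper such as the following.
This is only a sketch: wait_relative_ms() is a name invented for this
example, a 64-bit Linux target is assumed, and a real caller would normally
retry in a loop because a successful return may also be a spurious wake-up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Block while *uaddr equals expected, for at most timeout_ms milliseconds.
   The timeout is relative and measured against CLOCK_MONOTONIC.
   Returns 0 after a wake-up, or -1 with errno set (for example, EAGAIN if
   the value did not match, or ETIMEDOUT if the timeout expired). */
static int wait_relative_ms(uint32_t *uaddr, uint32_t expected,
                            long timeout_ms)
{
    struct timespec rel;
    long rc;

    assert(timeout_ms >= 0);
    rel.tv_sec = timeout_ms / 1000;
    rel.tv_nsec = (timeout_ms % 1000) * 1000000L;

    rc = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, &rel, NULL, 0);
    return rc == -1 ? -1 : 0;
}
```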
9915ea23
MK
351.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to a
352.\" different example.
353.\"
354.\" For
355.\" .BR futex (7),
356.\" this call is executed if decrementing the count gave a negative value
357.\" (indicating contention),
358.\" and will sleep until another process or thread releases
359.\" the futex and executes the
360.\" .B FUTEX_WAKE
361.\" operation.
70b06b90
MK
362.\"
363.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
364.\"
fea681da 365.TP
81c9d87e
MK
366.BR FUTEX_WAKE " (since Linux 2.6.0)"
367.\" Strictly speaking, since Linux 2.5.x
f065673c
MK
368This operation wakes at most
369.I val
4b35dc5d 370of the waiters that are waiting (e.g., inside
f065673c 371.BR FUTEX_WAIT )
4b35dc5d 372on the futex word at the address
f065673c
MK
373.IR uaddr .
374Most commonly,
375.I val
376is specified as either 1 (wake up a single waiter) or
377.BR INT_MAX
378(wake up all waiters).
730bfbda
MK
379No guarantee is provided about which waiters are awoken
380(e.g., a waiter with a higher scheduling priority is not guaranteed
381to be awoken in preference to a waiter with a lower priority).
efeece04 382.IP
fea681da
MK
383The arguments
384.IR timeout ,
c8b921bd 385.IR uaddr2 ,
fea681da
MK
386and
387.I val3
388are ignored.
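.IP
A one-shot event is perhaps the simplest use of a wake-all.
The sketch below is an assumption-laden illustration rather than a
recommended design: event_wait() and event_set() are names invented here,
and the event can be set only once.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(atomic_uint) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* Block until the event word becomes nonzero. */
static void event_wait(atomic_uint *ev)
{
    /* Recheck after every return: the wait can end early (EAGAIN, EINTR,
       or a spurious wake-up) without the event having been set. */
    while (atomic_load(ev) == 0)
        syscall(SYS_futex, ev, FUTEX_WAIT, 0, NULL, NULL, 0);
}

/* Set the event and wake every waiter.
   Returns the number of waiters that were woken. */
static long event_set(atomic_uint *ev)
{
    atomic_store(ev, 1);
    return syscall(SYS_futex, ev, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

The value is stored before the wake-up is issued, so a thread that is about
to block either sees the new value in its own check or has it detected by
the kernel's comparison.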
9915ea23
MK
389.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to
390.\" a different example.
391.\"
4c8cb0ff
MK
392.\" For
393.\" .BR futex (7),
394.\" this is executed if incrementing the count showed that
395.\" there were waiters,
396.\" once the futex value has been set to 1
397.\" (indicating that it is available).
398.\"
9915ea23 399.\" How does "incrementing the count show that there were waiters"?
70b06b90
MK
400.\"
401.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
402.\"
a7c2bf45
MK
403.TP
404.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
405.\" Strictly speaking, from Linux 2.5.x to 2.6.25
4c8cb0ff
MK
406This operation creates a file descriptor that is associated with
407the futex at
a7c2bf45 408.IR uaddr .
bdc5957a
MK
409The caller must close the returned file descriptor after use.
When another process or thread performs a
.BR FUTEX_WAKE
on the futex word, the file descriptor is reported as readable by
.BR select (2),
.BR poll (2),
and
.BR epoll (7).
efeece04 417.IP
f1d2171d 418The file descriptor can be used to obtain asynchronous notifications: if
a7c2bf45 419.I val
ca4e5b2b 420is nonzero, then, when another process or thread executes a
a7c2bf45
MK
421.BR FUTEX_WAKE ,
422the caller will receive the signal number that was passed in
423.IR val .
efeece04 424.IP
a7c2bf45
MK
425The arguments
426.IR timeout ,
d556548b 427.IR uaddr2 ,
a7c2bf45
MK
428and
429.I val3
430are ignored.
efeece04 431.IP
a7c2bf45
MK
432Because it was inherently racy,
433.B FUTEX_FD
434has been removed
435.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
436from Linux 2.6.26 onward.
70b06b90
MK
437.\"
438.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
439.\"
a7c2bf45
MK
440.TP
441.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
a7c2bf45 442This operation performs the same task as
27dd3a6e
MK
443.BR FUTEX_CMP_REQUEUE
444(see below), except that no check is made using the value in
a7c2bf45
MK
445.IR val3 .
446(The argument
447.I val3
448is ignored.)
70b06b90
MK
449.\"
450.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
451.\"
a7c2bf45
MK
452.TP
453.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
4b35dc5d 454This operation first checks whether the location
a7c2bf45
MK
455.I uaddr
456still contains the value
457.IR val3 .
458If not, the operation fails with the error
459.BR EAGAIN .
4b35dc5d 460Otherwise, the operation wakes up a maximum of
a7c2bf45
MK
461.I val
462waiters that are waiting on the futex at
463.IR uaddr .
464If there are more than
465.I val
466waiters, then the remaining waiters are removed
467from the wait queue of the source futex at
468.I uaddr
469and added to the wait queue of the target futex at
470.IR uaddr2 .
471The
768d3c23 472.I val2
936876a9 473argument specifies an upper limit on the number of waiters
a7c2bf45 474that are requeued to the futex at
768d3c23 475.IR uaddr2 .
efeece04 476.IP
d6bb5a38
MK
477.\" FIXME(Torvald) Is the following correct? Or is just the decision
478.\" which threads to wake or requeue part of the atomic operation?
4b35dc5d
TR
479The load from
480.I uaddr
4c8cb0ff
MK
481is an atomic memory access (i.e., using atomic machine instructions of
482the respective architecture).
077981d4 483This load, the comparison with
4b35dc5d 484.IR val3 ,
4c8cb0ff
MK
485and the requeueing of any waiters are performed atomically and totally
486ordered with respect to other operations on the same futex word.
ee65b0e8
MK
487.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ###
488.\" The operation is serialized with respect to operations on both
489.\" source and target futex. No other waiter can enqueue itself
490.\" for waiting and no other waiter can dequeue itself because of
491.\" a timeout or signal.
efeece04 492.IP
a7c2bf45
MK
493Typical values to specify for
494.I val
ed1819cf 495are 0 or 1.
a7c2bf45
MK
496(Specifying
497.BR INT_MAX
498is not useful, because it would make the
499.BR FUTEX_CMP_REQUEUE
500operation equivalent to
501.BR FUTEX_WAKE .)
936876a9 502The limit value specified via
768d3c23
MK
503.I val2
504is typically either 1 or
a7c2bf45
MK
505.BR INT_MAX .
(Specifying the argument as 0 is not useful, because no waiters
would be requeued, making the
.BR FUTEX_CMP_REQUEUE
operation equivalent to
.BR FUTEX_WAKE
preceded by a check of the value at
.IR uaddr .)
efeece04 510.IP
627b50ce
MK
511The
512.B FUTEX_CMP_REQUEUE
513operation was added as a replacement for the earlier
514.BR FUTEX_REQUEUE .
515The difference is that the check of the value at
516.I uaddr
517can be used to ensure that requeueing happens only under certain
518conditions, which allows race conditions to be avoided in certain use cases.
dcb410c3 519.\" But, as Rich Felker points out, there remain valid use cases for
627b50ce
MK
520.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing
521.\" the target(s) to a lock that the calling thread owns
522.\" From: Rich Felker <dalias@libc.org>
523.\" Date: Wed, 29 Oct 2014 22:43:17 -0400
524.\" To: Darren Hart <dvhart@infradead.org>
525.\" CC: libc-alpha@sourceware.org, ...
526.\" Subject: Re: Add futex wrapper to glibc?
efeece04 527.IP
627b50ce
MK
528Both
529.BR FUTEX_REQUEUE
530and
531.BR FUTEX_CMP_REQUEUE
532can be used to avoid "thundering herd" wake-ups that could occur when using
533.B FUTEX_WAKE
534in cases where all of the waiters that are woken need to acquire
535another futex.
536Consider the following scenario,
537where multiple waiter threads are waiting on B,
538a wait queue implemented using a futex:
efeece04 539.IP
627b50ce 540.in +4n
b76974c1 541.EX
lock(A);
while (!check_value(V)) {
    unlock(A);      /* release A before sleeping */
    block_on(B);    /* wait on the futex-based queue B */
    lock(A);        /* reacquire A before rechecking V */
}
unlock(A);
b76974c1 549.EE
627b50ce 550.in
efeece04 551.IP
627b50ce
MK
552If a waker thread used
553.BR FUTEX_WAKE ,
554then all waiters waiting on B would be woken up,
67c67ff2 555and they would all try to acquire lock A.
627b50ce
MK
556However, waking all of the threads in this manner would be pointless because
557all except one of the threads would immediately block on lock A again.
558By contrast, a requeue operation wakes just one waiter and moves
559the other waiters to lock A,
560and when the woken waiter unlocks A then the next waiter can proceed.
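.IP
The "wake one, requeue the rest" pattern might be invoked as in the
following sketch.
The function name wake_one_requeue_rest() and the parameter names are
invented for this example, and a 64-bit Linux target is assumed.
Note how
.I val2
is passed in the position of the
.I timeout
argument.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>      /* EAGAIN, for callers that inspect errno */
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* If *queue_word still equals expected, wake at most one waiter blocked
   on queue_word and move all remaining waiters onto lock_word.
   Returns the number of waiters woken plus the number requeued, or -1
   with errno set (EAGAIN if *queue_word no longer equals expected). */
static long wake_one_requeue_rest(uint32_t *queue_word, uint32_t *lock_word,
                                  uint32_t expected)
{
    assert(queue_word != lock_word);

    return syscall(SYS_futex, queue_word, FUTEX_CMP_REQUEUE,
                   1,                    /* val: wake at most one waiter */
                   (uintptr_t) INT_MAX,  /* val2: limit on requeued waiters */
                   lock_word,            /* uaddr2: target futex word */
                   expected);            /* val3: expected value at uaddr */
}
```

On EAGAIN the caller would typically reread the futex word and decide
whether the requeue is still appropriate.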
43d16602 561.\"
70b06b90
MK
562.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
563.\"
fea681da 564.TP
d67e21f5
MK
565.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
566.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
6bac3b85
MK
567.\" Author: Jakub Jelinek <jakub@redhat.com>
568.\" Date: Tue Sep 6 15:16:25 2005 -0700
9915ea23 569.\" FIXME. (Torvald) The glibc condvar implementation is currently being
4c8cb0ff
MK
570.\" revised (e.g., to not use an internal lock anymore).
571.\" It is probably more future-proof to remove this paragraph.
d6bb5a38 572.\" [Torvald, do you have an update here?]
6bac3b85
MK
573This operation was added to support some user-space use cases
574where more than one futex must be handled at the same time.
575The most notable example is the implementation of
576.BR pthread_cond_signal (3),
577which requires operations on two futexes,
578the one used to implement the mutex and the one used in the implementation
579of the wait queue associated with the condition variable.
580.BR FUTEX_WAKE_OP
581allows such cases to be implemented without leading to
582high rates of contention and context switching.
efeece04 583.IP
6bac3b85 584The
57f2d48b 585.BR FUTEX_WAKE_OP
e61abc20 586operation is equivalent to executing the following code atomically
4c8cb0ff
MK
and totally ordered with respect to other futex operations on
either of the two supplied futex words:
efeece04 589.IP
6bac3b85 590.in +4n
b76974c1 591.EX
uint32_t oldval = *(uint32_t *) uaddr2;
*(uint32_t *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
if (oldval \fIcmp\fP \fIcmparg\fP)
    futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
b76974c1 597.EE
6bac3b85 598.in
efeece04 599.IP
6bac3b85 600In other words,
57f2d48b 601.BR FUTEX_WAKE_OP
6bac3b85
MK
602does the following:
603.RS
604.IP * 3
4b35dc5d
TR
605saves the original value of the futex word at
606.IR uaddr2
607and performs an operation to modify the value of the futex at
6bac3b85 608.IR uaddr2 ;
4c8cb0ff
MK
609this is an atomic read-modify-write memory access (i.e., using atomic
610machine instructions of the respective architecture)
6bac3b85
MK
611.IP *
612wakes up a maximum of
613.I val
4b35dc5d 614waiters on the futex for the futex word at
6bac3b85
MK
615.IR uaddr ;
616and
617.IP *
4c8cb0ff
MK
618dependent on the results of a test of the original value of the
619futex word at
6bac3b85
MK
620.IR uaddr2 ,
621wakes up a maximum of
768d3c23 622.I val2
4b35dc5d 623waiters on the futex for the futex word at
6bac3b85
MK
624.IR uaddr2 .
625.RE
626.IP
6bac3b85
MK
627The operation and comparison that are to be performed are encoded
628in the bits of the argument
629.IR val3 .
630Pictorially, the encoding is:
efeece04 631.IP
3cf61490 632.in +4n
b76974c1 633.EX
+---+---+-----------+-----------+
|op |cmp|   oparg   |  cmparg   |
+---+---+-----------+-----------+
  4   4       12          12    <== # of bits
b76974c1 638.EE
6bac3b85 639.in
efeece04 640.IP
6bac3b85 641Expressed in code, the encoding is:
efeece04 642.IP
6bac3b85 643.in +4n
b76974c1 644.EX
#define FUTEX_OP(op, oparg, cmp, cmparg) \e
                (((op & 0xf) << 28) | \e
                ((cmp & 0xf) << 24) | \e
                ((oparg & 0xfff) << 12) | \e
                (cmparg & 0xfff))
b76974c1 650.EE
6bac3b85 651.in
efeece04 652.IP
6bac3b85
MK
653In the above,
654.I op
655and
656.I cmp
657are each one of the codes listed below.
658The
659.I oparg
660and
661.I cmparg
662components are literal numeric values, except as noted below.
efeece04 663.IP
6bac3b85
MK
664The
665.I op
666component has one of the following values:
efeece04 667.IP
6bac3b85 668.in +4n
b76974c1 669.EX
FUTEX_OP_SET        0  /* *uaddr2 = oparg; */
FUTEX_OP_ADD        1  /* *uaddr2 += oparg; */
FUTEX_OP_OR         2  /* *uaddr2 |= oparg; */
FUTEX_OP_ANDN       3  /* *uaddr2 &= \(tioparg; */
FUTEX_OP_XOR        4  /* *uaddr2 \(ha= oparg; */
b76974c1 675.EE
6bac3b85 676.in
efeece04 677.IP
5d771a4a 678In addition, bitwise ORing the following value into
6bac3b85
MK
679.I op
680causes
681.IR "(1\ <<\ oparg)"
682to be used as the operand:
efeece04 683.IP
6bac3b85 684.in +4n
b76974c1 685.EX
FUTEX_OP_OPARG_SHIFT  8  /* Use (1 << oparg) as operand */
b76974c1 687.EE
6bac3b85 688.in
efeece04 689.IP
6bac3b85
MK
690The
691.I cmp
692field is one of the following:
efeece04 693.IP
6bac3b85 694.in +4n
b76974c1 695.EX
FUTEX_OP_CMP_EQ     0  /* if (oldval == cmparg) wake */
FUTEX_OP_CMP_NE     1  /* if (oldval != cmparg) wake */
FUTEX_OP_CMP_LT     2  /* if (oldval < cmparg) wake */
FUTEX_OP_CMP_LE     3  /* if (oldval <= cmparg) wake */
FUTEX_OP_CMP_GT     4  /* if (oldval > cmparg) wake */
FUTEX_OP_CMP_GE     5  /* if (oldval >= cmparg) wake */
b76974c1 702.EE
6bac3b85 703.in
efeece04 704.IP
6bac3b85
MK
705The return value of
706.BR FUTEX_WAKE_OP
707is the sum of the number of waiters woken on the futex
708.IR uaddr
709plus the number of waiters woken on the futex
710.IR uaddr2 .
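.IP
To make the encoding concrete, the sketch below builds a
.I val3
value and issues the call.
The helpers wake_op_encode() and wake_op() are invented for this example
and assume a 64-bit Linux target; the encoder uses unsigned arithmetic so
that shifting a 4-bit field into the top bits cannot overflow a signed
.IR int .

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pack op, cmp, oparg and cmparg into the 4/4/12/12-bit layout. */
static uint32_t wake_op_encode(uint32_t op, uint32_t oparg,
                               uint32_t cmp, uint32_t cmparg)
{
    assert(op <= 0xf && cmp <= 0xf);
    assert(oparg <= 0xfff && cmparg <= 0xfff);

    return (op << 28) | (cmp << 24) | (oparg << 12) | cmparg;
}

/* Modify *uaddr2 as encoded in val3, wake up to nwake waiters on uaddr,
   and, if the comparison on the old value of *uaddr2 holds, wake up to
   nwake2 waiters on uaddr2.  Returns the total number of waiters woken,
   or -1 with errno set. */
static long wake_op(uint32_t *uaddr, uint32_t nwake,
                    uint32_t *uaddr2, uint32_t nwake2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, nwake,
                   (uintptr_t) nwake2,   /* val2, in the timeout slot */
                   uaddr2, val3);
}
```

For instance, encoding
.BR FUTEX_OP_ADD
with an operand of 1 and
.BR FUTEX_OP_CMP_EQ
with a comparison value of 5 yields 0x10001005.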
70b06b90
MK
711.\"
712.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
713.\"
d67e21f5 714.TP
79c9b436
TG
715.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
716.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
fd9e59d4 717This operation is like
79c9b436
TG
718.BR FUTEX_WAIT
719except that
720.I val3
84abf4ba 721is used to provide a 32-bit bit mask to the kernel.
2ae96e8a 722This bit mask, in which at least one bit must be set,
6c38ce7f 723is stored in the kernel-internal state of the waiter.
79c9b436
TG
724See the description of
725.BR FUTEX_WAKE_BITSET
726for further details.
efeece04 727.IP
8064bfa5
MK
728If
729.I timeout
730is not NULL, the structure it points to specifies
731an absolute timeout for the wait operation.
732If
733.I timeout
734is NULL, the operation can block indefinitely.
efeece04 735.IP
79c9b436
TG
736The
737.I uaddr2
738argument is ignored.
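.IP
An absolute timeout is convenient when a wait must be retried, because the
same deadline can be reused after each early return.
The following sketch shows one way to do this; deadline_after_ms() and
wait_until() are names invented for this example, and a 64-bit Linux target
is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>      /* ETIMEDOUT, EAGAIN, for callers that inspect errno */
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Return an absolute CLOCK_MONOTONIC deadline ms milliseconds from now. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {    /* carry into the seconds field */
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Block while *uaddr equals expected, but no later than the deadline.
   Returns 0 after a wake-up, or -1 with errno set. */
static int wait_until(uint32_t *uaddr, uint32_t expected,
                      const struct timespec *deadline)
{
    long rc = syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                      deadline, NULL, FUTEX_BITSET_MATCH_ANY);

    return rc == -1 ? -1 : 0;
}
```

Since
.BR FUTEX_CLOCK_REALTIME
is not specified here, the deadline is measured against
.BR CLOCK_MONOTONIC ,
which is why the sketch reads that clock.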
70b06b90
MK
739.\"
740.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
741.\"
79c9b436 742.TP
d67e21f5
MK
743.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
744.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
55cc422d
TG
745This operation is the same as
746.BR FUTEX_WAKE
747except that the
e24fbf10 748.I val3
5e1456d4 749argument is used to provide a 32-bit bit mask to the kernel.
6c38ce7f
MK
750This bit mask, in which at least one bit must be set,
751is used to select which waiters should be woken up.
5d771a4a 752The selection is done by a bitwise AND of the "wake" bit mask
98d769c0
MK
753(i.e., the value in
754.IR val3 )
5e1456d4
MK
755and the bit mask which is stored in the kernel-internal
756state of the waiter (the "wait" bit mask that is set using
98d769c0
MK
757.BR FUTEX_WAIT_BITSET ).
758All of the waiters for which the result of the AND is nonzero are woken up;
759the remaining waiters are left sleeping.
efeece04 760.IP
e9d4496b
MK
761The effect of
762.BR FUTEX_WAIT_BITSET
763and
764.BR FUTEX_WAKE_BITSET
9732dd8b
MK
765is to allow selective wake-ups among multiple waiters that are blocked
766on the same futex.
ac894879 767However, note that, depending on the use case,
5e1456d4 768employing this bit-mask multiplexing feature on a
ac894879 769futex can be less efficient than simply using multiple futexes,
5e1456d4 770because employing bit-mask multiplexing requires the kernel
e9d4496b
MK
771to check all waiters on a futex,
772including those that are not interested in being woken up
5e1456d4 773(i.e., they do not have the relevant bit set in their "wait" bit mask).
e9d4496b
MK
774.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
775.\"
776.\" "The original reason for the addition of these extensions
777.\" was to improve the performance of pthread read-write locks
778.\" in glibc. However, the pthreads library no longer uses the
779.\" same locking algorithm, and these extensions are not used
780.\" without the bitset parameter being all ones.
e24fbf10 781.\"
e9d4496b 782.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
5e1456d4 783.\" is nevertheless used (with a bit mask of all ones) in order to
e9d4496b
MK
784.\" obtain the absolute timeout functionality that is useful
785.\" for efficiently implementing Pthreads APIs (which use absolute
786.\" timeouts); FUTEX_WAIT provides only relative timeouts.
efeece04 787.IP
678c9986
MK
788The constant
789.BR FUTEX_BITSET_MATCH_ANY ,
790which corresponds to all 32 bits set in the bit mask, can be used as the
791.I val3
792argument for
793.BR FUTEX_WAIT_BITSET
98d769c0 794and
678c9986
MK
795.BR FUTEX_WAKE_BITSET .
796Other than differences in the handling of the
98d769c0 797.I timeout
678c9986 798argument, the
9732dd8b 799.BR FUTEX_WAIT
678c9986 800operation is equivalent to
9732dd8b 801.BR FUTEX_WAIT_BITSET
678c9986
MK
802with
803.IR val3
804specified as
805.BR FUTEX_BITSET_MATCH_ANY ;
806that is, allow a wake-up by any waker.
807The
808.BR FUTEX_WAKE
809operation is equivalent to
9732dd8b 810.BR FUTEX_WAKE_BITSET
678c9986
MK
811with
812.IR val3
813specified as
814.BR FUTEX_BITSET_MATCH_ANY ;
815that is, wake up any waiter(s).
efeece04 816.IP
678c9986
MK
817The
818.I uaddr2
819and
820.I timeout
821arguments are ignored.
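.IP
The selection by bit mask can be illustrated with two hypothetical
"channels" that share one futex word.
The channel names and the helpers wait_on_channels() and wake_channels()
are invented for this sketch, which assumes a 64-bit Linux target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>      /* EINVAL, EAGAIN, for callers that inspect errno */
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical channels: each class of waiter owns one bit. */
enum { CHAN_READERS = 1u << 0, CHAN_WRITERS = 1u << 1 };

static_assert((CHAN_READERS & CHAN_WRITERS) == 0,
              "channels must use distinct bits");

/* Sleep on uaddr (if it still equals expected) with the given "wait"
   bit mask and no timeout.  Returns 0 after a wake-up, else -1. */
static long wait_on_channels(uint32_t *uaddr, uint32_t expected,
                             uint32_t mask)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                   NULL, NULL, mask);
}

/* Wake at most max_wake waiters whose "wait" mask shares a bit with mask.
   Returns the number of waiters woken, or -1 with errno set. */
static long wake_channels(uint32_t *uaddr, int max_wake, uint32_t mask)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, max_wake,
                   NULL, NULL, mask);
}
```

A waker calling wake_channels() with
CHAN_WRITERS leaves threads that waited with only CHAN_READERS asleep,
although the kernel still has to examine them.
A mask of zero is rejected with the error
.BR EINVAL .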
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SS Priority-inheritance futexes
Linux supports priority-inheritance (PI) futexes in order to handle
priority-inversion problems that can be encountered with
normal futex locks.
Priority inversion is the problem that occurs when a high-priority
task is blocked waiting to acquire a lock held by a low-priority task,
while tasks at an intermediate priority continuously preempt
the low-priority task from the CPU.
Consequently, the low-priority task makes no progress toward
releasing the lock, and the high-priority task remains blocked.
.PP
Priority inheritance is a mechanism for dealing with
the priority-inversion problem.
With this mechanism, when a high-priority task becomes blocked
by a lock held by a low-priority task,
the priority of the low-priority task is temporarily raised
to that of the high-priority task,
so that it is not preempted by any intermediate-level tasks,
and can thus make progress toward releasing the lock.
To be effective, priority inheritance must be transitive,
meaning that if a high-priority task blocks on a lock
held by a lower-priority task that is itself blocked by a lock
held by another intermediate-priority task
(and so on, for chains of arbitrary length),
then both of those tasks
(or more generally, all of the tasks in a lock chain)
have their priorities raised to match that of the high-priority task.
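.PP
The transitivity requirement can be illustrated with a toy user-space model.
This is not kernel code: struct task and propagate_priority() are
hypothetical, a larger number is assumed to mean a higher priority, and the
lock chain is assumed to contain no cycle (that is, no deadlock).

```c
#include <stddef.h>

/* Toy model only. Assumption: larger value means higher priority. */
struct task {
    int base_prio;           /* priority assigned to the task itself */
    int eff_prio;            /* priority after inheritance */
    struct task *waits_for;  /* owner of the lock this task is blocked
                                on, or NULL if it is not blocked */
};

/* Walk the lock chain that starts at the blocked task and raise every
   owner along the chain to at least the waiter's effective priority.
   Assumes the chain is acyclic. */
static void propagate_priority(struct task *waiter)
{
    for (struct task *owner = waiter->waits_for; owner != NULL;
         owner = owner->waits_for) {
        if (owner->eff_prio < waiter->eff_prio)
            owner->eff_prio = waiter->eff_prio;
    }
}
```

With a chain high -> mid -> low, calling propagate_priority() on the
high-priority task boosts both of the other tasks, while leaving their base
priorities untouched so that the boost can later be undone.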
.PP
From a user-space perspective,
what makes a futex PI-aware is a policy agreement (described below)
between user space and the kernel about the value of the futex word,
coupled with the use of the PI-futex operations described below.
(Unlike the other futex operations described above,
the PI-futex operations are designed
for the implementation of very specific IPC mechanisms.)
.\"
.\" Quoting Darren Hart:
.\" These opcodes paired with the PI futex value policy (described below)
.\" defines a "futex" as PI aware. These were created very specifically
.\" in support of PI pthread_mutexes, so it makes a lot more sense to
.\" talk about a PI aware pthread_mutex, than a PI aware futex, since
.\" there is a lot of policy and scaffolding that has to be built up
.\" around it to use it properly (this is what a PI pthread_mutex is).
.PP
.\" mtk: The following text is drawn from the Hart/Guniguntala paper
.\" (listed in SEE ALSO), but I have reworded some pieces
.\" significantly.
.\"
The PI-futex operations described below differ from the other
futex operations in that they impose policy on the use of the value of the
futex word:
.IP * 3
If the lock is not acquired, the futex word's value shall be 0.
.IP *
If the lock is acquired, the futex word's value shall
be the thread ID (TID;
see
.BR gettid (2))
of the owning thread.
.IP *
If the lock is owned and there are threads contending for the lock,
then the
.B FUTEX_WAITERS
bit shall be set in the futex word's value; in other words, this value is:
.IP
    FUTEX_WAITERS | TID
.IP
(Note that it is invalid for a PI futex word to have no owner and
.BR FUTEX_WAITERS
set.)
.PP
With this policy in place,
a user-space application can acquire an unacquired
lock or release a lock using atomic instructions executed in user mode
(e.g., a compare-and-swap operation such as
.I cmpxchg
on the x86 architecture).
Acquiring a lock simply consists of using compare-and-swap to atomically
set the futex word's value to the caller's TID if its previous value was 0.
Releasing a lock requires using compare-and-swap to set the futex word's
value to 0 if the previous value was the expected TID.
.PP
If a futex is already acquired (i.e., has a nonzero value),
waiters must employ the
.B FUTEX_LOCK_PI
or
.B FUTEX_LOCK_PI2
operations to acquire the lock.
If other threads are waiting for the lock, then the
.B FUTEX_WAITERS
bit is set in the futex value;
in this case, the lock owner must employ the
.B FUTEX_UNLOCK_PI
operation to release the lock.
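.PP
The user-space fast paths and the kernel fallbacks described above can be
sketched as follows.
This is a minimal illustration, not the glibc implementation: self_tid(),
pi_lock(), and pi_unlock() are hypothetical helpers, and recovery from
FUTEX_OWNER_DIED as well as detailed error handling are omitted.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: the caller's kernel thread ID. */
static uint32_t self_tid(void)
{
    return (uint32_t) syscall(SYS_gettid);
}

/* Returns 0 on success, or -1 with errno set on failure. */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;

    /* Fast path: change 0 to our TID entirely in user space. */
    if (atomic_compare_exchange_strong(word, &expected, self_tid()))
        return 0;

    /* Slow path: the word was nonzero, so ask the kernel to queue us. */
    return (int) syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Returns 0 on success, or -1 with errno set on failure. */
static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = self_tid();

    /* Fast path: TID to 0 succeeds only if FUTEX_WAITERS is not set. */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;

    /* Slow path: the word is not simply our TID (for example, waiters
       exist), so the kernel must perform the release. */
    return (int) syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case neither function enters the kernel; the futex word
simply moves from 0 to the caller's TID and back.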
.PP
In the cases where callers are forced into the kernel
(i.e., required to perform a
.BR futex ()
call),
they then deal directly with a so-called RT-mutex,
a kernel locking mechanism which implements the required
priority-inheritance semantics.
After the RT-mutex is acquired, the futex value is updated accordingly,
before the calling thread returns to user space.
.PP
It is important to note
.\" tglx (July 2015):
.\" If there are multiple waiters on a pi futex then a wake pi operation
.\" will wake the first waiter and hand over the lock to this waiter. This
.\" includes handing over the rtmutex which represents the futex in the
.\" kernel. The strict requirement is that the futex owner and the rtmutex
.\" owner must be the same, except for the update period which is
.\" serialized by the futex internal locking. That means the kernel must
.\" update the user-space value prior to returning to user space
that the kernel will update the futex word's value prior
to returning to user space.
(This prevents the possibility of the futex word's value ending
up in an invalid state, such as having an owner but the value being 0,
or having waiters but not having the
.B FUTEX_WAITERS
bit set.)
.PP
If a futex has an associated RT-mutex in the kernel
(i.e., there are blocked waiters)
and the owner of the futex/RT-mutex dies unexpectedly,
then the kernel cleans up the RT-mutex and hands it over to the next waiter.
This in turn requires that the user-space value is updated accordingly.
To indicate that this is required, the kernel sets the
.B FUTEX_OWNER_DIED
bit in the futex word along with the thread ID of the new owner.
User space can detect this situation via the presence of the
.B FUTEX_OWNER_DIED
bit and is then responsible for cleaning up the stale state left over by
the dead owner.
.\" tglx (July 2015):
.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where
.\" the kernel has no state associated. This happens via the robust futex
.\" mechanism. In that case the futex value will be set to
.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non
.\" PI futexes.
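.PP
A PI futex word therefore packs three pieces of information: the owner's
TID, the FUTEX_WAITERS bit, and the FUTEX_OWNER_DIED bit.
The following sketch shows one way to pick them apart using the masks from
<linux/futex.h>; the helper names are hypothetical.

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* TID of the current owner, or 0 if the lock is not acquired. */
static uint32_t pi_owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}

/* True if threads are queued in the kernel waiting for the lock. */
static bool pi_has_waiters(uint32_t word)
{
    return (word & FUTEX_WAITERS) != 0;
}

/* True if a previous owner died while holding the lock, so the new
   owner should clean up the state protected by the lock. */
static bool pi_owner_died(uint32_t word)
{
    return (word & FUTEX_OWNER_DIED) != 0;
}
```

A thread that has just acquired the lock can call pi_owner_died() on the
value it read to decide whether recovery of the protected data is needed.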
.PP
PI futexes are operated on by specifying one of the values listed below in
.IR futex_op .
Note that the PI futex operations must be used as paired operations
and are subject to some additional requirements:
.IP * 3
.BR FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
and
.B FUTEX_TRYLOCK_PI
pair with
.BR FUTEX_UNLOCK_PI .
.B FUTEX_UNLOCK_PI
must be called only on a futex owned by the calling thread,
as defined by the value policy, otherwise the error
.B EPERM
results.
.IP *
.B FUTEX_WAIT_REQUEUE_PI
pairs with
.BR FUTEX_CMP_REQUEUE_PI .
This must be performed from a non-PI futex to a distinct PI futex
(or the error
.B EINVAL
results).
Additionally,
.I val
(the number of waiters to be woken) must be 1
(or the error
.B EINVAL
results).
.PP
The PI futex operations are as follows:
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
This operation is used after an attempt to acquire
the lock via an atomic user-mode instruction failed
because the futex word had a nonzero value\(emspecifically,
because it contained the (PID-namespace-specific) TID of the lock owner.
.IP
The operation checks the value of the futex word at the address
.IR uaddr .
If the value is 0, then the kernel tries to atomically set
the futex value to the caller's TID.
If the futex word's value is nonzero,
the kernel atomically sets the
.B FUTEX_WAITERS
bit, which signals the futex owner that it cannot unlock the futex in
user space atomically by setting the futex value to 0.
.\" tglx (July 2015):
.\" The operation here is similar to the FUTEX_WAIT logic. When the user
.\" space atomic acquire does not succeed because the futex value was non
.\" zero, then the waiter goes into the kernel, takes the kernel internal
.\" lock and retries the acquisition under the lock. If the acquisition
.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
.\" the lock owner that it needs to go into the kernel. Here is the pseudo
.\" code:
.\"
.\"     lock(kernel_lock);
.\"     retry:
.\"
.\"     /*
.\"      * Owner might have unlocked in user space before we
.\"      * were able to set the waiter bit.
.\"      */
.\"     if (atomic_acquire(futex) == SUCCESS) {
.\"         unlock(kernel_lock);
.\"         return 0;
.\"     }
.\"
.\"     /*
.\"      * Owner might have unlocked after the above atomic_acquire()
.\"      * attempt.
.\"      */
.\"     if (atomic_set_waiters_bit(futex) != SUCCESS)
.\"         goto retry;
.\"
.\"     queue_waiter();
.\"     unlock(kernel_lock);
.\"     block();
.\"
After that, the kernel:
.RS
.IP 1. 3
Tries to find the thread which is associated with the owner TID.
.IP 2.
Creates or reuses kernel state on behalf of the owner.
(If this is the first waiter, there is no kernel state for this
futex, so kernel state is created by locking the RT-mutex
and the futex owner is made the owner of the RT-mutex.
If there are existing waiters, then the existing state is reused.)
.IP 3.
Attaches the waiter to the futex
(i.e., the waiter is enqueued on the RT-mutex waiter list).
.RE
.IP
If more than one waiter exists,
the enqueueing of the waiter is in descending priority order.
(For information on priority ordering, see the discussion of the
.BR SCHED_DEADLINE ,
.BR SCHED_FIFO ,
and
.BR SCHED_RR
scheduling policies in
.BR sched (7).)
The owner inherits either the waiter's CPU bandwidth
(if the waiter is scheduled under the
.BR SCHED_DEADLINE
policy) or the waiter's priority (if the waiter is scheduled under the
.BR SCHED_RR
or
.BR SCHED_FIFO
policy).
.\" August 2015:
.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
.\" processes, does the nice value come into play also?
.\"
.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
This inheritance follows the lock chain in the case of nested locking
.\" (i.e., task 1 blocks on lock A, held by task 2,
.\" while task 2 blocks on lock B, held by task 3)
and performs deadlock detection.
.IP
The
.I timeout
argument provides a timeout for the lock attempt.
If
.I timeout
is not NULL, the structure it points to specifies
an absolute timeout, measured against the
.BR CLOCK_REALTIME
clock.
.\" 2016-07-07 response from Thomas Gleixner on LKML:
.\" From: Thomas Gleixner <tglx@linutronix.de>
.\" Date: 6 July 2016 at 20:57
.\" Subject: Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
.\"
.\" On Thu, 23 Jun 2016, Michael Kerrisk (man-pages) wrote:
.\" > On 06/23/2016 08:28 PM, Darren Hart wrote:
.\" > > And as a follow-on, what is the reason for FUTEX_LOCK_PI only using
.\" > > CLOCK_REALTIME? It seems reasonable to me that a user may want to wait a
.\" > > specific amount of time, regardless of wall time.
.\" >
.\" > Yes, that's another weird inconsistency.
.\"
.\" The reason is that pthread_mutex_timedlock() uses absolute timeouts based on
.\" CLOCK_REALTIME. glibc folks asked to make that the default behaviour back
.\" then when we added LOCK_PI.
If
.I timeout
is NULL, the operation will block indefinitely.
.IP
The
.IR uaddr2 ,
.IR val ,
and
.I val3
arguments are ignored.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_LOCK_PI2 " (since Linux 5.14)"
.\" commit bf22a6976897977b0a3f1aeba6823c959fc4fdae
This operation is the same as
.BR FUTEX_LOCK_PI ,
except that the clock against which
.I timeout
is measured is selectable.
By default, the (absolute) timeout specified in
.I timeout
is measured against the
.B CLOCK_MONOTONIC
clock, but if the
.B FUTEX_CLOCK_REALTIME
flag is specified in
.IR futex_op ,
then the timeout is measured against the
.B CLOCK_REALTIME
clock.
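.IP
Because both lock operations take an absolute timeout, a caller that wants
to wait "at most N milliseconds" must first convert that interval into a
deadline on the matching clock.
The sketch below shows one way to do this; deadline_after_ms(),
lock_pi_until(), and lock_pi2_until() are hypothetical helpers, rel_ms is
assumed to be nonnegative, and the fallback definition of FUTEX_LOCK_PI2 is
intended only for building against older kernel headers.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef FUTEX_LOCK_PI2
#define FUTEX_LOCK_PI2 13   /* value used by <linux/futex.h> since 5.14 */
#endif

/* Fill *ts with "now + rel_ms" on the given clock.
   Assumes rel_ms >= 0. Returns 0 on success, -1 on error. */
static int deadline_after_ms(clockid_t clock, long rel_ms,
                             struct timespec *ts)
{
    if (clock_gettime(clock, ts) == -1)
        return -1;
    ts->tv_sec += rel_ms / 1000;
    ts->tv_nsec += (rel_ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {   /* keep tv_nsec in range */
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
    return 0;
}

/* FUTEX_LOCK_PI: the deadline is always on CLOCK_REALTIME. */
static long lock_pi_until(uint32_t *uaddr, const struct timespec *deadline)
{
    return syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, deadline, NULL, 0);
}

/* FUTEX_LOCK_PI2: the deadline is on CLOCK_MONOTONIC unless the
   caller asks for CLOCK_REALTIME via FUTEX_CLOCK_REALTIME. */
static long lock_pi2_until(uint32_t *uaddr, const struct timespec *deadline,
                           int use_realtime)
{
    int op = FUTEX_LOCK_PI2 | (use_realtime ? FUTEX_CLOCK_REALTIME : 0);

    return syscall(SYS_futex, uaddr, op, 0, deadline, NULL, 0);
}
```

The clock passed to deadline_after_ms() has to be the one against which the
chosen operation measures the timeout; mixing clocks yields a deadline that
is far in the past or far in the future.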
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
This operation tries to acquire the lock at
.IR uaddr .
It is invoked when a user-space atomic acquire did not
succeed because the futex word was not 0.
.IP
Because the kernel has access to more state information than user space,
acquisition of the lock might succeed if performed by the
kernel in cases where the futex word
(i.e., the state information accessible to user space) contains stale state
.RB ( FUTEX_WAITERS
and/or
.BR FUTEX_OWNER_DIED ).
This can happen when the owner of the futex died.
User space cannot handle this condition in a race-free manner,
but the kernel can fix this up and acquire the futex.
.\" Paraphrasing a f2f conversation with Thomas Gleixner about the
.\" above point (Aug 2015): ###
.\" There is a rare possibility of a race condition involving an
.\" uncontended futex with no owner, but with waiters. The
.\" kernel-user-space contract is that if a futex is nonzero, you must
.\" go into kernel. The futex was owned by a task, and that task dies
.\" but there are no waiters, so the futex value is non zero.
.\" Therefore, the next locker has to go into the kernel,
.\" so that the kernel has a chance to clean up. (CMPXCHG on zero
.\" in user space would fail, so kernel has to clean up.)
.\" Darren Hart (Oct 2015):
.\" The trylock in the kernel has more state, so it can independently
.\" verify the flags that user space must trust implicitly.
.IP
The
.IR uaddr2 ,
.IR val ,
.IR timeout ,
and
.IR val3
arguments are ignored.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
This operation wakes the top priority waiter that is waiting in
.B FUTEX_LOCK_PI
or
.B FUTEX_LOCK_PI2
on the futex address provided by the
.I uaddr
argument.
.IP
This is called when the user-space value at
.I uaddr
cannot be changed atomically from a TID (of the owner) to 0.
.IP
The
.IR uaddr2 ,
.IR val ,
.IR timeout ,
and
.IR val3
arguments are ignored.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
.\" commit 52400ba946759af28442dee6265c5c0180ac7122
This operation is a PI-aware variant of
.BR FUTEX_CMP_REQUEUE .
It requeues waiters that are blocked via
.B FUTEX_WAIT_REQUEUE_PI
on
.I uaddr
from a non-PI source futex
.RI ( uaddr )
to a PI target futex
.RI ( uaddr2 ).
.IP
As with
.BR FUTEX_CMP_REQUEUE ,
this operation wakes up a maximum of
.I val
waiters that are waiting on the futex at
.IR uaddr .
However, for
.BR FUTEX_CMP_REQUEUE_PI ,
.I val
is required to be 1
(since the main point is to avoid a thundering herd).
The remaining waiters are removed from the wait queue of the source futex at
.I uaddr
and added to the wait queue of the target futex at
.IR uaddr2 .
.IP
The
.I val2
.\" val2 is the cap on the number of requeued waiters.
.\" In the glibc pthread_cond_broadcast() implementation, this argument
.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
and
.I val3
arguments serve the same purposes as for
.BR FUTEX_CMP_REQUEUE .
.\"
.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/
.\" notes that "priority-inheritance Futex to priority-inheritance
.\" Futex requeues are currently unsupported". However, probably
.\" the page does not need to say anything about this, since
.\" Thomas Gleixner commented (July 2015): "they never will be
.\" supported because they make no sense at all"
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
.\" commit 52400ba946759af28442dee6265c5c0180ac7122
.\"
Wait on a non-PI futex at
.I uaddr
and potentially be requeued (via a
.BR FUTEX_CMP_REQUEUE_PI
operation in another task) onto a PI futex at
.IR uaddr2 .
The wait operation on
.I uaddr
is the same as for
.BR FUTEX_WAIT .
.IP
The waiter can be removed from the wait on
.I uaddr
without requeueing on
.IR uaddr2
via a
.BR FUTEX_WAKE
operation in another task.
In this case, the
.BR FUTEX_WAIT_REQUEUE_PI
operation fails with the error
.BR EAGAIN .
.IP
If
.I timeout
is not NULL, the structure it points to specifies
an absolute timeout for the wait operation.
If
.I timeout
is NULL, the operation can block indefinitely.
.IP
The
.I val3
argument is ignored.
.IP
The
.BR FUTEX_WAIT_REQUEUE_PI
and
.BR FUTEX_CMP_REQUEUE_PI
operations were added to support a fairly specific use case:
support for priority-inheritance-aware POSIX threads condition variables.
The idea is that these operations should always be paired,
in order to ensure that user space and the kernel remain in sync.
Thus, in the
.BR FUTEX_WAIT_REQUEUE_PI
operation, the user-space application pre-specifies the target
of the requeue that takes place in the
.BR FUTEX_CMP_REQUEUE_PI
operation.
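.IP
The pairing can be made concrete by looking at how the two calls line up
argument by argument.
The wrappers below are a sketch only: wait_requeue_pi() and
cmp_requeue_pi() are hypothetical names, the parameter names cond and mutex
merely suggest the condition-variable use case, and the surrounding
condition-variable logic is not shown.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Waiter side: block on the non-PI word at cond while it still holds
   the value seen, naming in advance the PI word at mutex that a later
   FUTEX_CMP_REQUEUE_PI is expected to requeue this waiter onto. */
static long wait_requeue_pi(uint32_t *cond, uint32_t seen,
                            const struct timespec *abs_timeout,
                            uint32_t *mutex)
{
    return syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, seen,
                   abs_timeout, mutex, 0);
}

/* Waker side: wake at most one waiter (val must be 1) and requeue up
   to max_requeue others from cond onto the same mutex, provided the
   word at cond still equals expected (val3). The val2 argument
   travels in the slot otherwise used for the timeout pointer. */
static long cmp_requeue_pi(uint32_t *cond, uint32_t *mutex,
                           uint32_t max_requeue, uint32_t expected)
{
    return syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                   (uintptr_t) max_requeue, mutex, expected);
}
```

Both sides must pass the same pair of addresses; a waker that names a
different target than the one the waiter pre-specified gets EINVAL.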
.\"
.\" Darren Hart notes that a patch to allow glibc to fully support
.\" PI-aware pthreads condition variables has not yet been accepted into
.\" glibc. The story is complex, and can be found at
.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588
.\" Darren notes that in the meantime, the patch is shipped with various
.\" PREEMPT_RT-enabled Linux systems.
.\"
.\" Related to the preceding, Darren proposed that somewhere, man-pages
.\" should document the following point:
.\"
.\" While the Linux kernel, since 2.6.31, supports requeueing of
.\" priority-inheritance (PI) aware mutexes via the
.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations,
.\" the glibc implementation does not yet take full advantage of this.
.\" Specifically, the condvar internal data lock remains a non-PI aware
.\" mutex, regardless of the type of the pthread_mutex associated with
.\" the condvar. This can lead to an unbounded priority inversion on
.\" the internal data lock even when associating a PI aware
.\" pthread_mutex with a condvar during a pthread_cond*_wait
.\" operation. For this reason, it is not recommended to rely on
.\" priority inheritance when using pthread condition variables.
.\"
.\" The problem is that the obvious location for this text is
.\" the pthread_cond*wait(3) man page. However, such a man page
.\" does not currently exist.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH RETURN VALUE
In the event of an error (and assuming that
.BR futex ()
was invoked via
.BR syscall (2)),
all operations return \-1 and set
.I errno
to indicate the error.
.PP
The return value on success depends on the operation,
as described in the following list:
.TP
.B FUTEX_WAIT
Returns 0 if the caller was woken up.
Note that a wake-up can also be caused by common futex usage patterns
in unrelated code that happened to have previously used the futex word's
memory location (e.g., typical futex-based implementations of
Pthreads mutexes can cause this under some conditions).
Therefore, callers should always conservatively assume that a return
value of 0 can mean a spurious wake-up, and use the futex word's value
(i.e., the user-space synchronization scheme)
to decide whether to continue to block or not.
.TP
.B FUTEX_WAKE
Returns the number of waiters that were woken up.
.TP
.B FUTEX_FD
Returns the new file descriptor associated with the futex.
.TP
.B FUTEX_REQUEUE
Returns the number of waiters that were woken up.
.TP
.B FUTEX_CMP_REQUEUE
Returns the total number of waiters that were woken up or
requeued to the futex for the futex word at
.IR uaddr2 .
If this value is greater than
.IR val ,
then the difference is the number of waiters requeued to the futex for the
futex word at
.IR uaddr2 .
.TP
.B FUTEX_WAKE_OP
Returns the total number of waiters that were woken up.
This is the sum of the woken waiters on the two futexes for
the futex words at
.I uaddr
and
.IR uaddr2 .
.TP
.B FUTEX_WAIT_BITSET
Returns 0 if the caller was woken up.
See
.B FUTEX_WAIT
for how to interpret this correctly in practice.
.TP
.B FUTEX_WAKE_BITSET
Returns the number of waiters that were woken up.
.TP
.B FUTEX_LOCK_PI
Returns 0 if the futex was successfully locked.
.TP
.B FUTEX_LOCK_PI2
Returns 0 if the futex was successfully locked.
.TP
.B FUTEX_TRYLOCK_PI
Returns 0 if the futex was successfully locked.
.TP
.B FUTEX_UNLOCK_PI
Returns 0 if the futex was successfully unlocked.
.TP
.B FUTEX_CMP_REQUEUE_PI
Returns the total number of waiters that were woken up or
requeued to the futex for the futex word at
.IR uaddr2 .
If this value is greater than
.IR val ,
then the difference is the number of waiters requeued to the futex for
the futex word at
.IR uaddr2 .
.TP
.B FUTEX_WAIT_REQUEUE_PI
Returns 0 if the caller was successfully requeued to the futex for
the futex word at
.IR uaddr2 .
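.PP
The advice given for FUTEX_WAIT, namely to treat a return value of 0 as a
possibly spurious wake-up, usually translates into a loop that rechecks the
futex word after every return.
The following is a minimal sketch under the assumption that "word is
nonzero" is the condition being waited for; wait_until_set() is a
hypothetical helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until *word becomes nonzero.
   Returns 0 on success, or -1 with errno set on an unexpected error. */
static int wait_until_set(_Atomic uint32_t *word)
{
    while (atomic_load(word) == 0) {
        long r = syscall(SYS_futex, word, FUTEX_WAIT, 0, NULL, NULL, 0);

        /* r == 0 may be a spurious wake-up; EAGAIN means the word
           changed before we slept; EINTR means a signal arrived.
           In all three cases the loop rechecks the word. */
        if (r == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The decision to stop waiting is thus taken from the futex word itself, never
from the return value of the system call alone.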
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH ERRORS
.TP
.B EACCES
No read access to the memory of a futex word.
.TP
.B EAGAIN
.RB ( FUTEX_WAIT ,
.BR FUTEX_WAIT_BITSET ,
.BR FUTEX_WAIT_REQUEUE_PI )
The value pointed to by
.I uaddr
was not equal to the expected value
.I val
at the time of the call.
.IP
.BR Note :
on Linux, the symbolic names
.B EAGAIN
and
.B EWOULDBLOCK
(both of which appear in different parts of the kernel futex code)
have the same value.
.TP
.B EAGAIN
.RB ( FUTEX_CMP_REQUEUE ,
.BR FUTEX_CMP_REQUEUE_PI )
The value pointed to by
.I uaddr
is not equal to the expected value
.IR val3 .
.TP
.B EAGAIN
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The futex owner thread ID of
.I uaddr
(for
.BR FUTEX_CMP_REQUEUE_PI :
.IR uaddr2 )
is about to exit,
but has not yet handled the internal state cleanup.
Try again.
.TP
.B EDEADLK
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The futex word at
.I uaddr
is already locked by the caller.
.TP
.BR EDEADLK
.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
.\" places, and EDEADLOCK in others. On almost all architectures
.\" these constants are synonymous. Is there a reason that both
.\" names are used?
.\"
.\" tglx (July 2015): "No. We should probably fix that."
.\"
.RB ( FUTEX_CMP_REQUEUE_PI )
While requeueing a waiter to the PI futex for the futex word at
.IR uaddr2 ,
the kernel detected a deadlock.
.TP
.B EFAULT
A required pointer argument (i.e.,
.IR uaddr ,
.IR uaddr2 ,
or
.IR timeout )
did not point to a valid user-space address.
.TP
.B EINTR
A
.B FUTEX_WAIT
or
.B FUTEX_WAIT_BITSET
operation was interrupted by a signal (see
.BR signal (7)).
In kernels before Linux 2.6.22, this error could also be returned for
a spurious wakeup; since Linux 2.6.22, this no longer happens.
.TP
.B EINVAL
The operation in
.I futex_op
is one of those that employs a timeout, but the supplied
.I timeout
argument was invalid
.RI ( tv_sec
was less than zero, or
.I tv_nsec
was not less than 1,000,000,000).
.TP
.B EINVAL
The operation specified in
.I futex_op
employs one or both of the pointers
.I uaddr
and
.IR uaddr2 ,
but one of these does not point to a valid object\(emthat is,
the address is not four-byte-aligned.
.TP
.B EINVAL
.RB ( FUTEX_WAIT_BITSET ,
.BR FUTEX_WAKE_BITSET )
The bit mask supplied in
.I val3
is zero.
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
.I uaddr
equals
.I uaddr2
(i.e., an attempt was made to requeue to the same futex).
.TP
.B EINVAL
.RB ( FUTEX_FD )
The signal number supplied in
.I val
is invalid.
.TP
.B EINVAL
.RB ( FUTEX_WAKE ,
.BR FUTEX_WAKE_OP ,
.BR FUTEX_WAKE_BITSET ,
.BR FUTEX_REQUEUE ,
.BR FUTEX_CMP_REQUEUE )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state\(emthat is, it detected a waiter which waits in
.B FUTEX_LOCK_PI
or
.B FUTEX_LOCK_PI2
on
.IR uaddr .
.TP
.B EINVAL
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_UNLOCK_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state.
This indicates either state corruption
or that the kernel found a waiter on
.I uaddr
which is waiting via
.B FUTEX_WAIT
or
.BR FUTEX_WAIT_BITSET .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr2
and the kernel state;
.\" From a conversation with Thomas Gleixner (Aug 2015): ###
.\" The kernel sees: I have non PI state for a futex you tried to
.\" tell me was PI
that is, the kernel detected a waiter which waits via
.B FUTEX_WAIT
or
.B FUTEX_WAIT_BITSET
on
.IR uaddr2 .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state;
that is, the kernel detected a waiter which waits via
.B FUTEX_WAIT
or
.B FUTEX_WAIT_BITSET
on
.IR uaddr .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state;
that is, the kernel detected a waiter which waits on
.I uaddr
via
.B FUTEX_LOCK_PI
or
.B FUTEX_LOCK_PI2
(instead of
.BR FUTEX_WAIT_REQUEUE_PI ).
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
.\" This deals with the case:
.\" wait_requeue_pi(A, B);
.\" requeue_pi(A, C);
An attempt was made to requeue a waiter to a futex other than that
specified by the matching
.B FUTEX_WAIT_REQUEUE_PI
call for that waiter.
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The
.I val
argument is not 1.
.TP
.B EINVAL
Invalid argument.
.TP
.B ENFILE
.RB ( FUTEX_FD )
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The kernel could not allocate memory to hold state information.
.TP
.B ENOSYS
Invalid operation specified in
.IR futex_op .
.TP
.B ENOSYS
The
.B FUTEX_CLOCK_REALTIME
option was specified in
.IR futex_op ,
but the accompanying operation was neither
.BR FUTEX_WAIT ,
.BR FUTEX_WAIT_BITSET ,
.BR FUTEX_WAIT_REQUEUE_PI ,
nor
.BR FUTEX_LOCK_PI2 .
.TP
.B ENOSYS
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_UNLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI ,
.BR FUTEX_WAIT_REQUEUE_PI )
A run-time check determined that the operation is not available.
The PI-futex operations are not implemented on all architectures and
are not supported on some CPU variants.
.TP
.B EPERM
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The caller is not allowed to attach itself to the futex at
.I uaddr
(for
.BR FUTEX_CMP_REQUEUE_PI :
the futex at
.IR uaddr2 ).
(This may be caused by a state corruption in user space.)
.TP
.B EPERM
.RB ( FUTEX_UNLOCK_PI )
The caller does not own the lock represented by the futex word.
.TP
.B ESRCH
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_LOCK_PI2 ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The thread ID in the futex word at
.I uaddr
does not exist.
.TP
.B ESRCH
.RB ( FUTEX_CMP_REQUEUE_PI )
The thread ID in the futex word at
.I uaddr2
does not exist.
.TP
.B ETIMEDOUT
The operation in
.I futex_op
employed the timeout specified in
.IR timeout ,
and the timeout expired before the operation completed.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH VERSIONS
Futexes were first made available in a stable kernel release
with Linux 2.6.0.
.PP
Initial futex support was merged in Linux 2.5.7 but with
semantics different from those described above.
A four-argument system call with the semantics
described in this page was introduced in Linux 2.5.40.
A fifth argument was added in Linux 2.5.70,
and a sixth argument was added in Linux 2.6.7.
.SH CONFORMING TO
This system call is Linux-specific.
.SH NOTES
Several higher-level programming abstractions are implemented via futexes,
including POSIX semaphores and
various POSIX threads synchronization mechanisms
(mutexes, condition variables, read-write locks, and barriers).
.\" TODO FIXME(Torvald) Above, we cite this section and claim it contains
.\" details on the synchronization semantics; add the C11 equivalents
.\" here (or whatever we find consensus for).
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH EXAMPLES
The program below demonstrates use of futexes in a program where a parent
process and a child process use a pair of futexes located inside a
shared anonymous mapping to synchronize access to a shared resource:
the terminal.
The two processes each write
.I nloops
(a command-line argument that defaults to 5 if omitted)
messages to the terminal and employ a synchronization protocol
that ensures that they alternate in writing messages.
Upon running this program we see output such as the following:
.PP
.in +4n
.EX
$ \fB./futex_demo\fP
Parent (18534) 0
Child (18535) 0
Parent (18534) 1
Child (18535) 1
Parent (18534) 2
Child (18535) 2
Parent (18534) 3
Child (18535) 3
Parent (18534) 4
Child (18535) 4
.EE
.in
.SS Program source
\&
.EX
/* futex_demo.c

   Usage: futex_demo [nloops]
                    (Default: 5)

   Demonstrate the use of futexes in a program where parent and child
   use a pair of futexes located inside a shared anonymous mapping to
   synchronize access to a shared resource: the terminal. The two
   processes each write \(aqnloops\(aq messages to the terminal and employ
   a synchronization protocol that ensures that they alternate in
   writing messages.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/futex.h>
#include <sys/time.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \e
                        } while (0)

static uint32_t *futex1, *futex2, *iaddr;

static int
futex(uint32_t *uaddr, int futex_op, uint32_t val,
      const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr2, val3);
}

/* Acquire the futex pointed to by \(aqfutexp\(aq: wait for its value to
   become 1, and then set the value to 0. */

static void
fwait(uint32_t *futexp)
{
    long s;

    /* atomic_compare_exchange_strong(ptr, oldval, newval)
       atomically performs the equivalent of:

           if (*ptr == *oldval)
               *ptr = newval;
           else
               *oldval = *ptr;

       It returns true if the test yielded true and *ptr was updated. */

    while (1) {

        /* Is the futex available? */
        uint32_t one = 1;   /* Not const: overwritten on failure */
        if (atomic_compare_exchange_strong(futexp, &one, 0))
            break;      /* Yes */

        /* Futex is not available; wait. */

        s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
        if (s == \-1 && errno != EAGAIN)
            errExit("futex\-FUTEX_WAIT");
    }
}

/* Release the futex pointed to by \(aqfutexp\(aq: if the futex currently
   has the value 0, set its value to 1 and then wake any futex waiters,
   so that if the peer is blocked in fwait(), it can proceed. */

static void
fpost(uint32_t *futexp)
{
    long s;

    /* atomic_compare_exchange_strong() was described
       in comments above. */

    uint32_t zero = 0;      /* Not const: overwritten on failure */
    if (atomic_compare_exchange_strong(futexp, &zero, 1)) {
        s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
        if (s == \-1)
            errExit("futex\-FUTEX_WAKE");
    }
}

int
main(int argc, char *argv[])
{
    pid_t childPid;
    int nloops;

    setbuf(stdout, NULL);

    nloops = (argc > 1) ? atoi(argv[1]) : 5;

    /* Create a shared anonymous mapping that will hold the futexes.
       Since the futexes are being shared between processes, we
       subsequently use the "shared" futex operations (i.e., not the
       ones suffixed "_PRIVATE"). */

    iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_SHARED, \-1, 0);
    if (iaddr == MAP_FAILED)
        errExit("mmap");

    futex1 = &iaddr[0];
    futex2 = &iaddr[1];

    *futex1 = 0;        /* State: unavailable */
    *futex2 = 1;        /* State: available */

    /* Create a child process that inherits the shared anonymous
       mapping. */

    childPid = fork();
    if (childPid == \-1)
        errExit("fork");

    if (childPid == 0) {        /* Child */
        for (int j = 0; j < nloops; j++) {
            fwait(futex1);
            printf("Child (%jd) %d\en", (intmax_t) getpid(), j);
            fpost(futex2);
        }

        exit(EXIT_SUCCESS);
    }

    /* Parent falls through to here. */

    for (int j = 0; j < nloops; j++) {
        fwait(futex2);
        printf("Parent (%jd) %d\en", (intmax_t) getpid(), j);
        fpost(futex1);
    }

    wait(NULL);

    exit(EXIT_SUCCESS);
}
.EE
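.PP
The
.B ETIMEDOUT
error described under ERRORS can be illustrated with a timed variant of
the wait step.
The following is a minimal sketch and is not part of
.IR futex_demo.c ;
the helper name
.BR fwait_timed ()
and its return convention are assumptions made for illustration.
It relies on
.B FUTEX_WAIT
interpreting
.I timeout
as a relative interval, and it treats
.B EAGAIN
(the futex word no longer held the expected value) as a normal return.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical helper: block while *futexp still holds 'expected',
   for at most 'ms' milliseconds.

   Returns 0 if the caller was woken or the word had already changed
   (EAGAIN), 1 if the timeout expired (ETIMEDOUT), and -1 on any
   other error, including EINTR (errno is left set). */
int
fwait_timed(uint32_t *futexp, uint32_t expected, long ms)
{
    struct timespec ts;
    long s;

    assert(ms >= 0);

    /* Convert milliseconds to a relative timespec. */
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;

    /* Shared (non-private) wait, as in futex_demo.c. */
    s = syscall(SYS_futex, futexp, FUTEX_WAIT, expected, &ts, NULL, 0);
    if (s == 0)
        return 0;               /* Woken by FUTEX_WAKE (or spuriously) */

    if (errno == EAGAIN)
        return 0;               /* Word did not contain 'expected' */
    if (errno == ETIMEDOUT)
        return 1;               /* Timeout expired first */

    return -1;                  /* Other failure; inspect errno */
}
```

A caller would normally recheck the futex word after a return of 0,
since a wake-up does not by itself guarantee that the condition holds,
and would decide for itself whether to retry after
.BR EINTR .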
.SH SEE ALSO
.ad l
.BR get_robust_list (2),
.BR restart_syscall (2),
.BR pthread_mutexattr_getprotocol (3),
.BR futex (7),
.BR sched (7)
.PP
The following kernel source files:
.IP * 2
.I Documentation/pi\-futex.txt
.IP *
.I Documentation/futex\-requeue\-pi.txt
.IP *
.I Documentation/locking/rt\-mutex.txt
.IP *
.I Documentation/locking/rt\-mutex\-design.txt
.IP *
.I Documentation/robust\-futex\-ABI.txt
.PP
Franke, H., Russell, R., and Kirkwood, M., 2002.
\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
(from proceedings of the Ottawa Linux Symposium 2002),
.br
.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002\-pages\-479\-495.pdf
.UE
.PP
Hart, D., 2009. \fIA futex overview and update\fP,
.UR http://lwn.net/Articles/360699/
.UE
.PP
Hart, D.\& and Guniguntala, D., 2009.
\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
(from proceedings of the 2009 Real-Time Linux Workshop),
.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
.UE
.PP
Drepper, U., 2011. \fIFutexes Are Tricky\fP,
.UR http://www.akkadia.org/drepper/futex.pdf
.UE
.PP
Futex example library, futex-*.tar.bz2 at
.br
.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
.UE
.\"
.\" FIXME(Torvald) We should probably refer to the glibc code here, in
.\" particular the glibc-internal futex wrapper functions that are
.\" WIP, and the generic pthread_mutex_t and perhaps condvar
.\" implementations.