.\" Page by b.hubert
.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
.\" may be freely modified and distributed
.\" %%%LICENSE_END
.\"
.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
.\" added ERRORS section.
.\"
.\" Modified 2004-06-17 mtk
.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
.\"
.\" FIXME Still to integrate are some points from Torvald Riegel's mail of
.\" 2015-01-23:
.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=7977
.\"
.\" FIXME Do we need to add some text regarding Torvald Riegel's 2015-01-24 mail
.\" http://thread.gmane.org/gmane.linux.kernel/1703405/focus=1873242
.\"
.TH FUTEX 2 2020-06-09 "Linux" "Linux Programmer's Manual"
.SH NAME
futex \- fast user-space locking
.SH SYNOPSIS
.nf
.PP
.B "#include <linux/futex.h>"
.B "#include <sys/time.h>"
.PP
.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
.BI "          const struct timespec *" timeout , \
"   \fR  /* or: \fBuint32_t \fIval2\fP */"
.BI "          int *" uaddr2 ", int " val3 );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
The
.BR futex ()
system call provides a method for waiting until a certain condition becomes
true.
It is typically used as a blocking construct in the context of
shared-memory synchronization.
When using futexes, the majority of
the synchronization operations are performed in user space.
A user-space program employs the
.BR futex ()
system call only when it is likely that the program has to block for
a longer time until the condition becomes true.
Other
.BR futex ()
operations can be used to wake any processes or threads waiting
for a particular condition.
.PP
A futex is a 32-bit value\(emreferred to below as a
.IR "futex word" \(emwhose
address is supplied to the
.BR futex ()
system call.
(Futexes are 32 bits in size on all platforms, including 64-bit systems.)
All futex operations are governed by this value.
In order to share a futex between processes,
the futex is placed in a region of shared memory,
created using (for example)
.BR mmap (2)
or
.BR shmat (2).
(Thus, the futex word may have different
virtual addresses in different processes,
but these addresses all refer to the same location in physical memory.)
In a multithreaded program, it is sufficient to place the futex word
in a global variable shared by all threads.
.PP
When executing a futex operation that requests to block a thread,
the kernel will block only if the futex word has the value that the
calling thread supplied (as one of the arguments of the
.BR futex ()
call) as the expected value of the futex word.
The loading of the futex word's value,
the comparison of that value with the expected value,
and the actual blocking will happen atomically and will be totally ordered
with respect to concurrent operations performed by other threads
on the same futex word.
.\" Notes from Darren Hart (Dec 2015):
.\" Totally ordered with respect to futex operations refers to semantics
.\" of the ACQUIRE/RELEASE operations and how they impact ordering of
.\" memory reads and writes. The kernel futex operations are protected
.\" by spinlocks, which ensure that all operations are serialized
.\" with respect to one another.
.\"
.\" This is a lot to attempt to define in this document. Perhaps a
.\" reference to linux/Documentation/memory-barriers.txt as a footnote
.\" would be sufficient? Or perhaps for this manual, "serialized" would
.\" be sufficient, with a footnote regarding "totally ordered" and a
.\" pointer to the memory-barrier documentation?
Thus, the futex word is used to connect the synchronization in user space
with the implementation of blocking by the kernel.
Analogously to an atomic
compare-and-exchange operation that potentially changes shared memory,
blocking via a futex is an atomic compare-and-block operation.
.\" FIXME(Torvald Riegel):
.\" Eventually we want to have some text in NOTES to satisfy
.\" the reference in the following sentence
.\" See NOTES for a detailed specification of
.\" the synchronization semantics.
.PP
One use of futexes is for implementing locks.
The state of the lock (i.e., acquired or not acquired)
can be represented as an atomically accessed flag in shared memory.
In the uncontended case,
a thread can access or modify the lock state with atomic instructions,
for example atomically changing it from not acquired to acquired
using an atomic compare-and-exchange instruction.
(Such instructions are performed entirely in user mode,
and the kernel maintains no information about the lock state.)
On the other hand, a thread may be unable to acquire a lock because
it is already acquired by another thread.
It then may pass the lock's flag as a futex word and the value
representing the acquired state as the expected value to a
.BR futex ()
wait operation.
This
.BR futex ()
operation will block if and only if the lock is still acquired
(i.e., the value in the futex word still matches the "acquired state").
When releasing the lock, a thread has to first reset the
lock state to not acquired and then execute a futex
operation that wakes threads blocked on the lock flag used as a futex word
(this can be further optimized to avoid unnecessary wake-ups).
See
.BR futex (7)
for more detail on how to use futexes.
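.PP
The locking protocol described above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not a production lock:
the helper
.IR futex_call (),
the functions
.IR sketch_lock ()
and
.IR sketch_unlock (),
and the encoding 0 = not acquired, 1 = acquired
are all hypothetical names and choices made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): invoke futex() via
   syscall(2), since glibc provides no wrapper. */
static long futex_call(_Atomic uint32_t *uaddr, int futex_op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, futex_op, val, NULL, NULL, 0);
}

/* Assumed encoding for this sketch: 0 = not acquired, 1 = acquired. */
static void sketch_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;

    /* Uncontended case: one user-space compare-and-exchange, no system call. */
    while (!atomic_compare_exchange_strong(word, &expected, 1)) {
        /* Contended case: block only while the word still holds 1.
           If it changed in the meantime, the call fails with EAGAIN
           and the loop simply retries. */
        futex_call(word, FUTEX_WAIT, 1);
        expected = 0;
    }
}

static void sketch_unlock(_Atomic uint32_t *word)
{
    atomic_store(word, 0);             /* first reset the lock state */
    futex_call(word, FUTEX_WAKE, 1);   /* then wake one blocked thread, if any */
}
```

This sketch always issues a
.B FUTEX_WAKE
on unlock, which is the unoptimized variant mentioned in the text;
a real implementation would normally track whether any waiters exist.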
.PP
Besides the basic wait and wake-up futex functionality, there are further
futex operations aimed at supporting more complex use cases.
.PP
Note that
no explicit initialization or destruction is necessary to use futexes;
the kernel maintains a futex
(i.e., the kernel-internal implementation artifact)
only while operations such as
.BR FUTEX_WAIT ,
described below, are being performed on a particular futex word.
.\"
.SS Arguments
The
.I uaddr
argument points to the futex word.
On all platforms, futexes are four-byte
integers that must be aligned on a four-byte boundary.
The operation to perform on the futex is specified in the
.I futex_op
argument;
.IR val
is a value whose meaning and purpose depend on
.IR futex_op .
.PP
The remaining arguments
.RI ( timeout ,
.IR uaddr2 ,
and
.IR val3 )
are required only for certain of the futex operations described below.
Where one of these arguments is not required, it is ignored.
.PP
For several blocking operations, the
.I timeout
argument is a pointer to a
.IR timespec
structure that specifies a timeout for the operation.
However, notwithstanding the prototype shown above, for some operations,
the least significant four bytes of this argument are instead
used as an integer whose meaning is determined by the operation.
For these operations, the kernel casts the
.I timeout
value first to
.IR "unsigned long" ,
then to
.IR uint32_t ,
and in the remainder of this page, this argument is referred to as
.I val2
when interpreted in this fashion.
.PP
Where it is required, the
.IR uaddr2
argument is a pointer to a second futex word that is employed
by the operation.
.PP
The interpretation of the final integer argument,
.IR val3 ,
depends on the operation.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SS Futex operations
The
.I futex_op
argument consists of two parts:
a command that specifies the operation to be performed,
bit-wise ORed with zero or more options that
modify the behavior of the operation.
The options that may be included in
.I futex_op
are as follows:
.TP
.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
This option bit can be employed with all futex operations.
It tells the kernel that the futex is process-private and not shared
with another process (i.e., it is being used for synchronization
only between threads of the same process).
This allows the kernel to make some additional performance optimizations.
.\" I.e., It allows the kernel to choose the fast path for validating
.\" the user-space address and avoids expensive VMA lookups,
.\" taking reference counts on file backing store, and so on.
.IP
As a convenience,
.IR <linux/futex.h>
defines a set of constants with the suffix
.BR _PRIVATE
that are equivalents of all of the operations listed below,
.\" except the obsolete FUTEX_FD, for which the "private" flag was
.\" meaningless
but with the
.BR FUTEX_PRIVATE_FLAG
ORed into the constant value.
Thus, there are
.BR FUTEX_WAIT_PRIVATE ,
.BR FUTEX_WAKE_PRIVATE ,
and so on.
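.IP
As an illustration of how the command and option bits combine,
the following sketch checks the relationship between a command and its
.B _PRIVATE
variant, and strips the option bits to recover the command.
The function names
.IR is_private_variant ()
and
.IR futex_command ()
are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <linux/futex.h>

/* Nonzero if op_private is op with FUTEX_PRIVATE_FLAG ORed in. */
static int is_private_variant(int op, int op_private)
{
    return op_private == (op | FUTEX_PRIVATE_FLAG);
}

/* Recover the command part of futex_op by clearing the two option
   bits described in this section. */
static int futex_command(int futex_op)
{
    return futex_op & ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME);
}
```

For example,
.I futex_command(FUTEX_WAIT_PRIVATE)
yields
.BR FUTEX_WAIT .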
.TP
.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
.\" commit 1acdac104668a0834cfa267de9946fac7764d486
This option bit can be employed only with the
.BR FUTEX_WAIT_BITSET ,
.BR FUTEX_WAIT_REQUEUE_PI ,
and
(since Linux 4.5)
.\" commit 337f13046ff03717a9e99675284a817527440a49
.BR FUTEX_WAIT
operations.
.IP
If this option is set, the kernel measures the
.I timeout
against the
.BR CLOCK_REALTIME
clock.
.IP
If this option is not set, the kernel measures the
.I timeout
against the
.BR CLOCK_MONOTONIC
clock.
.PP
The operation specified in
.I futex_op
is one of the following:
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAIT " (since Linux 2.6.0)"
.\" Strictly speaking, since some time in 2.5.x
This operation tests that the value at the
futex word pointed to by the address
.I uaddr
still contains the expected value
.IR val ,
and if so, then sleeps waiting for a
.B FUTEX_WAKE
operation on the futex word.
The load of the value of the futex word is an atomic memory
access (i.e., using atomic machine instructions of the respective
architecture).
This load, the comparison with the expected value, and
starting to sleep are performed atomically
.\" FIXME: Torvald, I think we may need to add some explanation of
.\" "totally ordered" here.
and totally ordered
with respect to other futex operations on the same futex word.
If the thread starts to sleep,
it is considered a waiter on this futex word.
If the futex value does not match
.IR val ,
then the call fails immediately with the error
.BR EAGAIN .
.IP
The purpose of the comparison with the expected value is to prevent lost
wake-ups.
If another thread changed the value of the futex word after the
calling thread decided to block based on the prior value,
and if the other thread executed a
.BR FUTEX_WAKE
operation (or similar wake-up) after the value change and before this
.BR FUTEX_WAIT
operation, then the calling thread will observe the
value change and will not start to sleep.
.IP
If the
.I timeout
is not NULL, the structure it points to specifies a
timeout for the wait.
(This interval will be rounded up to the system clock granularity,
and is guaranteed not to expire early.)
The timeout is by default measured according to the
.BR CLOCK_MONOTONIC
clock, but, since Linux 4.5, the
.BR CLOCK_REALTIME
clock can be selected by specifying
.BR FUTEX_CLOCK_REALTIME
in
.IR futex_op .
If
.I timeout
is NULL, the call blocks indefinitely.
.IP
.IR Note :
for
.BR FUTEX_WAIT ,
.IR timeout
is interpreted as a
.IR relative
value.
This differs from other futex operations, where
.I timeout
is interpreted as an absolute value.
To obtain the equivalent of
.BR FUTEX_WAIT
with an absolute timeout, employ
.BR FUTEX_WAIT_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY .
.IP
The arguments
.I uaddr2
and
.I val3
are ignored.
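.IP
The value check and the relative timeout can be observed with a small sketch
such as the following.
The helper name
.IR wait_relative_ms ()
is hypothetical, and returning the
.I errno
value directly is a simplification chosen for this example.
When nothing wakes the caller, a matching value with a timeout
is expected to end in
.BR ETIMEDOUT ,
while a mismatching value fails at once with
.BR EAGAIN .

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep on *uaddr while it still equals 'expected',
   for at most rel_ms milliseconds (a relative interval, as FUTEX_WAIT
   requires).  Returns 0 if woken, otherwise the errno value. */
static int wait_relative_ms(uint32_t *uaddr, uint32_t expected, long rel_ms)
{
    struct timespec ts;

    ts.tv_sec = rel_ms / 1000;
    ts.tv_nsec = (rel_ms % 1000) * 1000000L;

    /* uaddr2 and val3 are ignored by FUTEX_WAIT, so pass NULL and 0. */
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, &ts, NULL, 0) == -1)
        return errno;
    return 0;
}
```

A caller would normally reload the futex word and re-evaluate its condition
after any return, because a wake-up, a timeout, and a value mismatch
all require the same recheck.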
.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to a
.\" different example.
.\"
.\" For
.\" .BR futex (7),
.\" this call is executed if decrementing the count gave a negative value
.\" (indicating contention),
.\" and will sleep until another process or thread releases
.\" the futex and executes the
.\" .B FUTEX_WAKE
.\" operation.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE " (since Linux 2.6.0)"
.\" Strictly speaking, since Linux 2.5.x
This operation wakes at most
.I val
of the waiters that are waiting (e.g., inside
.BR FUTEX_WAIT )
on the futex word at the address
.IR uaddr .
Most commonly,
.I val
is specified as either 1 (wake up a single waiter) or
.BR INT_MAX
(wake up all waiters).
No guarantee is provided about which waiters are awoken
(e.g., a waiter with a higher scheduling priority is not guaranteed
to be awoken in preference to a waiter with a lower priority).
.IP
The arguments
.IR timeout ,
.IR uaddr2 ,
and
.I val3
are ignored.
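.IP
A wake-up call might be wrapped along the following lines.
The helper name
.IR wake_waiters ()
is hypothetical;
the sketch relies on the fact that a successful
.B FUTEX_WAKE
returns the number of waiters that were woken,
so a futex word with no waiters yields 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake at most how_many waiters on *uaddr.
   timeout, uaddr2 and val3 are ignored by FUTEX_WAKE. */
static long wake_waiters(uint32_t *uaddr, int how_many)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, how_many, NULL, NULL, 0);
}
```

Passing 1 wakes a single waiter, and passing
.B INT_MAX
wakes all of them.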
.\" FIXME . (Torvald) I think we should remove this. Or maybe adapt to
.\" a different example.
.\"
.\" For
.\" .BR futex (7),
.\" this is executed if incrementing the count showed that
.\" there were waiters,
.\" once the futex value has been set to 1
.\" (indicating that it is available).
.\"
.\" How does "incrementing the count show that there were waiters"?
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
.\" Strictly speaking, from Linux 2.5.x to 2.6.25
This operation creates a file descriptor that is associated with
the futex at
.IR uaddr .
The caller must close the returned file descriptor after use.
When another process or thread performs a
.BR FUTEX_WAKE
on the futex word, the file descriptor is reported as readable by
.BR select (2),
.BR poll (2),
and
.BR epoll (7).
.IP
The file descriptor can be used to obtain asynchronous notifications: if
.I val
is nonzero, then, when another process or thread executes a
.BR FUTEX_WAKE ,
the caller will receive the signal whose number was passed in
.IR val .
.IP
The arguments
.IR timeout ,
.IR uaddr2 ,
and
.I val3
are ignored.
.IP
Because it was inherently racy,
.B FUTEX_FD
has been removed
.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
from Linux 2.6.26 onward.
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
This operation performs the same task as
.BR FUTEX_CMP_REQUEUE
(see below), except that no check is made using the value in
.IR val3 .
(The argument
.I val3
is ignored.)
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
This operation first checks whether the location
.I uaddr
still contains the value
.IR val3 .
If not, the operation fails with the error
.BR EAGAIN .
Otherwise, the operation wakes up a maximum of
.I val
waiters that are waiting on the futex at
.IR uaddr .
If there are more than
.I val
waiters, then the remaining waiters are removed
from the wait queue of the source futex at
.I uaddr
and added to the wait queue of the target futex at
.IR uaddr2 .
The
.I val2
argument specifies an upper limit on the number of waiters
that are requeued to the futex at
.IR uaddr2 .
.IP
.\" FIXME(Torvald) Is the following correct? Or is just the decision
.\" which threads to wake or requeue part of the atomic operation?
The load from
.I uaddr
is an atomic memory access (i.e., using atomic machine instructions of
the respective architecture).
This load, the comparison with
.IR val3 ,
and the requeueing of any waiters are performed atomically and totally
ordered with respect to other operations on the same futex word.
.\" Notes from a f2f conversation with Thomas Gleixner (Aug 2015): ###
.\" The operation is serialized with respect to operations on both
.\" source and target futex. No other waiter can enqueue itself
.\" for waiting and no other waiter can dequeue itself because of
.\" a timeout or signal.
.IP
Typical values to specify for
.I val
are 0 or 1.
(Specifying
.BR INT_MAX
is not useful, because it would make the
.BR FUTEX_CMP_REQUEUE
operation equivalent to
.BR FUTEX_WAKE .)
The limit value specified via
.I val2
is typically either 1 or
.BR INT_MAX .
(Specifying the argument as 0 is not useful, because no waiters
would then be requeued, making the
.BR FUTEX_CMP_REQUEUE
operation equivalent to
.BR FUTEX_WAKE .)
.IP
The
.B FUTEX_CMP_REQUEUE
operation was added as a replacement for the earlier
.BR FUTEX_REQUEUE .
The difference is that the check of the value at
.I uaddr
can be used to ensure that requeueing happens only under certain
conditions, which allows race conditions to be avoided in certain use cases.
.\" But, as Rich Felker points out, there remain valid use cases for
.\" FUTEX_REQUEUE, for example, when the calling thread is requeuing
.\" the target(s) to a lock that the calling thread owns
.\" From: Rich Felker <dalias@libc.org>
.\" Date: Wed, 29 Oct 2014 22:43:17 -0400
.\" To: Darren Hart <dvhart@infradead.org>
.\" CC: libc-alpha@sourceware.org, ...
.\" Subject: Re: Add futex wrapper to glibc?
.IP
Both
.BR FUTEX_REQUEUE
and
.BR FUTEX_CMP_REQUEUE
can be used to avoid "thundering herd" wake-ups that could occur when using
.B FUTEX_WAKE
in cases where all of the waiters that are woken need to acquire
another futex.
Consider the following scenario,
where multiple waiter threads are waiting on B,
a wait queue implemented using a futex:
.IP
.in +4n
.EX
lock(A);
while (!check_value(V)) {
    unlock(A);
    block_on(B);
    lock(A);
}
unlock(A);
.EE
.in
.IP
If a waker thread used
.BR FUTEX_WAKE ,
then all waiters waiting on B would be woken up,
and they would all try to acquire lock A.
However, waking all of the threads in this manner would be pointless because
all except one of the threads would immediately block on lock A again.
By contrast, a requeue operation wakes just one waiter and moves
the other waiters to lock A,
and when the woken waiter unlocks A then the next waiter can proceed.
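.IP
The requeue call in the scenario above might be issued as in the following sketch.
The helper name
.IR cmp_requeue ()
and its parameter names are hypothetical;
the point illustrated is that
.I val2
travels in the argument slot that the prototype declares as
.IR timeout .

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *from still equals 'expected' (val3), wake up
   to nr_wake waiters (val) on 'from' and move up to nr_requeue of the
   remaining waiters (val2) to the wait queue of 'to' (uaddr2). */
static long cmp_requeue(uint32_t *from, uint32_t *to,
                        int nr_wake, int nr_requeue, uint32_t expected)
{
    /* val2 is passed in the position of the timeout argument. */
    return syscall(SYS_futex, from, FUTEX_CMP_REQUEUE, nr_wake,
                   (unsigned long) nr_requeue, to, expected);
}
```

In the scenario above, a waker would typically call something like
.I cmp_requeue(&B, &A, 1, INT_MAX, value_of_B)
so that one waiter is woken and the rest are moved to lock A;
here
.I value_of_B
stands for whatever value the waker last observed in B.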
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
.\" Author: Jakub Jelinek <jakub@redhat.com>
.\" Date: Tue Sep 6 15:16:25 2005 -0700
.\" FIXME. (Torvald) The glibc condvar implementation is currently being
.\" revised (e.g., to not use an internal lock anymore).
.\" It is probably more future-proof to remove this paragraph.
.\" [Torvald, do you have an update here?]
This operation was added to support some user-space use cases
where more than one futex must be handled at the same time.
The most notable example is the implementation of
.BR pthread_cond_signal (3),
which requires operations on two futexes,
the one used to implement the mutex and the one used in the implementation
of the wait queue associated with the condition variable.
.BR FUTEX_WAKE_OP
allows such cases to be implemented without leading to
high rates of contention and context switching.
.IP
The
.BR FUTEX_WAKE_OP
operation is equivalent to executing the following code atomically
and totally ordered with respect to other futex operations on
either of the two supplied futex words:
.IP
.in +4n
.EX
int oldval = *(int *) uaddr2;
*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
if (oldval \fIcmp\fP \fIcmparg\fP)
    futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
.EE
.in
.IP
In other words,
.BR FUTEX_WAKE_OP
does the following:
.RS
.IP * 3
saves the original value of the futex word at
.IR uaddr2
and performs an operation to modify the value of the futex at
.IR uaddr2 ;
this is an atomic read-modify-write memory access (i.e., using atomic
machine instructions of the respective architecture)
.IP *
wakes up a maximum of
.I val
waiters on the futex for the futex word at
.IR uaddr ;
and
.IP *
dependent on the results of a test of the original value of the
futex word at
.IR uaddr2 ,
wakes up a maximum of
.I val2
waiters on the futex for the futex word at
.IR uaddr2 .
.RE
.IP
The operation and comparison that are to be performed are encoded
in the bits of the argument
.IR val3 .
Pictorially, the encoding is:
.IP
.in +8n
.EX
+---+---+-----------+-----------+
|op |cmp|   oparg   |  cmparg   |
+---+---+-----------+-----------+
  4   4       12          12    <== # of bits
.EE
.in
.IP
Expressed in code, the encoding is:
.IP
.in +4n
.EX
#define FUTEX_OP(op, oparg, cmp, cmparg) \e
                (((op & 0xf) << 28) | \e
                ((cmp & 0xf) << 24) | \e
                ((oparg & 0xfff) << 12) | \e
                (cmparg & 0xfff))
.EE
.in
.IP
In the above,
.I op
and
.I cmp
are each one of the codes listed below.
The
.I oparg
and
.I cmparg
components are literal numeric values, except as noted below.
.IP
The
.I op
component has one of the following values:
.IP
.in +4n
.EX
FUTEX_OP_SET        0  /* uaddr2 = oparg; */
FUTEX_OP_ADD        1  /* uaddr2 += oparg; */
FUTEX_OP_OR         2  /* uaddr2 |= oparg; */
FUTEX_OP_ANDN       3  /* uaddr2 &= \(tioparg; */
FUTEX_OP_XOR        4  /* uaddr2 \(ha= oparg; */
.EE
.in
.IP
In addition, bit-wise ORing the following value into
.I op
causes
.IR "(1\ <<\ oparg)"
to be used as the operand:
.IP
.in +4n
.EX
FUTEX_OP_ARG_SHIFT  8  /* Use (1 << oparg) as operand */
.EE
.in
.IP
The
.I cmp
field is one of the following:
.IP
.in +4n
.EX
FUTEX_OP_CMP_EQ     0  /* if (oldval == cmparg) wake */
FUTEX_OP_CMP_NE     1  /* if (oldval != cmparg) wake */
FUTEX_OP_CMP_LT     2  /* if (oldval < cmparg) wake */
FUTEX_OP_CMP_LE     3  /* if (oldval <= cmparg) wake */
FUTEX_OP_CMP_GT     4  /* if (oldval > cmparg) wake */
FUTEX_OP_CMP_GE     5  /* if (oldval >= cmparg) wake */
.EE
.in
.IP
The return value of
.BR FUTEX_WAKE_OP
is the sum of the number of waiters woken on the futex
.IR uaddr
plus the number of waiters woken on the futex
.IR uaddr2 .
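.IP
To make the
.I val3
encoding concrete, the following sketch unpacks an encoded value into its
four fields and models the comparison step in user space.
The names
.IR "struct wake_op" ,
.IR wake_op_decode (),
and
.IR wake_op_test ()
are hypothetical, and the model assumes nonnegative
.I oparg
and
.I cmparg
values;
it does not reproduce the kernel's handling of invalid codes.

```c
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>

/* Hypothetical container for the four fields packed into val3. */
struct wake_op {
    unsigned op, cmp, oparg, cmparg;
};

/* Split val3 into the fields shown in the diagram:
   4 bits op, 4 bits cmp, 12 bits oparg, 12 bits cmparg. */
static struct wake_op wake_op_decode(uint32_t val3)
{
    struct wake_op f;

    f.op     = (val3 >> 28) & 0xf;
    f.cmp    = (val3 >> 24) & 0xf;
    f.oparg  = (val3 >> 12) & 0xfff;
    f.cmparg = val3 & 0xfff;
    return f;
}

/* User-space model of the comparison step: nonzero if the waiters on
   uaddr2 would be woken for the given original value. */
static int wake_op_test(unsigned cmp, int32_t oldval, int32_t cmparg)
{
    switch (cmp) {
    case FUTEX_OP_CMP_EQ: return oldval == cmparg;
    case FUTEX_OP_CMP_NE: return oldval != cmparg;
    case FUTEX_OP_CMP_LT: return oldval <  cmparg;
    case FUTEX_OP_CMP_LE: return oldval <= cmparg;
    case FUTEX_OP_CMP_GT: return oldval >  cmparg;
    case FUTEX_OP_CMP_GE: return oldval >= cmparg;
    default:              return 0;  /* unknown code: "no wake" in this model only */
    }
}
```

For instance,
.I FUTEX_OP(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0)
requests "add 1 to the word at
.IR uaddr2 ,
and wake its waiters if the original value was greater than 0".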
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
This operation is like
.BR FUTEX_WAIT
except that
.I val3
is used to provide a 32-bit bit mask to the kernel.
This bit mask, in which at least one bit must be set,
is stored in the kernel-internal state of the waiter.
See the description of
.BR FUTEX_WAKE_BITSET
for further details.
.IP
If
.I timeout
is not NULL, the structure it points to specifies
an absolute timeout for the wait operation.
If
.I timeout
is NULL, the operation can block indefinitely.
.IP
The
.I uaddr2
argument is ignored.
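.IP
Because the timeout here is absolute, a caller that thinks in terms of
"at most N milliseconds from now" has to compute a deadline first.
The following sketch does so against
.BR CLOCK_MONOTONIC ,
the clock used when
.B FUTEX_CLOCK_REALTIME
is not specified.
The helper name
.IR wait_until_ms_from_now ()
is hypothetical, and returning the
.I errno
value is a simplification made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wait on *uaddr while it equals 'expected', until
   an absolute CLOCK_MONOTONIC deadline 'ms' milliseconds from now.
   Returns 0 if woken, otherwise the errno value. */
static int wait_until_ms_from_now(uint32_t *uaddr, uint32_t expected, long ms)
{
    struct timespec deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {   /* normalize the nanoseconds */
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* A mask with all bits set accepts a wake-up from any waker. */
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                &deadline, NULL, FUTEX_BITSET_MATCH_ANY) == -1)
        return errno;
    return 0;
}
```

An advantage of the absolute form is that a caller that is woken early
and must wait again can reuse the same deadline unchanged.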
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.TP
.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
This operation is the same as
.BR FUTEX_WAKE
except that the
.I val3
argument is used to provide a 32-bit bit mask to the kernel.
This bit mask, in which at least one bit must be set,
is used to select which waiters should be woken up.
The selection is done by a bit-wise AND of the "wake" bit mask
(i.e., the value in
.IR val3 )
and the bit mask which is stored in the kernel-internal
state of the waiter (the "wait" bit mask that is set using
.BR FUTEX_WAIT_BITSET ).
All of the waiters for which the result of the AND is nonzero are woken up;
the remaining waiters are left sleeping.
.IP
The effect of
.BR FUTEX_WAIT_BITSET
and
.BR FUTEX_WAKE_BITSET
is to allow selective wake-ups among multiple waiters that are blocked
on the same futex.
However, note that, depending on the use case,
employing this bit-mask multiplexing feature on a
futex can be less efficient than simply using multiple futexes,
because employing bit-mask multiplexing requires the kernel
to check all waiters on a futex,
including those that are not interested in being woken up
(i.e., they do not have the relevant bit set in their "wait" bit mask).
.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
.\"
.\" "The original reason for the addition of these extensions
.\" was to improve the performance of pthread read-write locks
.\" in glibc. However, the pthreads library no longer uses the
.\" same locking algorithm, and these extensions are not used
.\" without the bitset parameter being all ones."
.\"
.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
.\" is nevertheless used (with a bit mask of all ones) in order to
.\" obtain the absolute timeout functionality that is useful
.\" for efficiently implementing Pthreads APIs (which use absolute
.\" timeouts); FUTEX_WAIT provides only relative timeouts.
.IP
The constant
.BR FUTEX_BITSET_MATCH_ANY ,
which corresponds to all 32 bits set in the bit mask, can be used as the
.I val3
argument for
.BR FUTEX_WAIT_BITSET
and
.BR FUTEX_WAKE_BITSET .
Other than differences in the handling of the
.I timeout
argument, the
.BR FUTEX_WAIT
operation is equivalent to
.BR FUTEX_WAIT_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY ;
that is, allow a wake-up by any waker.
The
.BR FUTEX_WAKE
operation is equivalent to
.BR FUTEX_WAKE_BITSET
with
.IR val3
specified as
.BR FUTEX_BITSET_MATCH_ANY ;
that is, wake up any waiter(s).
.IP
The
.I uaddr2
and
.I timeout
arguments are ignored.
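.IP
The selection rule can be modeled in user space as shown below.
This is only an illustration of the bit-wise AND test;
the function names
.IR bitset_selects ()
and
.IR count_selected ()
are hypothetical, and the model ignores the limit that
.I val
places on the number of waiters actually woken.

```c
#include <assert.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if a waiter that slept with wait_mask is eligible to be
   woken by a FUTEX_WAKE_BITSET call that passes wake_mask in val3. */
static int bitset_selects(uint32_t wake_mask, uint32_t wait_mask)
{
    return (wake_mask & wait_mask) != 0;
}

/* Count how many of n waiters (given by their "wait" masks) a wake
   mask would select. */
static size_t count_selected(uint32_t wake_mask,
                             const uint32_t *wait_masks, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += bitset_selects(wake_mask, wait_masks[i]);
    return hits;
}
```

A waiter that used
.B FUTEX_BITSET_MATCH_ANY
is selected by every valid wake mask,
which matches the equivalence with plain
.B FUTEX_WAIT
described above.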
bd90a5f9 814.\"
70b06b90 815.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
bd90a5f9
MK
816.\"
817.SS Priority-inheritance futexes
b52e1cd4
MK
818Linux supports priority-inheritance (PI) futexes in order to handle
819priority-inversion problems that can be encountered with
820normal futex locks.
b565548b 821Priority inversion is the problem that occurs when a high-priority
bdc5957a
MK
822task is blocked waiting to acquire a lock held by a low-priority task,
823while tasks at an intermediate priority continuously preempt
824the low-priority task from the CPU.
825Consequently, the low-priority task makes no progress toward
826releasing the lock, and the high-priority task remains blocked.
efeece04 827.PP
7d20efd7
MK
828Priority inheritance is a mechanism for dealing with
829the priority-inversion problem.
bdc5957a
MK
830With this mechanism, when a high-priority task becomes blocked
831by a lock held by a low-priority task,
9cee832c
MK
832the priority of the low-priority task is temporarily raised
833to that of the high-priority task,
bdc5957a 834so that it is not preempted by any intermediate-priority tasks,
7d20efd7
MK
835and can thus make progress toward releasing the lock.
836To be effective, priority inheritance must be transitive,
bdc5957a 837meaning that if a high-priority task blocks on a lock
ca4e5b2b 838held by a lower-priority task that is itself blocked by a lock
bdc5957a 839held by another intermediate-priority task
7d20efd7 840(and so on, for chains of arbitrary length),
b0f35fbb 841then both of those tasks
bdc5957a
MK
842(or more generally, all of the tasks in a lock chain)
843have their priorities raised to be the same as the high-priority task.
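.PP
The transitive boosting described above can be illustrated with a toy user-space model.
This is only a sketch of the idea, not the kernel's RT-mutex implementation:
the struct task type and the boost_chain() function are hypothetical,
a larger number is assumed to mean a higher priority,
and the lock chain is assumed to contain no cycle (that is, no deadlock).

```c
#include <stddef.h>

/* Toy model of a task in a lock chain (hypothetical type). */
struct task {
    int base_prio;              /* priority assigned to the task */
    int eff_prio;               /* priority after inheritance */
    struct task *blocked_on;    /* owner of the lock this task waits
                                   for, or NULL if it is runnable */
};

/* Walk the chain of lock owners starting at the waiter and raise each
   owner's effective priority to at least that of the waiter. */
static void
boost_chain(struct task *waiter)
{
    for (struct task *owner = waiter->blocked_on; owner != NULL;
         owner = owner->blocked_on) {
        if (owner->eff_prio < waiter->eff_prio)
            owner->eff_prio = waiter->eff_prio;
    }
}
```

In this model, if a task with priority 90 blocks on a lock whose owner (priority 10)
is itself blocked on a lock held by a task with priority 50,
both owners end up with an effective priority of 90 while their base priorities are unchanged.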
efeece04 844.PP
9e2b90ee 845From a user-space perspective,
39e9b2e1
MK
846what makes a futex PI-aware is a policy agreement (described below)
847between user space and the kernel about the value of the futex word,
601399f3
MK
848coupled with the use of the PI-futex operations described below.
849(Unlike the other futex operations described above,
850the PI-futex operations are designed
851for the implementation of very specific IPC mechanisms.)
852.\"
9e2b90ee
MK
853.\" Quoting Darren Hart:
854.\" These opcodes paired with the PI futex value policy (described below)
855.\" defines a "futex" as PI aware. These were created very specifically
856.\" in support of PI pthread_mutexes, so it makes a lot more sense to
857.\" talk about a PI aware pthread_mutex, than a PI aware futex, since
858.\" there is a lot of policy and scaffolding that has to be built up
859.\" around it to use it properly (this is what a PI pthread_mutex is).
efeece04 860.PP
ac894879 861.\" mtk: The following text is drawn from the Hart/Guniguntala paper
1af427a4 862.\" (listed in SEE ALSO), but I have reworded some pieces
8d825152 863.\" significantly.
79d918c7 864.\"
f0a9e8f4 865The PI-futex operations described below differ from the other
4b35dc5d
TR
866futex operations in that they impose policy on the use of the value of the
867futex word:
79d918c7 868.IP * 3
4b35dc5d 869If the lock is not acquired, the futex word's value shall be 0.
79d918c7 870.IP *
4c8cb0ff
MK
871If the lock is acquired, the futex word's value shall
872be the thread ID (TID;
4b35dc5d 873see
79d918c7
MK
874.BR gettid (2))
875of the owning thread.
876.IP *
79d918c7
MK
877If the lock is owned and there are threads contending for the lock,
878then the
879.B FUTEX_WAITERS
4b35dc5d 880bit shall be set in the futex word's value; in other words, this value is:
efeece04 881.IP
79d918c7 882 FUTEX_WAITERS | TID
601399f3
MK
883.IP
884(Note that it is invalid for a PI futex word to have no owner and
885.BR FUTEX_WAITERS
886set.)
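.PP
The value policy above can be expressed as a few small predicates on the futex word.
The sketch below is illustrative only: the pi_word_*() helper names are hypothetical,
and it relies on the FUTEX_TID_MASK constant from <linux/futex.h>
(not otherwise discussed here) to extract the owner TID.

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* TID of the lock owner, or 0 if the lock is not acquired. */
static uint32_t
pi_word_owner(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}

/* True if threads are contending for the lock. */
static bool
pi_word_has_waiters(uint32_t word)
{
    return (word & FUTEX_WAITERS) != 0;
}

/* The policy forbids FUTEX_WAITERS being set without an owner. */
static bool
pi_word_is_valid(uint32_t word)
{
    return !(pi_word_has_waiters(word) && pi_word_owner(word) == 0);
}
```

For example, the word FUTEX_WAITERS | 1234 decodes to owner 1234 with waiters present,
whereas a word consisting of FUTEX_WAITERS alone violates the policy.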
79d918c7
MK
887.PP
888With this policy in place,
fd105614 889a user-space application can acquire an unacquired
601399f3 890lock or release a lock using atomic instructions executed in user mode
fd105614 891(e.g., a compare-and-swap operation such as
b52e1cd4
MK
892.I cmpxchg
893on the x86 architecture).
4c8cb0ff
MK
894Acquiring a lock simply consists of using compare-and-swap to atomically
895set the futex word's value to the caller's TID if its previous value was 0.
4b35dc5d
TR
896Releasing a lock requires using compare-and-swap to set the futex word's
897value to 0 if the previous value was the expected TID.
efeece04 898.PP
4b35dc5d 899If a futex is already acquired (i.e., has a nonzero value),
b52e1cd4 900waiters must employ the
79d918c7
MK
901.B FUTEX_LOCK_PI
902operation to acquire the lock.
4b35dc5d 903If other threads are waiting for the lock, then the
79d918c7 904.B FUTEX_WAITERS
4c8cb0ff
MK
905bit is set in the futex value;
906in this case, the lock owner must employ the
79d918c7 907.B FUTEX_UNLOCK_PI
b52e1cd4 908operation to release the lock.
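.PP
Putting the user-space fast path and the kernel slow path together,
a PI lock might be sketched as follows.
This is a minimal sketch under stated assumptions, not the glibc implementation:
the pi_lock() and pi_unlock() names are hypothetical,
error handling is left to the caller,
and an _Atomic uint32_t is assumed to have the same representation as the
32-bit futex word expected by the kernel.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire the PI lock; returns 0 on success,
   or -1 with errno set if the kernel call fails. */
static int
pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: 0 -> TID, entirely in user space. */
    if (atomic_compare_exchange_strong(word, &expected, tid))
        return 0;

    /* Slow path: the word was nonzero, so let the kernel queue us. */
    return (int) syscall(SYS_futex, word, FUTEX_LOCK_PI, 0,
                         NULL, NULL, 0);
}

/* Hypothetical helper: release a PI lock owned by the caller. */
static int
pi_unlock(_Atomic uint32_t *word)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: TID -> 0 succeeds only if FUTEX_WAITERS is clear. */
    if (atomic_compare_exchange_strong(word, &expected, 0))
        return 0;

    /* Slow path: waiters exist, so the kernel must hand over the lock. */
    return (int) syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0,
                         NULL, NULL, 0);
}
```

In the uncontended case neither function enters the kernel for the futex operation itself;
the only system call made is gettid(2).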
efeece04 909.PP
79d918c7
MK
910In the cases where callers are forced into the kernel
911(i.e., required to perform a
912.BR futex ()
0c3ec26b 913call),
79d918c7
MK
914they then deal directly with a so-called RT-mutex,
915a kernel locking mechanism which implements the required
916priority-inheritance semantics.
917After the RT-mutex is acquired, the futex value is updated accordingly,
918before the calling thread returns to user space.
efeece04 919.PP
a59fca75 920It is important to note
ac894879 921.\" tglx (July 2015):
30239c10
MK
922.\" If there are multiple waiters on a pi futex then a wake pi operation
923.\" will wake the first waiter and hand over the lock to this waiter. This
924.\" includes handing over the rtmutex which represents the futex in the
925.\" kernel. The strict requirement is that the futex owner and the rtmutex
926.\" owner must be the same, except for the update period which is
927.\" serialized by the futex internal locking. That means the kernel must
1d09c150 928.\" update the user-space value prior to returning to user space
4b35dc5d 929that the kernel will update the futex word's value prior
79d918c7 930to returning to user space.
601399f3
MK
931(This prevents the possibility of the futex word's value ending
932up in an invalid state, such as having an owner but the value being 0,
933or having waiters but not having the
934.B FUTEX_WAITERS
935bit set.)
efeece04 936.PP
30239c10
MK
937If a futex has an associated RT-mutex in the kernel
938(i.e., there are blocked waiters)
939and the owner of the futex/RT-mutex dies unexpectedly,
940then the kernel cleans up the RT-mutex and hands it over to the next waiter.
941This in turn requires that the user-space value is updated accordingly.
942To indicate that this is required, the kernel sets the
943.B FUTEX_OWNER_DIED
944bit in the futex word along with the thread ID of the new owner.
8adaf0a7
MK
945User space can detect this situation via the presence of the
946.B FUTEX_OWNER_DIED
947bit and is then responsible for cleaning up the stale state left over by
1d09c150 948the dead owner.
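.PP
A new owner can test for this condition by inspecting the futex word after the lock has been acquired.
The following sketch shows only the detection step;
the helper name is hypothetical, FUTEX_TID_MASK is taken from <linux/futex.h>,
and what "cleaning up" means is specific to the application and is not shown.

```c
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the word says that the caller (my_tid)
   now owns the lock and that the previous owner died while holding it. */
static bool
pi_acquired_from_dead_owner(uint32_t word, uint32_t my_tid)
{
    return (word & FUTEX_OWNER_DIED) != 0 &&
           (word & FUTEX_TID_MASK) == my_tid;
}
```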
30239c10
MK
949.\" tglx (July 2015):
950.\" The FUTEX_OWNER_DIED bit can also be set on uncontended futexes, where
951.\" the kernel has no state associated. This happens via the robust futex
952.\" mechanism. In that case the futex value will be set to
953.\" FUTEX_OWNER_DIED. The robust futex mechanism is also available for non
954.\" PI futexes.
efeece04 955.PP
601399f3
MK
956PI futexes are operated on by specifying one of the values listed below in
957.IR futex_op .
958Note that the PI futex operations must be used as paired operations
959and are subject to some additional requirements:
960.IP * 3
961.B FUTEX_LOCK_PI
962and
963.B FUTEX_TRYLOCK_PI
964pair with
d8012462 965.BR FUTEX_UNLOCK_PI .
601399f3
MK
966.B FUTEX_UNLOCK_PI
967must be called only on a futex owned by the calling thread,
968as defined by the value policy, otherwise the error
969.B EPERM
970results.
971.IP *
972.B FUTEX_WAIT_REQUEUE_PI
973pairs with
974.BR FUTEX_CMP_REQUEUE_PI .
975This must be performed from a non-PI futex to a distinct PI futex
976(or the error
977.B EINVAL
978results).
979Additionally,
980.I val
981(the number of waiters to be woken) must be 1
982(or the error
983.B EINVAL
984results).
11ac5b51 985.PP
601399f3 986The PI futex operations are as follows:
70b06b90
MK
987.\"
988.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
989.\"
d67e21f5
MK
990.TP
991.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
992.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
bc54ed38 993This operation is used after an attempt to acquire
fd105614 994the lock via an atomic user-mode instruction failed
4b35dc5d 995because the futex word has a nonzero value\(emspecifically,
8297383e 996because it contained the (PID-namespace-specific) TID of the lock owner.
efeece04 997.IP
4b35dc5d 998The operation checks the value of the futex word at the address
67833bec 999.IR uaddr .
70b06b90
MK
1000If the value is 0, then the kernel tries to atomically set
1001the futex value to the caller's TID.
c3875d1d 1002If the futex word's value is nonzero,
67833bec 1003the kernel atomically sets the
e0547e70 1004.B FUTEX_WAITERS
67833bec
MK
1005bit, which signals the futex owner that it cannot unlock the futex in
1006user space atomically by setting the futex value to 0.
c3875d1d
MK
1007.\" tglx (July 2015):
1008.\" The operation here is similar to the FUTEX_WAIT logic. When the user
1009.\" space atomic acquire does not succeed because the futex value was non
1010.\" zero, then the waiter goes into the kernel, takes the kernel internal
1011.\" lock and retries the acquisition under the lock. If the acquisition
1012.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
1013.\" the lock owner that it needs to go into the kernel. Here is the pseudo
1014.\" code:
1015.\"
1016.\" lock(kernel_lock);
1017.\" retry:
9bfc9cb1 1018.\"
c3875d1d
MK
1019.\" /*
1020.\" * Owner might have unlocked in userspace before we
1021.\" * were able to set the waiter bit.
1022.\" */
1023.\" if (atomic_acquire(futex) == SUCCESS) {
1024.\" unlock(kernel_lock);
1025.\" return 0;
1026.\" }
1027.\"
1028.\" /*
1029.\" * Owner might have unlocked after the above atomic_acquire()
1030.\" * attempt.
1031.\" */
1032.\" if (atomic_set_waiters_bit(futex) != SUCCESS)
1033.\" goto retry;
1034.\"
1035.\" queue_waiter();
1036.\" unlock(kernel_lock);
1037.\" block();
1038.\"
1039After that, the kernel:
1040.RS
1041.IP 1. 3
1042Tries to find the thread which is associated with the owner TID.
1043.IP 2.
1044Creates or reuses kernel state on behalf of the owner.
1045(If this is the first waiter, there is no kernel state for this
1046futex, so kernel state is created by locking the RT-mutex
1047and the futex owner is made the owner of the RT-mutex.
1048If there are existing waiters, then the existing state is reused.)
1049.IP 3.
ca4e5b2b 1050Attaches the waiter to the futex
c3875d1d
MK
1051(i.e., the waiter is enqueued on the RT-mutex waiter list).
1052.RE
1053.IP
ac894879
MK
1054If more than one waiter exists,
1055the enqueueing of the waiter is in descending priority order.
1056(For information on priority ordering, see the discussion of the
1057.BR SCHED_DEADLINE ,
1058.BR SCHED_FIFO ,
1059and
1060.BR SCHED_RR
1061scheduling policies in
1062.BR sched (7).)
1063The owner inherits either the waiter's CPU bandwidth
1064(if the waiter is scheduled under the
1065.BR SCHED_DEADLINE
1066policy) or the waiter's priority (if the waiter is scheduled under the
1067.BR SCHED_RR
1068or
1069.BR SCHED_FIFO
1070policy).
1d09c150
MK
1071.\" August 2015:
1072.\" mtk: If the realm is restricted purely to SCHED_OTHER (SCHED_NORMAL)
1073.\" processes, does the nice value come into play also?
1074.\"
1075.\" tglx: No. SCHED_OTHER/NORMAL tasks are handled in FIFO order
c3875d1d 1076This inheritance follows the lock chain in the case of nested locking
ca4e5b2b
MK
1077.\" (i.e., task 1 blocks on lock A, held by task 2,
1078.\" while task 2 blocks on lock B, held by task 3)
c3875d1d 1079and performs deadlock detection.
efeece04 1080.IP
e0547e70
TG
1081The
1082.I timeout
9ce19cf1 1083argument provides a timeout for the lock attempt.
8064bfa5
MK
1084If
1085.I timeout
1086is not NULL, the structure it points to specifies
1087an absolute timeout, measured against the
9ce19cf1
MK
1088.BR CLOCK_REALTIME
1089clock.
c082f385
MK
1090.\" 2016-07-07 response from Thomas Gleixner on LKML:
1091.\" From: Thomas Gleixner <tglx@linutronix.de>
1092.\" Date: 6 July 2016 at 20:57
1093.\" Subject: Re: futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
2ae96e8a 1094.\"
c082f385
MK
1095.\" On Thu, 23 Jun 2016, Michael Kerrisk (man-pages) wrote:
1096.\" > On 06/23/2016 08:28 PM, Darren Hart wrote:
1097.\" > > And as a follow-on, what is the reason for FUTEX_LOCK_PI only using
1098.\" > > CLOCK_REALTIME? It seems reasonable to me that a user may want to wait a
1099.\" > > specific amount of time, regardless of wall time.
1100.\" >
1101.\" > Yes, that's another weird inconsistency.
2ae96e8a 1102.\"
c082f385
MK
1103.\" The reason is that pthread_mutex_timedlock() uses absolute timeouts based on
1104.\" CLOCK_REALTIME. glibc folks asked to make that the default behaviour back
1105.\" then when we added LOCK_PI.
9ce19cf1
MK
1106If
1107.I timeout
1108is NULL, the operation will block indefinitely.
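.IP
Because the timeout is absolute, a caller that wants to wait for a given interval
must first convert it into a deadline on the CLOCK_REALTIME clock.
The sketch below shows one way to do this.
All helper names are hypothetical, the interval is assumed to be nonnegative,
and the rejection of a negative tv_nsec in timespec_is_valid() is a conservative assumption
added to the range checks listed under ERRORS.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Range check corresponding to the EINVAL timeout condition. */
static bool
timespec_is_valid(const struct timespec *ts)
{
    return ts->tv_sec >= 0 && ts->tv_nsec >= 0 &&
           ts->tv_nsec < NSEC_PER_SEC;
}

/* Add a nonnegative number of milliseconds to a normalized timespec,
   carrying any nanosecond overflow into tv_sec. */
static struct timespec
timespec_add_ms(struct timespec base, long ms)
{
    base.tv_sec += ms / 1000;
    base.tv_nsec += (ms % 1000) * 1000000L;
    if (base.tv_nsec >= NSEC_PER_SEC) {
        base.tv_sec++;
        base.tv_nsec -= NSEC_PER_SEC;
    }
    return base;
}

/* Compute an absolute CLOCK_REALTIME deadline 'ms' milliseconds from
   now; returns 0 on success, or -1 if the clock cannot be read. */
static int
realtime_deadline_ms(long ms, struct timespec *deadline)
{
    struct timespec now;

    if (clock_gettime(CLOCK_REALTIME, &now) == -1)
        return -1;
    *deadline = timespec_add_ms(now, ms);
    return 0;
}
```

The resulting structure can be passed as the timeout argument of a FUTEX_LOCK_PI call.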
efeece04 1109.IP
a449c634 1110The
e0547e70
TG
1111.IR uaddr2 ,
1112.IR val ,
1113and
1114.IR val3
a449c634 1115arguments are ignored.
67833bec 1116.\"
70b06b90
MK
1117.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1118.\"
d67e21f5 1119.TP
12fdbe23 1120.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1121.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
3fbb1be1 1122This operation tries to acquire the lock at
12fdbe23 1123.IR uaddr .
c3875d1d
MK
1124It is invoked when a user-space atomic acquire did not
1125succeed because the futex word was not 0.
efeece04 1126.IP
8adaf0a7
MK
1127Because the kernel has access to more state information than user space,
1128acquisition of the lock might succeed if performed by the
1129kernel in cases where the futex word
1130(i.e., the state information accessible to user space) contains stale state
c3875d1d
MK
1131.RB ( FUTEX_WAITERS
1132and/or
1133.BR FUTEX_OWNER_DIED ).
1134This can happen when the owner of the futex died.
1d09c150
MK
1135User space cannot handle this condition in a race-free manner,
1136but the kernel can fix this up and acquire the futex.
ee65b0e8
MK
1137.\" Paraphrasing a f2f conversation with Thomas Gleixner about the
1138.\" above point (Aug 2015): ###
1139.\" There is a rare possibility of a race condition involving an
1140.\" uncontended futex with no owner, but with waiters. The
1141.\" kernel-user-space contract is that if a futex is nonzero, you must
1142.\" go into kernel. The futex was owned by a task, and that task dies
1143.\" but there are no waiters, so the futex value is non zero.
1144.\" Therefore, the next locker has to go into the kernel,
1145.\" so that the kernel has a chance to clean up. (CMXCH on zero
1146.\" in user space would fail, so kernel has to clean up.)
8adaf0a7
MK
1147.\" Darren Hart (Oct 2015):
1148.\" The trylock in the kernel has more state, so it can independently
1149.\" verify the flags that userspace must trust implicitly.
efeece04 1150.IP
084744ef
MK
1151The
1152.IR uaddr2 ,
1153.IR val ,
1154.IR timeout ,
1155and
1156.IR val3
1157arguments are ignored.
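.IP
A nonblocking acquisition attempt might be wrapped as shown below.
This is a sketch only: the futex_trylock_pi() and futex_unlock_pi() names are hypothetical,
and the unused arguments are simply passed as zero or NULL.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to try to acquire the PI futex
   at uaddr.  Returns 0 if the lock was acquired, or -1 with errno set. */
static long
futex_trylock_pi(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_TRYLOCK_PI, 0,
                   NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex owned by the caller. */
static long
futex_unlock_pi(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_UNLOCK_PI, 0,
                   NULL, NULL, 0);
}
```

On success, the kernel has stored the caller's TID in the futex word,
so the caller owns the lock exactly as if the user-space compare-and-swap had succeeded.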
70b06b90
MK
1158.\"
1159.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1160.\"
d67e21f5 1161.TP
12fdbe23 1162.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 1163.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
d4ba4328 1164This operation wakes the highest-priority waiter that is waiting in
ecae2099
TG
1165.B FUTEX_LOCK_PI
1166on the futex address provided by the
1167.I uaddr
1168argument.
efeece04 1169.IP
1d09c150 1170This is called when the user-space value at
ecae2099
TG
1171.I uaddr
1172cannot be changed atomically from a TID (of the owner) to 0.
efeece04 1173.IP
ecae2099
TG
1174The
1175.IR uaddr2 ,
1176.IR val ,
1177.IR timeout ,
1178and
1179.IR val3
11a194bf 1180arguments are ignored.
70b06b90
MK
1181.\"
1182.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1183.\"
d67e21f5 1184.TP
d67e21f5
MK
1185.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
1186.\" commit 52400ba946759af28442dee6265c5c0180ac7122
f812a08b
DH
1187This operation is a PI-aware variant of
1188.BR FUTEX_CMP_REQUEUE .
1189It requeues waiters that are blocked via
1190.B FUTEX_WAIT_REQUEUE_PI
1191on
1192.I uaddr
1193from a non-PI source futex
1194.RI ( uaddr )
1195to a PI target futex
1196.RI ( uaddr2 ).
efeece04 1197.IP
9e54d26d
MK
1198As with
1199.BR FUTEX_CMP_REQUEUE ,
1200this operation wakes up a maximum of
1201.I val
1202waiters that are waiting on the futex at
1203.IR uaddr .
1204However, for
1205.BR FUTEX_CMP_REQUEUE_PI ,
1206.I val
6fbeb8f4 1207is required to be 1
939ca89f 1208(since the main point is to avoid a thundering herd).
9e54d26d
MK
1209The remaining waiters are removed from the wait queue of the source futex at
1210.I uaddr
1211and added to the wait queue of the target futex at
1212.IR uaddr2 .
efeece04 1213.IP
9e54d26d 1214The
768d3c23 1215.I val2
c6d8cf21
MK
1216.\" val2 is the cap on the number of requeued waiters.
1217.\" In the glibc pthread_cond_broadcast() implementation, this argument
1218.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
9e54d26d 1219and
768d3c23 1220.I val3
9e54d26d
MK
1221arguments serve the same purposes as for
1222.BR FUTEX_CMP_REQUEUE .
70b06b90 1223.\"
8297383e 1224.\" The page at http://locklessinc.com/articles/futex_cheat_sheet/
be376673 1225.\" notes that "priority-inheritance Futex to priority-inheritance
8297383e
MK
1226.\" Futex requeues are currently unsupported". However, probably
1227.\" the page does not need to say anything about this, since
1228.\" Thomas Gleixner commented (July 2015): "they never will be
1229.\" supported because they make no sense at all"
70b06b90
MK
1230.\"
1231.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1232.\"
d67e21f5
MK
1233.TP
1234.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
1235.\" commit 52400ba946759af28442dee6265c5c0180ac7122
70b06b90 1236.\"
c3875d1d 1237Wait on a non-PI futex at
6ff1b4c0 1238.I uaddr
c3875d1d
MK
1239and potentially be requeued (via a
1240.BR FUTEX_CMP_REQUEUE_PI
1241operation in another task) onto a PI futex at
6ff1b4c0
TG
1242.IR uaddr2 .
1243The wait operation on
1244.I uaddr
c3875d1d 1245is the same as for
6ff1b4c0 1246.BR FUTEX_WAIT .
efeece04 1247.IP
6ff1b4c0
TG
1248The waiter can be removed from the wait on
1249.I uaddr
6ff1b4c0 1250without requeueing on
c3875d1d
MK
1251.IR uaddr2
1252via a
1d09c150 1253.BR FUTEX_WAKE
c3875d1d
MK
1254operation in another task.
1255In this case, the
1256.BR FUTEX_WAIT_REQUEUE_PI
3fbb1be1
MK
1257operation fails with the error
1258.BR EAGAIN .
efeece04 1259.IP
63bea7dc
MK
1260If
1261.I timeout
8064bfa5
MK
1262is not NULL, the structure it points to specifies
1263an absolute timeout for the wait operation.
63bea7dc
MK
1264If
1265.I timeout
1266is NULL, the operation can block indefinitely.
efeece04 1267.IP
a4e69912
MK
1268The
1269.I val3
1270argument is ignored.
efeece04 1271.IP
abb571e8
MK
1272The
1273.BR FUTEX_WAIT_REQUEUE_PI
1274and
1275.BR FUTEX_CMP_REQUEUE_PI
1276operations were added to support a fairly specific use case:
1277support for priority-inheritance-aware POSIX threads condition variables.
1278The idea is that these operations should always be paired,
1279in order to ensure that user space and the kernel remain in sync.
1280Thus, in the
1281.BR FUTEX_WAIT_REQUEUE_PI
1282operation, the user-space application pre-specifies the target
1283of the requeue that takes place in the
1284.BR FUTEX_CMP_REQUEUE_PI
1285operation.
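.IP
The pairing can be sketched with two small wrappers, one for each side.
This is an illustration under stated assumptions rather than the glibc implementation:
the wrapper and parameter names (cond, pi_mutex, nr_requeue) are hypothetical.
Note that val is fixed at 1 and that val2 travels in the argument slot otherwise used for the timeout.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Waiter side: block on the non-PI futex 'cond' while it still holds
   'expected', naming in advance the PI futex to be requeued onto. */
static long
futex_wait_requeue_pi(uint32_t *cond, uint32_t expected,
                      const struct timespec *abs_timeout,
                      uint32_t *pi_mutex)
{
    return syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI, expected,
                   abs_timeout, pi_mutex, 0);
}

/* Waker side: wake one waiter (val must be 1) and requeue up to
   nr_requeue others from 'cond' onto 'pi_mutex', provided that
   'cond' still holds 'expected' (passed as val3). */
static long
futex_cmp_requeue_pi(uint32_t *cond, uint32_t expected,
                     int nr_requeue, uint32_t *pi_mutex)
{
    return syscall(SYS_futex, cond, FUTEX_CMP_REQUEUE_PI, 1,
                   (long) nr_requeue, pi_mutex, expected);
}
```

Both sides must name the same target futex:
a requeue to a futex other than the one given by the waiter, or to the source futex itself,
fails with the error EINVAL.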
1286.\"
1287.\" Darren Hart notes that a patch to allow glibc to fully support
1af427a4 1288.\" PI-aware pthreads condition variables has not yet been accepted into
abb571e8
MK
1289.\" glibc. The story is complex, and can be found at
1290.\" https://sourceware.org/bugzilla/show_bug.cgi?id=11588
1291.\" Darren notes that in the meantime, the patch is shipped with various
1af427a4 1292.\" PREEMPT_RT-enabled Linux systems.
abb571e8
MK
1293.\"
1294.\" Related to the preceding, Darren proposed that somewhere, man-pages
1295.\" should document the following point:
1af427a4 1296.\"
4c8cb0ff
MK
1297.\" While the Linux kernel, since 2.6.31, supports requeueing of
1298.\" priority-inheritance (PI) aware mutexes via the
1299.\" FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI futex operations,
1300.\" the glibc implementation does not yet take full advantage of this.
1301.\" Specifically, the condvar internal data lock remains a non-PI aware
1302.\" mutex, regardless of the type of the pthread_mutex associated with
1303.\" the condvar. This can lead to an unbounded priority inversion on
1304.\" the internal data lock even when associating a PI aware
1305.\" pthread_mutex with a condvar during a pthread_cond*_wait
1306.\" operation. For this reason, it is not recommended to rely on
1307.\" priority inheritance when using pthread condition variables.
1af427a4
MK
1308.\"
1309.\" The problem is that the obvious location for this text is
1310.\" the pthread_cond*wait(3) man page. However, such a man page
abb571e8 1311.\" does not currently exist.
70b06b90 1312.\"
6700de24 1313.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70b06b90 1314.\"
47297adb 1315.SH RETURN VALUE
a5c5a06a
MK
1316In the event of an error (and assuming that
1317.BR futex ()
1318was invoked via
1319.BR syscall (2)),
1320all operations return \-1 and set
e808bba0 1321.I errno
6f147f79 1322to indicate the cause of the error.
efeece04 1323.PP
e808bba0
MK
1324The return value on success depends on the operation,
1325as described in the following list:
fea681da
MK
1326.TP
1327.B FUTEX_WAIT
077981d4 1328Returns 0 if the caller was woken up.
4c8cb0ff
MK
1329Note that a wake-up can also be caused by common futex usage patterns
1330in unrelated code that happened to have previously used the futex word's
1331memory location (e.g., typical futex-based implementations of
1332Pthreads mutexes can cause this under some conditions).
1333Therefore, callers should always conservatively assume that a return
1334value of 0 can mean a spurious wake-up, and use the futex word's value
bc54ed38
MK
1335(i.e., the user-space synchronization scheme)
1336to decide whether to continue to block or not.
fea681da
MK
1337.TP
1338.B FUTEX_WAKE
bdc5957a 1339Returns the number of waiters that were woken up.
fea681da
MK
1340.TP
1341.B FUTEX_FD
1342Returns the new file descriptor associated with the futex.
1343.TP
1344.B FUTEX_REQUEUE
bdc5957a 1345Returns the number of waiters that were woken up.
fea681da
MK
1346.TP
1347.B FUTEX_CMP_REQUEUE
bdc5957a 1348Returns the total number of waiters that were woken up or
4b35dc5d 1349requeued to the futex for the futex word at
3dfcc11d
MK
1350.IR uaddr2 .
1351If this value is greater than
1352.IR val ,
fd105614 1353then the difference is the number of waiters requeued to the futex for the
4c8cb0ff 1354futex word at
3dfcc11d 1355.IR uaddr2 .
dcad19c0
MK
1356.TP
1357.B FUTEX_WAKE_OP
a8b5b324 1358Returns the total number of waiters that were woken up.
4c8cb0ff
MK
1359This is the sum of the woken waiters on the two futexes for
1360the futex words at
a8b5b324
MK
1361.I uaddr
1362and
1363.IR uaddr2 .
dcad19c0
MK
1364.TP
1365.B FUTEX_WAIT_BITSET
077981d4
MK
1366Returns 0 if the caller was woken up.
1367See
4b35dc5d
TR
1368.B FUTEX_WAIT
1369for how to interpret this correctly in practice.
dcad19c0
MK
1370.TP
1371.B FUTEX_WAKE_BITSET
bdc5957a 1372Returns the number of waiters that were woken up.
dcad19c0
MK
1373.TP
1374.B FUTEX_LOCK_PI
bf02a260 1375Returns 0 if the futex was successfully locked.
dcad19c0
MK
1376.TP
1377.B FUTEX_TRYLOCK_PI
5c716eef 1378Returns 0 if the futex was successfully locked.
dcad19c0
MK
1379.TP
1380.B FUTEX_UNLOCK_PI
52bb928f 1381Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
1382.TP
1383.B FUTEX_CMP_REQUEUE_PI
bdc5957a 1384Returns the total number of waiters that were woken up or
4b35dc5d 1385requeued to the futex for the futex word at
dddd395a
MK
1386.IR uaddr2 .
1387If this value is greater than
1388.IR val ,
4c8cb0ff
MK
1389then the difference is the number of waiters requeued to the futex for
1390the futex word at
dddd395a 1391.IR uaddr2 .
dcad19c0
MK
1392.TP
1393.B FUTEX_WAIT_REQUEUE_PI
4c8cb0ff
MK
1394Returns 0 if the caller was successfully requeued to the futex for
1395the futex word at
22c15de9 1396.IR uaddr2 .
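.PP
Because a return value of 0 from FUTEX_WAIT may indicate a spurious wake-up,
waiting code is usually written as a loop that rechecks the futex word.
The following is a minimal sketch of such a loop.
The function name is hypothetical,
an _Atomic uint32_t is assumed to share the representation of the kernel's 32-bit futex word,
and treating EAGAIN and EINTR as reasons to recheck is a design choice made for this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until the futex word no longer contains 'value'.
   Returns 0 once the word has changed, or -1 with errno set on an
   unexpected error. */
static int
futex_wait_while_equal(_Atomic uint32_t *word, uint32_t value)
{
    while (atomic_load(word) == value) {
        long r = syscall(SYS_futex, word, FUTEX_WAIT, value,
                         NULL, NULL, 0);

        /* 0 may be a spurious wake-up, EAGAIN means the word already
           changed, and EINTR means a signal arrived; in each case the
           loop condition decides whether to block again. */
        if (r == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```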
70b06b90
MK
1397.\"
1398.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1399.\"
fea681da
MK
1400.SH ERRORS
1401.TP
1402.B EACCES
4b35dc5d 1403No read access to the memory of a futex word.
fea681da
MK
1404.TP
1405.B EAGAIN
f48516d1 1406.RB ( FUTEX_WAIT ,
4b35dc5d 1407.BR FUTEX_WAIT_BITSET ,
f48516d1 1408.BR FUTEX_WAIT_REQUEUE_PI )
badbf70c
MK
1409The value pointed to by
1410.I uaddr
1411was not equal to the expected value
1412.I val
1413at the time of the call.
efeece04 1414.IP
9732dd8b
MK
1415.BR Note :
1416on Linux, the symbolic names
1417.B EAGAIN
1418and
1419.B EWOULDBLOCK
77da5feb 1420(both of which appear in different parts of the kernel futex code)
9732dd8b 1421have the same value.
badbf70c
MK
1422.TP
1423.B EAGAIN
8f2068bb
MK
1424.RB ( FUTEX_CMP_REQUEUE ,
1425.BR FUTEX_CMP_REQUEUE_PI )
ce5602fd 1426The value pointed to by
9f6c40c0
MK
1427.I uaddr
1428is not equal to the expected value
1429.IR val3 .
fea681da 1430.TP
5662f56a
MK
1431.BR EAGAIN
1432.RB ( FUTEX_LOCK_PI ,
aaec9032
MK
1433.BR FUTEX_TRYLOCK_PI ,
1434.BR FUTEX_CMP_REQUEUE_PI )
1435The futex owner thread ID of
1436.I uaddr
1437(for
1438.BR FUTEX_CMP_REQUEUE_PI :
1439.IR uaddr2 )
1440is about to exit,
5662f56a
MK
1441but has not yet handled the internal state cleanup.
1442Try again.
1443.TP
7a39e745
MK
1444.BR EDEADLK
1445.RB ( FUTEX_LOCK_PI ,
9732dd8b
MK
1446.BR FUTEX_TRYLOCK_PI ,
1447.BR FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1448The futex word at
7a39e745
MK
1449.I uaddr
1450is already locked by the caller.
1451.TP
662c0da8 1452.BR EDEADLK
c3875d1d 1453.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
d6bb5a38 1454.\" places, and EDEADLOCK in others. On almost all architectures
4c8cb0ff
MK
1455.\" these constants are synonymous. Is there a reason that both
1456.\" names are used?
8297383e
MK
1457.\"
1458.\" tglx (July 2015): "No. We should probably fix that."
1459.\"
662c0da8 1460.RB ( FUTEX_CMP_REQUEUE_PI )
4b35dc5d 1461While requeueing a waiter to the PI futex for the futex word at
662c0da8
MK
1462.IR uaddr2 ,
1463the kernel detected a deadlock.
1464.TP
fea681da 1465.B EFAULT
1ea901e8
MK
1466A required pointer argument (i.e.,
1467.IR uaddr ,
1468.IR uaddr2 ,
1469or
1470.IR timeout )
496df304 1471did not point to a valid user-space address.
fea681da 1472.TP
9f6c40c0 1473.B EINTR
e808bba0 1474A
9f6c40c0 1475.B FUTEX_WAIT
2674f781
MK
1476or
1477.B FUTEX_WAIT_BITSET
e808bba0 1478operation was interrupted by a signal (see
f529fd20
MK
1479.BR signal (7)).
1480In kernels before Linux 2.6.22, this error could also be returned for
b5fff4ea 1481a spurious wakeup; since Linux 2.6.22, this no longer happens.
9f6c40c0 1482.TP
fea681da 1483.B EINVAL
180f97b7
MK
1484The operation in
1485.IR futex_op
1486is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1487.I timeout
1488argument was invalid
1489.RI ( tv_sec
1490was less than zero, or
1491.IR tv_nsec
cabee29d 1492was not less than 1,000,000,000).
fb2f4c27
MK
1493.TP
1494.B EINVAL
0c74df0b 1495The operation specified in
025e1374 1496.IR futex_op
0c74df0b 1497employs one or both of the pointers
51ee94be 1498.I uaddr
a1f47699 1499and
0c74df0b
MK
1500.IR uaddr2 ,
1501but one of these does not point to a valid object\(emthat is,
1502the address is not four-byte-aligned.
51ee94be
MK
1503.TP
1504.B EINVAL
55cc422d
TG
1505.RB ( FUTEX_WAIT_BITSET ,
1506.BR FUTEX_WAKE_BITSET )
5e1456d4 1507The bit mask supplied in
79c9b436
TG
1508.IR val3
1509is zero.
1510.TP
1511.B EINVAL
2abcba67 1512.RB ( FUTEX_CMP_REQUEUE_PI )
add875c0
MK
1513.I uaddr
1514equals
1515.IR uaddr2
1516(i.e., an attempt was made to requeue to the same futex).
1517.TP
ff597681
MK
1518.BR EINVAL
1519.RB ( FUTEX_FD )
1520The signal number supplied in
1521.I val
1522is invalid.
1523.TP
6bac3b85 1524.B EINVAL
476debd7
MK
1525.RB ( FUTEX_WAKE ,
1526.BR FUTEX_WAKE_OP ,
1527.BR FUTEX_WAKE_BITSET ,
1528.BR FUTEX_REQUEUE ,
1529.BR FUTEX_CMP_REQUEUE )
1530The kernel detected an inconsistency between the user-space state at
1531.I uaddr
1532and the kernel state\(emthat is, it detected a waiter which waits in
1533.BR FUTEX_LOCK_PI
1534on
1535.IR uaddr .
1536.TP
1537.B EINVAL
a218ef20 1538.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1539.BR FUTEX_TRYLOCK_PI ,
1540.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1541The kernel detected an inconsistency between the user-space state at
1542.I uaddr
1543and the kernel state.
ce022f18 1544This indicates either state corruption
ce022f18 1545or that the kernel found a waiter on
a218ef20
MK
1546.I uaddr
1547which is waiting via
1548.BR FUTEX_WAIT
1549or
1550.BR FUTEX_WAIT_BITSET .
1551.TP
1552.B EINVAL
f9250b1a
MK
1553.RB ( FUTEX_CMP_REQUEUE_PI )
1554The kernel detected an inconsistency between the user-space state at
99c0041d
MK
1555.I uaddr2
1556and the kernel state;
ee65b0e8
MK
1557.\" From a conversation with Thomas Gleixner (Aug 2015): ###
1558.\" The kernel sees: I have non PI state for a futex you tried to
1559.\" tell me was PI
99c0041d
MK
1560that is, the kernel detected a waiter which waits via
1561.BR FUTEX_WAIT
8297383e
MK
1562or
1563.BR FUTEX_WAIT_BITSET
99c0041d
MK
1564on
1565.IR uaddr2 .
1566.TP
1567.B EINVAL
1568.RB ( FUTEX_CMP_REQUEUE_PI )
1569The kernel detected an inconsistency between the user-space state at
f9250b1a
MK
1570.I uaddr
1571and the kernel state;
1572that is, the kernel detected a waiter which waits via
75299c8d 1573.BR FUTEX_WAIT
99c0041d 1574or
75299c8d 1575.BR FUTEX_WAIT_BITSET
f9250b1a
MK
1576on
1577.IR uaddr .
1578.TP
1579.B EINVAL
99c0041d 1580.RB ( FUTEX_CMP_REQUEUE_PI )
75299c8d
MK
1581The kernel detected an inconsistency between the user-space state at
1582.I uaddr
1583and the kernel state;
1584that is, the kernel detected a waiter which waits on
1585.I uaddr
1586via
1587.BR FUTEX_LOCK_PI
1588(instead of
1589.BR FUTEX_WAIT_REQUEUE_PI ).
99c0041d
MK
1590.TP
1591.B EINVAL
9786b3ca 1592.RB ( FUTEX_CMP_REQUEUE_PI )
8297383e
MK
1593.\" This deals with the case:
1594.\" wait_requeue_pi(A, B);
1595.\" requeue_pi(A, C);
9786b3ca
MK
1596An attempt was made to requeue a waiter to a futex other than that
1597specified by the matching
1598.B FUTEX_WAIT_REQUEUE_PI
1599call for that waiter.
1600.TP
1601.B EINVAL
f0c0d61c
MK
1602.RB ( FUTEX_CMP_REQUEUE_PI )
1603The
1604.I val
1605argument is not 1.
1606.TP
1607.B EINVAL
4832b48a 1608Invalid argument.
fea681da 1609.TP
d07d4ef3
MK
1610.B ENFILE
1611.RB ( FUTEX_FD )
1612The system-wide limit on the total number of open files has been reached.
1613.TP
a449c634
MK
1614.BR ENOMEM
1615.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1616.BR FUTEX_TRYLOCK_PI ,
1617.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1618The kernel could not allocate memory to hold state information.
1619.TP
4701fc28
MK
1620.B ENOSYS
1621Invalid operation specified in
d33602c4 1622.IR futex_op .
9f6c40c0 1623.TP
4a7e5b05
MK
1624.B ENOSYS
1625The
1626.BR FUTEX_CLOCK_REALTIME
1627option was specified in
1afcee7c 1628.IR futex_op ,
4a7e5b05 1629but the accompanying operation was neither
017d194b
MK
1630.BR FUTEX_WAIT ,
1631.BR FUTEX_WAIT_BITSET ,
4a7e5b05
MK
1632nor
1633.BR FUTEX_WAIT_REQUEUE_PI .
1634.TP
a9dcb4d1
MK
1635.BR ENOSYS
1636.RB ( FUTEX_LOCK_PI ,
f2424fae 1637.BR FUTEX_TRYLOCK_PI ,
4945ff19 1638.BR FUTEX_UNLOCK_PI ,
4cf92894 1639.BR FUTEX_CMP_REQUEUE_PI ,
794bb106 1640.BR FUTEX_WAIT_REQUEUE_PI )
4b35dc5d 1641A run-time check determined that the operation is not available.
f0a9e8f4 1642The PI-futex operations are not implemented on all architectures and
077981d4 1643are not supported on some CPU variants.
a9dcb4d1 1644.TP
c7589177
MK
1645.BR EPERM
1646.RB ( FUTEX_LOCK_PI ,
dc2742a8
MK
1647.BR FUTEX_TRYLOCK_PI ,
1648.BR FUTEX_CMP_REQUEUE_PI )
04331c3f 1649The caller is not allowed to attach itself to the futex at
dc2742a8
MK
1650.I uaddr
1651(for
1652.BR FUTEX_CMP_REQUEUE_PI :
1653the futex at
1654.IR uaddr2 ).
c7589177
MK
1655(This may be caused by a state corruption in user space.)
1656.TP
76f347ba 1657.BR EPERM
87276709 1658.RB ( FUTEX_UNLOCK_PI )
4b35dc5d 1659The caller does not own the lock represented by the futex word.
76f347ba 1660.TP
.BR ESRCH
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The thread ID in the futex word at
.I uaddr
does not exist.
.TP
.BR ESRCH
.RB ( FUTEX_CMP_REQUEUE_PI )
The thread ID in the futex word at
.I uaddr2
does not exist.
.TP
.B ETIMEDOUT
The operation in
.IR futex_op
employed the timeout specified in
.IR timeout ,
and the timeout expired before the operation completed.
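.PP
As a minimal sketch of the timeout case, the fragment below wraps
.BR FUTEX_WAIT
with a timeout given in milliseconds.
The helper name
.IR timed_wait
and the use of a
.IR uint32_t
futex word are assumptions made for illustration; they are not part of any API.
The sketch relies on the timeout for
.BR FUTEX_WAIT
being a relative interval.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical helper: block while *uaddr == expected, for at most
   'ms' milliseconds.  Returns 0 if woken, or -1 with errno set
   (ETIMEDOUT if the timeout expired, EAGAIN if *uaddr did not
   contain 'expected' when the call was made). */

static int
timed_wait(uint32_t *uaddr, uint32_t expected, long ms)
{
    struct timespec ts;

    /* For FUTEX_WAIT, the timeout is a relative interval */

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;

    return (int) syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                         &ts, NULL, 0);
}
```

A caller would typically treat
.BR ETIMEDOUT
as "give up or retry", and recheck the futex word after any return,
since a return value of 0 does not by itself prove that the awaited
condition holds.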
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH VERSIONS
Futexes were first made available in a stable kernel release
with Linux 2.6.0.
.PP
Initial futex support was merged in Linux 2.5.7 but with different
semantics from what was described above.
A four-argument system call with the semantics
described in this page was introduced in Linux 2.5.40.
A fifth argument was added in Linux 2.5.70,
and a sixth argument was added in Linux 2.6.7.
.SH CONFORMING TO
This system call is Linux-specific.
.SH NOTES
Glibc does not provide a wrapper for this system call; call it using
.BR syscall (2).
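.PP
A call through
.BR syscall (2)
can be sketched as follows.
The helper name
.IR futex_wake
is hypothetical, as is the choice of a
.IR uint32_t
futex word; unused arguments are passed as NULL or 0.
.BR FUTEX_WAKE
returns the number of waiters that were woken.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Hypothetical helper: wake at most 'nwaiters' threads blocked on the
   futex word at 'uaddr'.  Returns the number of waiters woken, or -1
   with errno set on error. */

static long
futex_wake(uint32_t *uaddr, int nwaiters)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwaiters,
                   NULL, NULL, 0);
}
```

When no thread is waiting on the futex word, the call succeeds and returns 0.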
.PP
Several higher-level programming abstractions are implemented via futexes,
including POSIX semaphores and
various POSIX threads synchronization mechanisms
(mutexes, condition variables, read-write locks, and barriers).
.\" TODO FIXME(Torvald) Above, we cite this section and claim it contains
.\" details on the synchronization semantics; add the C11 equivalents
.\" here (or whatever we find consensus for).
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.SH EXAMPLES
The program below demonstrates use of futexes in a program where a parent
process and a child process use a pair of futexes located inside a
shared anonymous mapping to synchronize access to a shared resource:
the terminal.
The two processes each write
.IR nloops
(a command-line argument that defaults to 5 if omitted)
messages to the terminal and employ a synchronization protocol
that ensures that they alternate in writing messages.
Upon running this program we see output such as the following:
.PP
.in +4n
.EX
$ \fB./futex_demo\fP
Parent (18534) 0
Child (18535) 0
Parent (18534) 1
Child (18535) 1
Parent (18534) 2
Child (18535) 2
Parent (18534) 3
Child (18535) 3
Parent (18534) 4
Child (18535) 4
.EE
.in
.SS Program source
\&
.EX
/* futex_demo.c

   Usage: futex_demo [nloops]
                    (Default: 5)

   Demonstrate the use of futexes in a program where parent and child
   use a pair of futexes located inside a shared anonymous mapping to
   synchronize access to a shared resource: the terminal. The two
   processes each write \(aqnloops\(aq messages to the terminal and employ
   a synchronization protocol that ensures that they alternate in
   writing messages.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/futex.h>
#include <sys/time.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \e
                        } while (0)

static int *futex1, *futex2, *iaddr;

static int
futex(int *uaddr, int futex_op, int val,
      const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr2, val3);
}

/* Acquire the futex pointed to by \(aqfutexp\(aq: wait for its value to
   become 1, and then set the value to 0. */

static void
fwait(int *futexp)
{
    int s;

    /* atomic_compare_exchange_strong(ptr, oldval, newval)
       atomically performs the equivalent of:

           if (*ptr == *oldval)
               *ptr = newval;

       It returns true if the test yielded true and *ptr was updated.
       Otherwise, it stores the current value of *ptr in *oldval. */

    while (1) {

        /* Is the futex available? */
        int one = 1;    /* Not const: overwritten if the test fails */
        if (atomic_compare_exchange_strong(futexp, &one, 0))
            break;      /* Yes */

        /* Futex is not available; wait */

        s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
        if (s == \-1 && errno != EAGAIN)
            errExit("futex\-FUTEX_WAIT");
    }
}

/* Release the futex pointed to by \(aqfutexp\(aq: if the futex currently
   has the value 0, set its value to 1 and then wake any futex waiters,
   so that if the peer is blocked in fwait(), it can proceed. */

static void
fpost(int *futexp)
{
    int s;

    /* atomic_compare_exchange_strong() was described in comments above */

    int zero = 0;   /* Not const: overwritten if the test fails */
    if (atomic_compare_exchange_strong(futexp, &zero, 1)) {
        s = futex(futexp, FUTEX_WAKE, 1, NULL, NULL, 0);
        if (s == \-1)
            errExit("futex\-FUTEX_WAKE");
    }
}

int
main(int argc, char *argv[])
{
    pid_t childPid;
    int nloops;

    setbuf(stdout, NULL);

    nloops = (argc > 1) ? atoi(argv[1]) : 5;

    /* Create a shared anonymous mapping that will hold the futexes.
       Since the futexes are being shared between processes, we
       subsequently use the "shared" futex operations (i.e., not the
       ones suffixed "_PRIVATE") */

    iaddr = mmap(NULL, sizeof(*iaddr) * 2, PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_SHARED, \-1, 0);
    if (iaddr == MAP_FAILED)
        errExit("mmap");

    futex1 = &iaddr[0];
    futex2 = &iaddr[1];

    *futex1 = 0;        /* State: unavailable */
    *futex2 = 1;        /* State: available */

    /* Create a child process that inherits the shared anonymous
       mapping */

    childPid = fork();
    if (childPid == \-1)
        errExit("fork");

    if (childPid == 0) {        /* Child */
        for (int j = 0; j < nloops; j++) {
            fwait(futex1);
            printf("Child (%ld) %d\en", (long) getpid(), j);
            fpost(futex2);
        }

        exit(EXIT_SUCCESS);
    }

    /* Parent falls through to here */

    for (int j = 0; j < nloops; j++) {
        fwait(futex2);
        printf("Parent (%ld) %d\en", (long) getpid(), j);
        fpost(futex1);
    }

    wait(NULL);

    exit(EXIT_SUCCESS);
}
.EE
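.PP
The program's use of
.IR atomic_compare_exchange_strong ()
depends on one detail of the C11 interface:
when the comparison fails, the function writes the value it found
into the object that holds the expected value.
The following sketch isolates that behavior.
The helper name
.IR try_acquire
and its
.IR seen
parameter are hypothetical and exist only to expose the written-back value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: try to change *word from 1 to 0.  Returns true
   on success.  In either case, *seen receives the contents of the
   "expected" object after the call: 1 on success, or the value
   actually found in *word on failure. */

static bool
try_acquire(atomic_int *word, int *seen)
{
    int expected = 1;   /* Must be writable: updated on failure */
    bool ok = atomic_compare_exchange_strong(word, &expected, 0);

    *seen = expected;
    return ok;
}
```

Because the expected value can be overwritten, code that retries in a
loop needs to reset it to the desired value before each attempt.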
.SH SEE ALSO
.ad l
.BR get_robust_list (2),
.BR restart_syscall (2),
.BR pthread_mutexattr_getprotocol (3),
.BR futex (7),
.BR sched (7)
.PP
The following kernel source files:
.IP * 2
.I Documentation/pi-futex.txt
.IP *
.I Documentation/futex-requeue-pi.txt
.IP *
.I Documentation/locking/rt-mutex.txt
.IP *
.I Documentation/locking/rt-mutex-design.txt
.IP *
.I Documentation/robust-futex-ABI.txt
.PP
Franke, H., Russell, R., and Kirkwood, M., 2002.
\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
(from proceedings of the Ottawa Linux Symposium 2002),
.br
.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002\-pages\-479\-495.pdf
.UE
.PP
Hart, D., 2009. \fIA futex overview and update\fP,
.UR http://lwn.net/Articles/360699/
.UE
.PP
Hart, D.\& and Guniguntala, D., 2009.
\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
(from proceedings of the 2009 Real-Time Linux Workshop),
.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
.UE
.PP
Drepper, U., 2011. \fIFutexes Are Tricky\fP,
.UR http://www.akkadia.org/drepper/futex.pdf
.UE
.PP
Futex example library, futex-*.tar.bz2 at
.br
.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
.UE
.\"
.\" FIXME(Torvald) We should probably refer to the glibc code here, in
.\" particular the glibc-internal futex wrapper functions that are
.\" WIP, and the generic pthread_mutex_t and perhaps condvar
.\" implementations.