git.ipfire.org Git - thirdparty/man-pages.git/blame - man2/futex.2
futex.2: Add brief description of the priority inversion problem
Commit Line Data
8f0aff2a 1.\" Page by b.hubert
1abce893
MK
2.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
3.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
2297bf0e 4.\"
2e46a6e7 5.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
8f0aff2a 6.\" may be freely modified and distributed
8ff7380d 7.\" %%%LICENSE_END
fea681da
MK
8.\"
9.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
10.\" added ERRORS section.
11.\"
12.\" Modified 2004-06-17 mtk
13.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
14.\"
3d155313 15.TH FUTEX 2 2014-05-21 "Linux" "Linux Programmer's Manual"
fea681da 16.SH NAME
ce154705 17futex \- fast user-space locking
fea681da 18.SH SYNOPSIS
9d9dc1e8 19.nf
fea681da
MK
20.sp
21.B "#include <linux/futex.h>"
fea681da
MK
22.B "#include <sys/time.h>"
23.sp
d33602c4 24.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
768d3c23
MK
25.BI " const struct timespec *" timeout , \
26" \fR /* or: \fBu32 \fIval2\fP */
9d9dc1e8 27.BI " int *" uaddr2 ", int " val3 );
9d9dc1e8 28.fi
409f08b0 29
b939d6e4
MK
30.IR Note :
31There is no glibc wrapper for this system call; see NOTES.
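Because no glibc wrapper exists, callers usually reach the system call through syscall(2). The following is a minimal sketch of such a wrapper. The name futex_call and the use of uint32_t for the futex word are illustrative assumptions, not part of any library.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* futex_call() is a hypothetical helper, not a library function.
   It forwards its arguments unchanged to the raw system call and
   returns the raw result (-1 with errno set on error). */
static long
futex_call(uint32_t *uaddr, int futex_op, uint32_t val,
           const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}
```

A call such as futex_call(&word, FUTEX_WAKE, 1, NULL, NULL, 0) then maps directly onto the prototype shown above.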
47297adb 32.SH DESCRIPTION
fea681da
MK
33.PP
34The
e511ffb6 35.BR futex ()
fea681da
MK
36system call provides a method for
37a program to wait for a value at a given address to change, and a
f19904c0
MK
38method to wake up anyone waiting on a particular address.
39(While the
40virtual addresses for the same memory in separate processes may not be
41equal, the kernel maps them internally so that the same memory mapped in
fea681da 42different locations will correspond for
e511ffb6 43.BR futex ()
f19904c0 44calls.)
fd3fa7ef 45This system call is typically used to
fea681da
MK
46implement the contended case of a lock in shared memory, as
47described in
a8bda636 48.BR futex (7).
fea681da 49.PP
f388ba70
MK
50When a futex operation did not finish uncontended in user space, a
51.BR futex ()
52call needs to be made to the kernel to arbitrate.
c13182ef 53Arbitration can either mean putting the calling
fea681da
MK
54process to sleep or, conversely, waking a waiting process.
55.PP
f388ba70
MK
56Callers of
57.BR futex ()
58are expected to adhere to the semantics described in
a8bda636 59.BR futex (7).
fea681da 60As these
d603cc27 61semantics involve writing nonportable assembly instructions, this in turn
fea681da
MK
62probably means that most users will in fact be library authors and not
63general application developers.
64.PP
65The
66.I uaddr
f388ba70
MK
67argument points to an integer which stores the counter (futex).
68On all platforms, futexes are four-byte integers that
69must be aligned on a four-byte boundary.
70The operation to perform on the futex is specified in the
71.I futex_op
72argument;
73.IR val
74is a value whose meaning and purpose depends on
75.IR futex_op .
36ab2074
MK
76
77The remaining arguments
78.RI ( timeout ,
79.IR uaddr2 ,
80and
81.IR val3 )
82are required only for certain of the futex operations described below.
83Where one of these arguments is not required, it is ignored.
768d3c23 84
36ab2074
MK
85For several blocking operations, the
86.I timeout
87argument is a pointer to a
88.IR timespec
89structure that specifies a timeout for the operation.
90However, notwithstanding the prototype shown above, for some operations,
91this argument is instead a four-byte integer whose meaning
92is determined by the operation.
768d3c23
MK
93For these operations, the kernel casts the
94.I timeout
95value to
96.IR u32 ,
97and in the remainder of this page, this argument is referred to as
98.I val2
99when interpreted in this fashion.
100
de5a3bb4 101Where it is required, the
36ab2074 102.IR uaddr2
de5a3bb4 103argument is a pointer to a second futex that is employed by the operation.
36ab2074
MK
104The interpretation of the final integer argument,
105.IR val3 ,
106depends on the operation.
107
6be4bad7 108The
d33602c4 109.I futex_op
6be4bad7
MK
110argument consists of two parts:
111a command that specifies the operation to be performed,
112bit-wise ORed with zero or more options that
113modify the behaviour of the operation.
fc30eb79 114The options that may be included in
d33602c4 115.I futex_op
fc30eb79
TG
116are as follows:
117.TP
118.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
119.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
120This option bit can be employed with all futex operations.
e45f9735
MK
121It tells the kernel that the futex is process-private and not shared
122with another process
123(i.e., it is being used for synchronization between threads).
fc30eb79
TG
124This allows the kernel to choose the fast path for validating
125the user-space address and avoids expensive VMA lookups,
126taking reference counts on file backing store, and so on.
ae2c1774
MK
127
128As a convenience,
129.IR <linux/futex.h>
130defines a set of constants with the suffix
131.BR _PRIVATE
132that are equivalents of all of the operations listed below,
dcdfde26 133.\" except the obsolete FUTEX_FD, for which the "private" flag was
ae2c1774
MK
134.\" meaningless
135but with the
136.BR FUTEX_PRIVATE_FLAG
137ORed into the constant value.
138Thus, there are
139.BR FUTEX_WAIT_PRIVATE ,
140.BR FUTEX_WAKE_PRIVATE ,
141and so on.
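The relationship between the plain and the _PRIVATE constants can be sketched as follows. The helper futex_op_for is a hypothetical name chosen for this illustration; only the constants come from linux/futex.h.

```c
#include <linux/futex.h>

/* The _PRIVATE constants are the plain commands with the option bit set. */
_Static_assert(FUTEX_WAIT_PRIVATE == (FUTEX_WAIT | FUTEX_PRIVATE_FLAG),
               "FUTEX_WAIT_PRIVATE carries FUTEX_PRIVATE_FLAG");
_Static_assert(FUTEX_WAKE_PRIVATE == (FUTEX_WAKE | FUTEX_PRIVATE_FLAG),
               "FUTEX_WAKE_PRIVATE carries FUTEX_PRIVATE_FLAG");

/* Hypothetical helper: choose the shared or the private form of a
   command, depending on whether the futex word lives in memory that
   is shared with another process. */
static int
futex_op_for(int command, int shared_between_processes)
{
    return shared_between_processes ? command
                                    : (command | FUTEX_PRIVATE_FLAG);
}
```

A thread-only lock would pass 0 as the second argument and so always take the private form.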
2e98bbc2
TG
142.TP
143.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
144.\" commit 1acdac104668a0834cfa267de9946fac7764d486
4a7e5b05 145This option bit can be employed only with the
2e98bbc2
TG
146.BR FUTEX_WAIT_BITSET
147and
148.BR FUTEX_WAIT_REQUEUE_PI
c84cf68c 149operations.
2e98bbc2 150
f2103b26
MK
151If this option is set, the kernel treats
152.I timeout
153as an absolute time based on
2e98bbc2
TG
154.BR CLOCK_REALTIME .
155
f2103b26
MK
156If this option is not set, the kernel treats
157.I timeout
158as relative time,
1c952cf5
MK
159.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
160measured against the
161.BR CLOCK_MONOTONIC
162clock.
6be4bad7
MK
163.PP
164The operation specified in
d33602c4 165.I futex_op
6be4bad7 166is one of the following:
70b06b90
MK
167.\"
168.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
169.\"
fea681da 170.TP
81c9d87e
MK
171.BR FUTEX_WAIT " (since Linux 2.6.0)"
172.\" Strictly speaking, since some time in 2.5.x
f065673c
MK
173This operation tests that the value at the
174location pointed to by the futex address
fea681da
MK
175.I uaddr
176still contains the value
177.IR val ,
f065673c 178and then sleeps awaiting
682edefb 179.B FUTEX_WAKE
f065673c
MK
180on the futex address.
181The test and sleep steps are performed atomically.
182If the futex value does not match
183.IR val ,
4710334a 184then the call fails immediately with the error
badbf70c 185.BR EAGAIN .
f065673c
MK
186.\" FIXME I added the following sentence. Please confirm that it is correct.
187The purpose of the test step is to detect races where
4e566b1e 188another process changes the value of the futex between
f065673c
MK
189the time it was last checked and the time of the
190.BR FUTEX_WAIT
63d3f911 191operation.
1909e523 192
c13182ef 193If the
fea681da 194.I timeout
53ba4030 195argument is non-NULL, its contents specify a relative timeout for the wait,
1c952cf5
MK
196.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
197measured according to the
198.BR CLOCK_MONOTONIC
199clock.
82a6092b
MK
200(This interval will be rounded up to the system clock granularity,
201and kernel scheduling delays mean that the
202blocking interval may overrun by a small amount.)
203If
204.I timeout
205is NULL, the call blocks indefinitely.
4798a7f3 206
c13182ef 207The arguments
fea681da
MK
208.I uaddr2
209and
210.I val3
211are ignored.
212
213For
a8bda636 214.BR futex (7),
fea681da
MK
215this call is executed if decrementing the count gave a negative value
216(indicating contention), and will sleep until another process releases
682edefb
MK
217the futex and executes the
218.B FUTEX_WAKE
219operation.
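As a rough illustration of the test-and-sleep behavior described for FUTEX_WAIT, the sketch below blocks only while the futex word still holds the expected value. The helper name wait_while_equal and the millisecond parameter are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <errno.h>      /* callers inspect errno (EAGAIN, ETIMEDOUT) */
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *uaddr == expected, for at most
   timeout_ms milliseconds (a relative timeout).
   Returns 0 after a wake-up; the caller should re-check the value.
   Returns -1 with errno set to EAGAIN if the value did not match,
   or to ETIMEDOUT if the timeout expired. */
static int
wait_while_equal(uint32_t *uaddr, uint32_t expected, long timeout_ms)
{
    struct timespec ts = {
        .tv_sec = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L,
    };

    return (int) syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                         &ts, NULL, 0);
}
```

The EAGAIN case is the race detection mentioned above: the value changed between the caller's last check and the system call, so the caller does not sleep.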
70b06b90
MK
220.\"
221.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
222.\"
fea681da 223.TP
81c9d87e
MK
224.BR FUTEX_WAKE " (since Linux 2.6.0)"
225.\" Strictly speaking, since Linux 2.5.x
f065673c
MK
226This operation wakes at most
227.I val
228processes waiting (i.e., inside
229.BR FUTEX_WAIT )
230on the futex at the address
231.IR uaddr .
232Most commonly,
233.I val
234is specified as either 1 (wake up a single waiter) or
235.BR INT_MAX
236(wake up all waiters).
730bfbda
MK
237.\" FIXME Please confirm that the following is correct:
238No guarantee is provided about which waiters are awoken
239(e.g., a waiter with a higher scheduling priority is not guaranteed
240to be awoken in preference to a waiter with a lower priority).
4798a7f3 241
fea681da
MK
242The arguments
243.IR timeout ,
c8b921bd 244.IR uaddr2 ,
fea681da
MK
245and
246.I val3
247are ignored.
248
249For
a8bda636 250.BR futex (7),
f2bf5121 251this is executed if incrementing the count showed that there were waiters,
64191e8f 252.\" FIXME How does "incrementing the count showed that there were waiters"?
f2bf5121 253once the futex value has been set to 1 (indicating that it is available).
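The two common values of val for FUTEX_WAKE can be wrapped as shown in this sketch. The helper names wake_one and wake_all are hypothetical.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helpers. Each returns the number of waiters woken,
   or -1 with errno set on error. */
static long
wake_one(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static long
wake_all(uint32_t *uaddr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

With no thread blocked on the word, both helpers return 0.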
70b06b90
MK
254.\"
255.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
256.\"
a7c2bf45
MK
257.TP
258.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
259.\" Strictly speaking, from Linux 2.5.x to 2.6.25
260This operation creates a file descriptor that is associated with the futex at
261.IR uaddr .
a7c2bf45
MK
262The calling process must close the returned file descriptor after use.
263When another process performs a
264.BR FUTEX_WAKE
265on the futex, the file descriptor is reported as readable by
266.BR select (2),
267.BR poll (2),
268and
269.BR epoll (7).
270
271The file descriptor can be used to obtain asynchronous notifications:
272if
273.I val
274is nonzero, then when another process executes a
275.BR FUTEX_WAKE ,
276the caller will receive the signal number that was passed in
277.IR val .
278
279The arguments
280.IR timeout ,
281.I uaddr2
282and
283.I val3
284are ignored.
285
286To prevent race conditions, the caller should test if the futex has
287been upped after
288.B FUTEX_FD
289returns.
290
291Because it was inherently racy,
292.B FUTEX_FD
293has been removed
294.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
295from Linux 2.6.26 onward.
70b06b90
MK
296.\"
297.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
298.\"
a7c2bf45
MK
299.TP
300.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
301.\" Strictly speaking: from Linux 2.5.70
302.\"
303.\" FIXME I added this warning. Okay?
304.IR "Avoid using this operation" .
305It is broken (unavoidably racy) for its intended purpose.
306Use
307.BR FUTEX_CMP_REQUEUE
308instead.
309
310This operation performs the same task as
311.BR FUTEX_CMP_REQUEUE ,
312except that no check is made using the value in
313.IR val3 .
314(The argument
315.I val3
316is ignored.)
70b06b90
MK
317.\"
318.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
319.\"
a7c2bf45
MK
320.TP
321.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
322This operation was added as a replacement for the earlier
323.BR FUTEX_REQUEUE ,
324because that operation was racy for its intended use.
325
326As with
327.BR FUTEX_REQUEUE ,
328the
329.BR FUTEX_CMP_REQUEUE
330operation is used to avoid a "thundering herd" effect when
331.B FUTEX_WAKE
332is used and all processes woken up need to acquire another futex.
333It differs from
334.BR FUTEX_REQUEUE
335in that it first checks whether the location
336.I uaddr
337still contains the value
338.IR val3 .
339If not, the operation fails with the error
340.BR EAGAIN .
70b06b90
MK
341.\" FIXME I added the following sentence on the rationale for
342.\" FUTEX_CMP_REQUEUE. Is it correct? Should it be expanded?
a7c2bf45
MK
343This additional feature of
344.BR FUTEX_CMP_REQUEUE
345can be used by the caller to (atomically) detect changes
346in the value of the source futex at
347.IR uaddr .
348
349The operation wakes up a maximum of
350.I val
351waiters that are waiting on the futex at
352.IR uaddr .
353If there are more than
354.I val
355waiters, then the remaining waiters are removed
356from the wait queue of the source futex at
357.I uaddr
358and added to the wait queue of the target futex at
359.IR uaddr2 .
936876a9 360
a7c2bf45 361The
768d3c23 362.I val2
936876a9 363argument specifies an upper limit on the number of waiters
a7c2bf45 364that are requeued to the futex at
768d3c23 365.IR uaddr2 .
a7c2bf45
MK
366
367.\" FIXME Please review the following new paragraph to see if it is
368.\" accurate.
369Typical values to specify for
370.I val
371are 0 or 1.
372(Specifying
373.BR INT_MAX
374is not useful, because it would make the
375.BR FUTEX_CMP_REQUEUE
376operation equivalent to
377.BR FUTEX_WAKE .)
936876a9 378The limit value specified via
768d3c23
MK
379.I val2
380is typically either 1 or
a7c2bf45
MK
381.BR INT_MAX .
382(Specifying the argument as 0 is not useful, because it would make the
383.BR FUTEX_CMP_REQUEUE
384operation equivalent to
385.BR FUTEX_WAKE .)
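A minimal sketch of a FUTEX_CMP_REQUEUE call follows. The helper names cmp_requeue and wake_one_requeue_rest are assumptions for this example. Note how the val2 limit is passed in the argument slot that the prototype calls timeout.

```c
#define _GNU_SOURCE
#include <errno.h>      /* callers inspect errno (e.g., EAGAIN) */
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *src still equals expected, wake up to
   nr_wake waiters on src and move up to nr_requeue of the remaining
   waiters onto dst. The val2 limit travels in the timeout slot of
   the system call, so it is passed as a pointer-sized integer. */
static long
cmp_requeue(uint32_t *src, uint32_t *dst, uint32_t expected,
            int nr_wake, int nr_requeue)
{
    return syscall(SYS_futex, src, FUTEX_CMP_REQUEUE, nr_wake,
                   (uintptr_t) nr_requeue, dst, expected);
}

/* Broadcast-style use: wake one waiter and requeue all the others. */
static long
wake_one_requeue_rest(uint32_t *src, uint32_t *dst, uint32_t expected)
{
    return cmp_requeue(src, dst, expected, 1, INT_MAX);
}
```

If the word at src no longer holds the expected value, the call fails with EAGAIN and the caller can reload the value and retry.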
6bac3b85 386.\"
43d16602
MK
387.\" FIXME Here, it would be helpful to have an example of how
388.\" FUTEX_CMP_REQUEUE might be used, at the same time illustrating
389.\" why FUTEX_WAKE is unsuitable for the same use case.
390.\"
70b06b90
MK
391.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
392.\"
6bac3b85
MK
393.\" FIXME I added some FUTEX_WAKE_OP text, and I'd be happy if someone
394.\" checked it.
fea681da 395.TP
d67e21f5
MK
396.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
397.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
6bac3b85
MK
398.\" Author: Jakub Jelinek <jakub@redhat.com>
399.\" Date: Tue Sep 6 15:16:25 2005 -0700
400This operation was added to support some user-space use cases
401where more than one futex must be handled at the same time.
402The most notable example is the implementation of
403.BR pthread_cond_signal (3),
404which requires operations on two futexes,
405the one used to implement the mutex and the one used in the implementation
406of the wait queue associated with the condition variable.
407.BR FUTEX_WAKE_OP
408allows such cases to be implemented without leading to
409high rates of contention and context switching.
410
411The
412.BR FUTEX_WAKE_OP
413operation is equivalent to atomically executing the following code:
414
415.in +4n
416.nf
417int oldval = *(int *) uaddr2;
418*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
419futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
420if (oldval \fIcmp\fP \fIcmparg\fP)
768d3c23 421 futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
6bac3b85
MK
422.fi
423.in
424
425In other words,
426.BR FUTEX_WAKE_OP
427does the following:
428.RS
429.IP * 3
430saves the original value of the futex at
431.IR uaddr2 ;
432.IP *
433performs an operation to modify the value of the futex at
434.IR uaddr2 ;
435.IP *
436wakes up a maximum of
437.I val
438waiters on the futex
439.IR uaddr ;
440and
441.IP *
442dependent on the results of a test of the original value of the futex at
443.IR uaddr2 ,
444wakes up a maximum of
768d3c23 445.I val2
6bac3b85
MK
446waiters on the futex
447.IR uaddr2 .
448.RE
449.IP
6bac3b85
MK
450The operation and comparison that are to be performed are encoded
451in the bits of the argument
452.IR val3 .
453Pictorially, the encoding is:
454
f6af90e7 455.in +8n
6bac3b85 456.nf
f6af90e7
MK
457+---+---+-----------+-----------+
458|op |cmp|   oparg   |  cmparg   |
459+---+---+-----------+-----------+
460  4   4       12          12    <== # of bits
6bac3b85
MK
461.fi
462.in
463
464Expressed in code, the encoding is:
465
466.in +4n
467.nf
468#define FUTEX_OP(op, oparg, cmp, cmparg) \\
469 (((op & 0xf) << 28) | \\
470 ((cmp & 0xf) << 24) | \\
471 ((oparg & 0xfff) << 12) | \\
472 (cmparg & 0xfff))
473.fi
474.in
475
476In the above,
477.I op
478and
479.I cmp
480are each one of the codes listed below.
481The
482.I oparg
483and
484.I cmparg
485components are literal numeric values, except as noted below.
486
487The
488.I op
489component has one of the following values:
490
491.in +4n
492.nf
493FUTEX_OP_SET 0 /* uaddr2 = oparg; */
494FUTEX_OP_ADD 1 /* uaddr2 += oparg; */
495FUTEX_OP_OR 2 /* uaddr2 |= oparg; */
496FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */
497FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */
498.fi
499.in
500
501In addition, bit-wise ORing the following value into
502.I op
503causes
504.IR "(1\ <<\ oparg)"
505to be used as the operand:
506
507.in +4n
508.nf
509FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
510.fi
511.in
512
513The
514.I cmp
515field is one of the following:
516
517.in +4n
518.nf
519FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
520FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
521FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
522FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
523FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
524FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
525.fi
526.in
527
528The return value of
529.BR FUTEX_WAKE_OP
530is the sum of the number of waiters woken on the futex
531.IR uaddr
532plus the number of waiters woken on the futex
533.IR uaddr2 .
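The val3 encoding for FUTEX_WAKE_OP can be sketched in code as follows. The helpers encode_wake_op and wake_op are hypothetical names. The encoder uses the same bit layout as the FUTEX_OP() macro shown above, but computes it in unsigned arithmetic.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: op and cmp occupy 4 bits each, oparg and
   cmparg 12 bits each, as in the diagram above. */
static uint32_t
encode_wake_op(uint32_t op, uint32_t oparg, uint32_t cmp, uint32_t cmparg)
{
    return ((op & 0xf) << 28) | ((cmp & 0xf) << 24) |
           ((oparg & 0xfff) << 12) | (cmparg & 0xfff);
}

/* Hypothetical helper: val2 (the wake limit for uaddr2) travels in
   the timeout slot of the system call. */
static long
wake_op(uint32_t *uaddr, uint32_t *uaddr2, int nr_wake, int nr_wake2,
        uint32_t encoded)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, nr_wake,
                   (uintptr_t) nr_wake2, uaddr2, encoded);
}
```

For example, encode_wake_op(FUTEX_OP_ADD, 1, FUTEX_OP_CMP_GT, 0) yields 0x14001000: add 1 to the word at uaddr2, and wake waiters on uaddr2 only if its old value was greater than 0.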
70b06b90
MK
534.\"
535.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
536.\"
d67e21f5 537.TP
79c9b436
TG
538.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
539.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
fd9e59d4 540This operation is like
79c9b436
TG
541.BR FUTEX_WAIT
542except that
543.I val3
544is used to provide a 32-bit bitset to the kernel.
545This bitset is stored in the kernel-internal state of the waiter.
546See the description of
547.BR FUTEX_WAKE_BITSET
548for further details.
549
fd9e59d4
MK
550The
551.BR FUTEX_WAIT_BITSET
552also interprets the
553.I timeout
554argument differently from
555.BR FUTEX_WAIT .
556See the discussion of
557.BR FUTEX_CLOCK_REALTIME ,
558above.
559
79c9b436
TG
560The
561.I uaddr2
562argument is ignored.
70b06b90
MK
563.\"
564.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
565.\"
79c9b436 566.TP
d67e21f5
MK
567.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
568.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
55cc422d
TG
569This operation is the same as
570.BR FUTEX_WAKE
571except that the
572.I val3
573argument is used to provide a 32-bit bitset to the kernel.
98d769c0
MK
574This bitset is used to select which waiters should be woken up.
575The selection is done by a bit-wise AND of the "wake" bitset
576(i.e., the value in
577.IR val3 )
578and the bitset which is stored in the kernel-internal
09cb4ce7 579state of the waiter (the "wait" bitset that is set using
98d769c0
MK
580.BR FUTEX_WAIT_BITSET ).
581All of the waiters for which the result of the AND is nonzero are woken up;
582the remaining waiters are left sleeping.
583
70b06b90 584.\" FIXME Is this paragraph that I added okay?
e9d4496b
MK
585The effect of
586.BR FUTEX_WAIT_BITSET
587and
588.BR FUTEX_WAKE_BITSET
589is to allow selective wake-ups among multiple waiters that are waiting
590on the same futex;
591since a futex has a size of 32 bits,
592these operations provide 32 wakeup "channels".
593(The
594.BR FUTEX_WAIT
595and
596.BR FUTEX_WAKE
597operations correspond to
598.BR FUTEX_WAIT_BITSET
599and
600.BR FUTEX_WAKE_BITSET
601operations where the bitsets are all ones.)
09cb4ce7 602Note, however, that using this bitset multiplexing feature on a
e9d4496b
MK
603futex is less efficient than simply using multiple futexes,
604because employing bitset multiplexing requires the kernel
605to check all waiters on a futex,
606including those that are not interested in being woken up
607(i.e., they do not have the relevant bit set in their "wait" bitset).
608.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
609.\"
610.\" "The original reason for the addition of these extensions
611.\" was to improve the performance of pthread read-write locks
612.\" in glibc. However, the pthreads library no longer uses the
613.\" same locking algorithm, and these extensions are not used
614.\" without the bitset parameter being all ones.
615.\"
616.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
617.\" is nevertheless used (with a bitset of all ones) in order to
618.\" obtain the absolute timeout functionality that is useful
619.\" for efficiently implementing Pthreads APIs (which use absolute
620.\" timeouts); FUTEX_WAIT provides only relative timeouts.
621
98d769c0
MK
622The
623.I uaddr2
624and
625.I timeout
626arguments are ignored.
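The pairing of "wait" and "wake" bitsets can be sketched as below. The channel masks and the helper names are assumptions invented for this example. The timeout is left as NULL, so the wait blocks indefinitely.

```c
#define _GNU_SOURCE
#include <errno.h>      /* callers inspect errno (e.g., EAGAIN) */
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical channel masks chosen for this sketch. */
#define CHANNEL_READERS 0x1u
#define CHANNEL_WRITERS 0x2u

/* Hypothetical helper: sleep while *uaddr == expected, recording
   the given "wait" bitset for this waiter. */
static long
wait_on_channels(uint32_t *uaddr, uint32_t expected, uint32_t channels)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                   NULL, NULL, channels);
}

/* Hypothetical helper: wake up to nr_wake waiters whose "wait"
   bitset has at least one bit in common with channels. */
static long
wake_channels(uint32_t *uaddr, int nr_wake, uint32_t channels)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, nr_wake,
                   NULL, NULL, channels);
}
```

A waiter that called wait_on_channels() with CHANNEL_READERS is left sleeping by wake_channels() with CHANNEL_WRITERS, because the AND of the two bitsets is zero.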
bd90a5f9 627.\"
70b06b90 628.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
bd90a5f9
MK
629.\"
630.SS Priority-inheritance futexes
b52e1cd4
MK
631Linux supports priority-inheritance (PI) futexes in order to handle
632priority-inversion problems that can be encountered with
633normal futex locks.
b565548b
MK
634Priority inversion is the problem that occurs when a high-priority
635process is blocked waiting to acquire a lock held by a low-priority process,
636while processes at an intermediate priority continuously preempt
637the low-priority process from the CPU.
638Consequently, the low-priority process makes no progress toward
639releasing the lock, and the high-priority process remains blocked.
7f315ae3 640
79d918c7
MK
641.\" FIXME ===== Start of adapted Hart/Guniguntala text =====
642.\" The following text is drawn from the Hart/Guniguntala paper,
643.\" but I have reworded some pieces significantly. Please check it.
644.\"
645The PI futex operations described below differ from the other
646futex operations in that they impose policy on the use of the futex value:
647.IP * 3
7c16fbff 648If the lock is unowned, the futex value shall be 0.
79d918c7
MK
649.IP *
650If the lock is owned, the futex value shall be the thread ID (TID; see
651.BR gettid (2))
652of the owning thread.
653.IP *
654.\" FIXME In the following line, I added "the lock is owned and". Okay?
655If the lock is owned and there are threads contending for the lock,
656then the
657.B FUTEX_WAITERS
658bit shall be set in the futex value; in other words, the futex value is:
659
660 FUTEX_WAITERS | TID
661.PP
662With this policy in place,
663a user-space application can acquire an unowned
21b060ba 664lock or release an uncontended lock using atomic
79d918c7 665.\" FIXME In the following line, I added "user-space". Okay?
21b060ba 666instructions executed in user-space (e.g.,
b52e1cd4
MK
667.I cmpxchg
668on the x86 architecture).
669Locking an unowned lock simply consists of setting
670the futex value to the caller's TID.
671Releasing an uncontended lock simply requires setting the futex value to 0.
672
673If a futex is currently owned (i.e., has a nonzero value),
674waiters must employ the
79d918c7
MK
675.B FUTEX_LOCK_PI
676operation to acquire the lock.
b52e1cd4 677If a lock is contended (i.e., the
79d918c7 678.B FUTEX_WAITERS
b52e1cd4 679bit is set in the futex value), the lock owner must employ the
79d918c7 680.B FUTEX_UNLOCK_PI
b52e1cd4
MK
681operation to release the lock.
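The value policy and the fast and slow paths described above can be sketched as follows. This is an illustration under stated assumptions, not a complete lock implementation: the type pi_lock_t and both function names are hypothetical, the sketch assumes that an _Atomic uint32_t has the same layout as a plain four-byte futex word, and error handling is left to the caller.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: the word holds 0 when unowned, or the
   owner's TID, possibly ORed with FUTEX_WAITERS. */
typedef _Atomic uint32_t pi_lock_t;

static int
pi_lock(pi_lock_t *lock)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: 0 -> TID with one atomic compare-and-swap. */
    if (atomic_compare_exchange_strong(lock, &expected, tid))
        return 0;

    /* Slow path: the lock is owned, so the kernel must arbitrate. */
    return (int) syscall(SYS_futex, lock, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

static int
pi_unlock(pi_lock_t *lock)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: TID -> 0 succeeds only if FUTEX_WAITERS is not set. */
    if (atomic_compare_exchange_strong(lock, &expected, 0))
        return 0;

    /* Slow path: waiters exist, so the kernel hands the lock over. */
    return (int) syscall(SYS_futex, lock, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

In the uncontended case neither function enters the kernel for a futex operation; only the compare-and-swap touches the futex word.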
682
79d918c7
MK
683In the cases where callers are forced into the kernel
684(i.e., required to perform a
685.BR futex ()
686operation),
687they then deal directly with a so-called RT-mutex,
688a kernel locking mechanism which implements the required
689priority-inheritance semantics.
690After the RT-mutex is acquired, the futex value is updated accordingly,
691before the calling thread returns to user space.
692.\" FIXME ===== End of adapted Hart/Guniguntala text =====
693
a59fca75
MK
694It is important to note
695.\" FIXME We need some explanation here of *why* it is important to
70b06b90 696.\" note this
a59fca75 697that the kernel will update the futex value prior
79d918c7
MK
698to returning to user space.
699Unlike the other futex operations described above,
700the PI futex operations are designed
7c16fbff 701for the implementation of very specific IPC mechanisms.
fc57e6bb
MK
702.\"
703.\" FIXME We don't quite have a definition anywhere of what a PI futex
70b06b90 704.\" is (vs a non-PI futex). Below, we have the information that
fc57e6bb
MK
705.\" FUTEX_CMP_REQUEUE_PI requeues from a non-PI futex to a
706.\" PI futex, but what determines whether the futex is of one
707.\" kind of the other? We should have such a definition somewhere
708.\" about here.
99c0ac69
MK
709.\"
710.\" FIXME In discussing errors for FUTEX_CMP_REQUEUE_PI, Darren Hart
711.\" made the observation that "EINVAL is returned if the non-pi
712.\" to pi or op pairing semantics are violated."
713.\" Probably there needs to be a general statement about this
714.\" requirement, probably located at about this point in the page.
dd003bef
MK
715.\"
716.\" FIXME Somewhere on this page (I guess under the discussion of PI
717.\" futexes) we need a discussion of the FUTEX_OWNER_DIED bit.
718.\" Can someone propose a text?
bd90a5f9
MK
719
720PI futexes are operated on by specifying one of the following values in
721.IR futex_op :
70b06b90
MK
722.\"
723.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
724.\"
d67e21f5
MK
725.TP
726.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
727.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
67833bec
MK
728.\"
729.\" FIXME I did some significant rewording of tglx's text.
730.\" Please check, in case I injected errors.
731.\"
732This operation is used after an attempt to acquire
733the futex lock via an atomic user-space instruction failed
734because the futex has a nonzero value\(emspecifically,
735because it contained the namespace-specific TID of the lock owner.
67259526 736.\" FIXME In the preceding line, what does "namespace-specific" mean?
67833bec 737.\" (I kept those words from tglx.)
67259526 738.\" That is, what kind of namespace are we talking about?
67833bec
MK
739.\" (I suppose we are talking PID namespaces here, but I want to
740.\" be sure.)
741
742The operation checks the value of the futex at the address
743.IR uaddr .
70b06b90
MK
744If the value is 0, then the kernel tries to atomically set
745the futex value to the caller's TID.
67833bec
MK
746If that fails,
747.\" FIXME What would be the cause of failure?
748or the futex value is nonzero,
749the kernel atomically sets the
e0547e70 750.B FUTEX_WAITERS
67833bec
MK
751bit, which signals the futex owner that it cannot unlock the futex in
752user space atomically by setting the futex value to 0.
753After that, the kernel tries to find the thread which is
754associated with the owner TID,
755.\" FIXME Could I get a bit more detail on the next two lines?
756.\" What is "creates or reuses kernel state" about?
757creates or reuses kernel state on behalf of the owner
758and attaches the waiter to it.
67259526
MK
759.\" FIXME In the next line, what type of "priority" are we talking about?
760.\" Realtime priorities for SCHED_FIFO and SCHED_RR?
761.\" Or something else?
1f043693 762The enqueueing of the waiter is in descending priority order if more
e0547e70 763than one waiter exists.
67259526 764.\" FIXME What does "bandwidth" refer to in the next line?
e0547e70 765The owner inherits either the priority or the bandwidth of the waiter.
67259526
MK
766.\" FIXME In the preceding line, what determines whether the
767.\" owner inherits the priority versus the bandwidth?
67833bec
MK
768.\"
769.\" FIXME Could I get some help translating the next sentence into
770.\" something that user-space developers (and I) can understand?
70b06b90 771.\" In particular, what are "nested locks" in this context?
e0547e70
TG
772This inheritance follows the lock chain in the case of
773nested locking and performs deadlock detection.
774
9ce19cf1
MK
775.\" FIXME tglx says "The timeout argument is handled as described in
776.\" FUTEX_WAIT." However, it appears to me that this is not right.
70b06b90 777.\" Is the following formulation correct?
e0547e70
TG
778The
779.I timeout
9ce19cf1
MK
780argument provides a timeout for the lock attempt.
781It is interpreted as an absolute time, measured against the
782.BR CLOCK_REALTIME
783clock.
784If
785.I timeout
786is NULL, the operation will block indefinitely.
e0547e70 787
a449c634 788The
e0547e70
TG
789.IR uaddr2 ,
790.IR val ,
791and
792.IR val3
a449c634 793arguments are ignored.
67833bec 794.\"
70b06b90
MK
795.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
796.\"
d67e21f5 797.TP
12fdbe23 798.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 799.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
12fdbe23
MK
800This operation tries to acquire the futex at
801.IR uaddr .
0b761826 802.\" FIXME I think it would be helpful here to say a few more words about
70b06b90
MK
803.\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
804.\" Can someone propose something?
805.\"
fa0388c3 806It deals with the situation where the TID value at
12fdbe23
MK
807.I uaddr
808is 0, but the
b52e1cd4 809.B FUTEX_WAITERS
12fdbe23 810bit is set.
fa0388c3
MK
811.\" FIXME How does the situation in the previous sentence come about?
812.\" Probably it would be helpful to say something about that in
813.\" the man page.
badbf70c 814.\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
a282e5b0 815User space cannot handle this condition in a race-free manner.
084744ef
MK
816
817The
818.IR uaddr2 ,
819.IR val ,
820.IR timeout ,
821and
822.IR val3
823arguments are ignored.
70b06b90
MK
824.\"
825.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
826.\"
d67e21f5 827.TP
12fdbe23 828.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 829.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
d4ba4328 830This operation wakes the top priority waiter that is waiting in
ecae2099
TG
831.B FUTEX_LOCK_PI
832on the futex address provided by the
833.I uaddr
834argument.
835
836This is called when the user space value at
837.I uaddr
838cannot be changed atomically from a TID (of the owner) to 0.
839
840The
841.IR uaddr2 ,
842.IR val ,
843.IR timeout ,
844and
845.IR val3
11a194bf 846arguments are ignored.
70b06b90
MK
847.\"
848.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
849.\"
d67e21f5 850.TP
d67e21f5
MK
851.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
852.\" commit 52400ba946759af28442dee6265c5c0180ac7122
f812a08b
DH
853This operation is a PI-aware variant of
854.BR FUTEX_CMP_REQUEUE .
855It requeues waiters that are blocked via
856.B FUTEX_WAIT_REQUEUE_PI
857on
858.I uaddr
859from a non-PI source futex
860.RI ( uaddr )
861to a PI target futex
862.RI ( uaddr2 ).
863
9e54d26d
MK
864As with
865.BR FUTEX_CMP_REQUEUE ,
866this operation wakes up a maximum of
867.I val
868waiters that are waiting on the futex at
869.IR uaddr .
870However, for
871.BR FUTEX_CMP_REQUEUE_PI ,
872.I val
6fbeb8f4 873is required to be 1
939ca89f 874(since the main point is to avoid a thundering herd).
9e54d26d
MK
875The remaining waiters are removed from the wait queue of the source futex at
876.I uaddr
877and added to the wait queue of the target futex at
878.IR uaddr2 .
f812a08b 879
9e54d26d 880The
768d3c23 881.I val2
c6d8cf21
MK
882.\" val2 is the cap on the number of requeued waiters.
883.\" In the glibc pthread_cond_broadcast() implementation, this argument
884.\" is specified as INT_MAX, and for pthread_cond_signal() it is 0.
9e54d26d 885and
768d3c23 886.I val3
9e54d26d
MK
887arguments serve the same purposes as for
888.BR FUTEX_CMP_REQUEUE .
70b06b90 889.\"
be376673
MK
890.\" FIXME The page at http://locklessinc.com/articles/futex_cheat_sheet/
891.\" notes that "priority-inheritance Futex to priority-inheritance
892.\" Futex requeues are currently unsupported". Do we need to say
893.\" something in the man page about that?
70b06b90
MK
894.\"
895.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
896.\"
d67e21f5
MK
897.TP
898.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
899.\" commit 52400ba946759af28442dee6265c5c0180ac7122
70b06b90
MK
900.\"
901.\" FIXME I find the next sentence (from tglx) pretty hard to grok.
902.\" Could someone explain it a bit more.
6ff1b4c0
TG
903Wait on a non-PI futex at
904.I uaddr
905and potentially be requeued onto a PI futex at
906.IR uaddr2 .
907The wait operation on
908.I uaddr
909is the same as
910.BR FUTEX_WAIT .
70b06b90
MK
911.\"
912.\" FIXME What does the next sentence mean?
6ff1b4c0
TG
913A waiter blocked on
914.I uaddr
915can also be woken directly via
916.BR FUTEX_WAKE ,
917in which case it is not requeued to
918.IR uaddr2 .
a4e69912 919
5d67b190
MK
920.\" FIXME Somewhere around here, something needs to be said about
921.\" the pairing semantics of FUTEX_CMP_REQUEUE_PI and
70b06b90 922.\" FUTEX_WAIT_REQUEUE_PI. (The Hart/Guniguntala paper says
5d67b190
MK
923.\" "FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI must be
924.\" paired only with each other." Could someone propose
925.\" a statement about this pairing requirement and why it
926.\" is needed?
927.\"
63bea7dc
MK
928.\" FIXME Please check the following. tglx said "The timeout argument
929.\" is handled as described in FUTEX_WAIT.", but the truth is
930.\" as below, AFAICS
931If
932.I timeout
933is not NULL, it specifies a timeout for the wait operation;
934this timeout is interpreted as outlined above in the description of the
935.BR FUTEX_CLOCK_REALTIME
936option.
937If
938.I timeout
939is NULL, the operation can block indefinitely.
940
a4e69912
MK
941The
942.I val3
943argument is ignored.
70b06b90 944.\" FIXME Re the preceding sentence... Actually 'val3' is internally set to
a4e69912
MK
945.\" FUTEX_BITSET_MATCH_ANY before calling futex_wait_requeue_pi().
946.\" I'm not sure we need to say anything about this though.
947.\" Comments?
70b06b90 948.\"
b565548b 949.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70b06b90 950.\"
47297adb 951.SH RETURN VALUE
fea681da 952.PP
6f147f79 953In the event of an error, all operations return \-1 and set
e808bba0 954.I errno
6f147f79 955to indicate the cause of the error.
e808bba0
MK
956The return value on success depends on the operation,
957as described in the following list:
fea681da
MK
958.TP
959.B FUTEX_WAIT
682edefb
MK
960Returns 0 if the process was woken by a
961.B FUTEX_WAKE
7446a837
MK
962or
963.B FUTEX_WAKE_BITSET
682edefb 964call.
fea681da
MK
965.TP
966.B FUTEX_WAKE
967Returns the number of processes woken up.
968.TP
969.B FUTEX_FD
970Returns the new file descriptor associated with the futex.
971.TP
972.B FUTEX_REQUEUE
973Returns the number of processes woken up.
974.TP
975.B FUTEX_CMP_REQUEUE
3dfcc11d
MK
976Returns the total number of processes woken up or requeued to the futex at
977.IR uaddr2 .
978If this value is greater than
979.IR val ,
980then the difference is the number of waiters requeued to the futex at
981.IR uaddr2 .
dcad19c0
MK
982.TP
983.B FUTEX_WAKE_OP
a8b5b324
MK
984.\" FIXME Is the following correct?
985Returns the total number of waiters that were woken up.
986This is the sum of the woken waiters on the two futexes at
987.I uaddr
988and
989.IR uaddr2 .
dcad19c0
MK
990.TP
991.B FUTEX_WAIT_BITSET
7bcc5351
MK
992.\" FIXME Is the following correct?
993Returns 0 if the process was woken by a
994.B FUTEX_WAKE
995or
996.B FUTEX_WAKE_BITSET
997call.
dcad19c0
MK
998.TP
999.B FUTEX_WAKE_BITSET
b884566b
MK
1000.\" FIXME Is the following correct?
1001Returns the number of processes woken up.
dcad19c0
MK
1002.TP
1003.B FUTEX_LOCK_PI
bf02a260
MK
1004.\" FIXME Is the following correct?
1005Returns 0 if the futex was successfully locked.
dcad19c0
MK
1006.TP
1007.B FUTEX_TRYLOCK_PI
5c716eef
MK
1008.\" FIXME Is the following correct?
1009Returns 0 if the futex was successfully locked.
dcad19c0
MK
1010.TP
1011.B FUTEX_UNLOCK_PI
52bb928f
MK
1012.\" FIXME Is the following correct?
1013Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
1014.TP
1015.B FUTEX_CMP_REQUEUE_PI
dddd395a
MK
1016.\" FIXME Is the following correct?
1017Returns the total number of processes woken up or requeued to the futex at
1018.IR uaddr2 .
1019If this value is greater than
1020.IR val ,
1021then the difference is the number of waiters requeued to the futex at
1022.IR uaddr2 .
dcad19c0
MK
1023.TP
1024.B FUTEX_WAIT_REQUEUE_PI
22c15de9
MK
1025.\" FIXME Is the following correct?
1026Returns 0 if the caller was successfully requeued to the futex at
1027.IR uaddr2 .
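.PP
The return-value rule for
.B FUTEX_CMP_REQUEUE
can be sketched in C as follows.
This is only an illustration, not part of any library:
the names futex_cmp_requeue() and requeued_count() are hypothetical helpers,
and the sketch assumes that val2 is passed in the argument slot otherwise used for the timeout.

```c
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no futex() wrapper, so use syscall(2).
   val2 (the cap on requeued waiters) travels in the timeout slot. */
static long
futex_cmp_requeue(int *uaddr, int val, int val2, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, val,
                   (void *) (intptr_t) val2, uaddr2, val3);
}

/* Hypothetical helper: given a successful return value 'ret' (woken plus
   requeued), derive the number of waiters requeued to uaddr2, following
   the rule stated in the text above. */
static long
requeued_count(long ret, int val)
{
    return (ret > val) ? ret - val : 0;
}
```

With no waiters queued, a call whose val3 matches the futex word should return 0;
a call whose val3 does not match should fail with EAGAIN.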
70b06b90
MK
1028.\"
1029.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1030.\"
fea681da
MK
1031.SH ERRORS
1032.TP
1033.B EACCES
1034No read access to futex memory.
1035.TP
1036.B EAGAIN
f48516d1
MK
1037.RB ( FUTEX_WAIT ,
1038.BR FUTEX_WAIT_REQUEUE_PI )
badbf70c
MK
1039The value pointed to by
1040.I uaddr
1041was not equal to the expected value
1042.I val
1043at the time of the call.
1044.TP
1045.B EAGAIN
8f2068bb
MK
1046.RB ( FUTEX_CMP_REQUEUE ,
1047.BR FUTEX_CMP_REQUEUE_PI )
ce5602fd 1048The value pointed to by
9f6c40c0
MK
1049.I uaddr
1050is not equal to the expected value
1051.IR val3 .
fd1dc4c2 1052.\" FIXME: Is the following sentence correct?
fea681da 1053(This probably indicates a race;
682edefb
MK
1054use the safe
1055.B FUTEX_WAKE
1056now.)
c0091dd3
MK
1057.\"
1058.\" FIXME Should there be an EAGAIN case for FUTEX_TRYLOCK_PI?
1059.\" It seems so, looking at the handling of the rt_mutex_trylock()
1060.\" call in futex_lock_pi()
1061.\"
fea681da 1062.TP
5662f56a
MK
1063.BR EAGAIN
1064.RB ( FUTEX_LOCK_PI ,
aaec9032
MK
1065.BR FUTEX_TRYLOCK_PI ,
1066.BR FUTEX_CMP_REQUEUE_PI )
1067The futex owner thread ID of
1068.I uaddr
1069(for
1070.BR FUTEX_CMP_REQUEUE_PI :
1071.IR uaddr2 )
1072is about to exit,
5662f56a
MK
1073but has not yet handled the internal state cleanup.
1074Try again.
61f8c1d1
MK
1075.\"
1076.\" FIXME Is there not also an EAGAIN error case on 'uaddr2' for
1077.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1078.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1079.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EAGAIN?
5662f56a 1080.TP
7a39e745
MK
1081.BR EDEADLK
1082.RB ( FUTEX_LOCK_PI ,
1083.BR FUTEX_TRYLOCK_PI )
1084The futex at
1085.I uaddr
1086is already locked by the caller.
d08ce5dd
MK
1087.\"
1088.\" FIXME Is there not also an EDEADLK error case on 'uaddr2' for
1089.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1090.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1091.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EDEADLK?
7a39e745 1092.TP
662c0da8
MK
1093.BR EDEADLK
1094.\" FIXME I reworded tglx's text somewhat; is the following okay?
1095.RB ( FUTEX_CMP_REQUEUE_PI )
1096While requeueing a waiter to the PI futex at
1097.IR uaddr2 ,
1098the kernel detected a deadlock.
1099.TP
fea681da 1100.B EFAULT
1ea901e8
MK
1101A required pointer argument (i.e.,
1102.IR uaddr ,
1103.IR uaddr2 ,
1104or
1105.IR timeout )
496df304 1106did not point to a valid user-space address.
fea681da 1107.TP
9f6c40c0 1108.B EINTR
e808bba0 1109A
9f6c40c0 1110.B FUTEX_WAIT
2674f781
MK
1111or
1112.B FUTEX_WAIT_BITSET
e808bba0
MK
1113operation was interrupted by a signal (see
1114.BR signal (7))
1115or a spurious wakeup.
5eeca856
MK
1116.\" FIXME
1117.\" Regarding the words "spurious wakeup" above, I received this
1118.\" bug report from Rich Felker:
1119.\"
1120.\" I see no code in the kernel whereby a "spurious wakeup", or anything
1121.\" other than interruption by a signal handler that's not SA_RESTART,
1122.\" can cause futex to fail with EINTR. In general, overloading of EINTR
1123.\" and/or spurious EINTRs from a syscall make it impossible to use that
1124.\" syscall for implementing any function where EINTR is a mandatory
1125.\" failure on interruption-by-signal, since there is no way for
1126.\" userspace to distinguish whether the EINTR occurred as a result of
1127.\" an interrupting signal or some other reason. The kernel folks have
1128.\" gone to great lengths to fix spurious EINTRs (see signal(7) for
1129.\" history), especially by non-interrupting signal handlers, including
1130.\" in futex, and allowing EINTR here would be contrary to that goal.
1131.\"
1132.\" It's my belief that the "or a spurious wakeup" text should simply be
1133.\" removed.
1134.\"
1135.\" The reason I'm raising this topic is its relevance to a thread on
1136.\" libc-alpha:
1137.\" [RFC] mutex destruction (#13690): problem description and workarounds
1138.\"
1139.\" The bug and mailing list discussions to which Rich refers are:
1140.\" https://sourceware.org/bugzilla/show_bug.cgi?id=13690
1141.\" https://sourceware.org/ml/libc-alpha/2014-12/threads.html#0001
1142.\"
1143.\" Can anyone comment on whether the words "spurious wakeup" are correct?
1144.\"
9f6c40c0 1145.TP
fea681da 1146.B EINVAL
180f97b7
MK
1147The operation in
1148.IR futex_op
1149is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1150.I timeout
1151argument was invalid
1152.RI ( tv_sec
1153was less than zero, or
1154.IR tv_nsec
1155was not less than 1,000,000,000).
1156.TP
1157.B EINVAL
0c74df0b 1158The operation specified in
025e1374 1159.IR futex_op
0c74df0b 1160employs one or both of the pointers
51ee94be 1161.I uaddr
a1f47699 1162and
0c74df0b
MK
1163.IR uaddr2 ,
1164but one of these does not point to a valid object\(emthat is,
1165the address is not four-byte-aligned.
51ee94be
MK
1166.TP
1167.B EINVAL
55cc422d
TG
1168.RB ( FUTEX_WAIT_BITSET ,
1169.BR FUTEX_WAKE_BITSET )
79c9b436
TG
1170The bitset supplied in
1171.IR val3
1172is zero.
1173.TP
1174.B EINVAL
2043f2c1
MK
1175.RB ( FUTEX_REQUEUE ,
1176.\" FIXME tglx suggested adding this, but does this error really occur for
1177.\" FUTEX_REQUEUE? (The case where it occurs for FUTEX_CMP_REQUEUE_PI
1178.\" is obvious at the start of futex_requeue().)
1179.BR FUTEX_CMP_REQUEUE_PI )
add875c0
MK
1180.I uaddr
1181equals
1182.IR uaddr2
1183(i.e., an attempt was made to requeue to the same futex).
1184.TP
ff597681
MK
1185.BR EINVAL
1186.RB ( FUTEX_FD )
1187The signal number supplied in
1188.I val
1189is invalid.
1190.TP
6bac3b85 1191.B EINVAL
476debd7
MK
1192.RB ( FUTEX_WAKE ,
1193.BR FUTEX_WAKE_OP ,
1194.BR FUTEX_WAKE_BITSET ,
1195.BR FUTEX_REQUEUE ,
1196.BR FUTEX_CMP_REQUEUE )
1197The kernel detected an inconsistency between the user-space state at
1198.I uaddr
1199and the kernel state\(emthat is, it detected a waiter which waits via
1200.BR FUTEX_LOCK_PI
1201on
1202.IR uaddr .
1203.TP
1204.B EINVAL
a218ef20 1205.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1206.BR FUTEX_TRYLOCK_PI ,
1207.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1208The kernel detected an inconsistency between the user-space state at
1209.I uaddr
1210and the kernel state.
ce022f18
MK
1211This indicates either state corruption
1212.\" FIXME tglx did not mention the "state corruption" for FUTEX_UNLOCK_PI.
1213.\" Does that case also apply for FUTEX_UNLOCK_PI?
1214or that the kernel found a waiter on
a218ef20
MK
1215.I uaddr
1216which is waiting via
1217.BR FUTEX_WAIT
1218or
1219.BR FUTEX_WAIT_BITSET .
1220.TP
1221.B EINVAL
f9250b1a
MK
1222.RB ( FUTEX_CMP_REQUEUE_PI )
1223The kernel detected an inconsistency between the user-space state at
99c0041d
MK
1224.I uaddr2
1225and the kernel state;
1226that is, the kernel detected a waiter which waits via
1227.BR FUTEX_WAIT
1228.\" FIXME tglx did not mention FUTEX_WAIT_BITSET here,
1229.\" but should that not also be included here?
1230on
1231.IR uaddr2 .
1232.TP
1233.B EINVAL
1234.RB ( FUTEX_CMP_REQUEUE_PI )
1235The kernel detected an inconsistency between the user-space state at
f9250b1a
MK
1236.I uaddr
1237and the kernel state;
1238that is, the kernel detected a waiter which waits via
75299c8d 1239.BR FUTEX_WAIT
99c0041d 1240or
75299c8d 1241.BR FUTEX_WAIT_BITSET
f9250b1a
MK
1242on
1243.IR uaddr .
1244.TP
1245.B EINVAL
99c0041d 1246.RB ( FUTEX_CMP_REQUEUE_PI )
75299c8d
MK
1247The kernel detected an inconsistency between the user-space state at
1248.I uaddr
1249and the kernel state;
1250that is, the kernel detected a waiter which waits on
1251.I uaddr
1252via
1253.BR FUTEX_LOCK_PI
1254(instead of
1255.BR FUTEX_WAIT_REQUEUE_PI ).
99c0041d
MK
1256.TP
1257.B EINVAL
9786b3ca 1258.RB ( FUTEX_CMP_REQUEUE_PI )
70b06b90 1259.\" FIXME The following is a reworded version of Darren Hart's text.
9786b3ca
MK
1260.\" Please check that I did not introduce any errors.
1261An attempt was made to requeue a waiter to a futex other than that
1262specified by the matching
1263.B FUTEX_WAIT_REQUEUE_PI
1264call for that waiter.
1265.TP
1266.B EINVAL
f0c0d61c
MK
1267.RB ( FUTEX_CMP_REQUEUE_PI )
1268The
1269.I val
1270argument is not 1.
1271.TP
1272.B EINVAL
4832b48a 1273Invalid argument.
fea681da 1274.TP
a449c634
MK
1275.BR ENOMEM
1276.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1277.BR FUTEX_TRYLOCK_PI ,
1278.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1279The kernel could not allocate memory to hold state information.
1280.TP
fea681da 1281.B ENFILE
ff597681 1282.RB ( FUTEX_FD )
fea681da 1283The system limit on the total number of open files has been reached.
4701fc28
MK
1284.TP
1285.B ENOSYS
1286Invalid operation specified in
d33602c4 1287.IR futex_op .
9f6c40c0 1288.TP
4a7e5b05
MK
1289.B ENOSYS
1290The
1291.BR FUTEX_CLOCK_REALTIME
1292option was specified in
1afcee7c 1293.IR futex_op ,
4a7e5b05
MK
1294but the accompanying operation was neither
1295.BR FUTEX_WAIT_BITSET
1296nor
1297.BR FUTEX_WAIT_REQUEUE_PI .
1298.TP
a9dcb4d1
MK
1299.BR ENOSYS
1300.RB ( FUTEX_LOCK_PI ,
f2424fae 1301.BR FUTEX_TRYLOCK_PI ,
4945ff19 1302.BR FUTEX_UNLOCK_PI ,
4cf92894 1303.BR FUTEX_CMP_REQUEUE_PI ,
794bb106 1304.BR FUTEX_WAIT_REQUEUE_PI )
a9dcb4d1 1305A run-time check determined that the operation is not available.
a2ebebcd
MK
1306The PI futex operations are not implemented on all architectures and
1307are not supported on some CPU variants.
a9dcb4d1 1308.TP
c7589177
MK
1309.BR EPERM
1310.RB ( FUTEX_LOCK_PI ,
dc2742a8
MK
1311.BR FUTEX_TRYLOCK_PI ,
1312.BR FUTEX_CMP_REQUEUE_PI )
04331c3f 1313The caller is not allowed to attach itself to the futex at
dc2742a8
MK
1314.I uaddr
1315(for
1316.BR FUTEX_CMP_REQUEUE_PI :
1317the futex at
1318.IR uaddr2 ).
c7589177 1319(This may be caused by a state corruption in user space.)
61f8c1d1
MK
1320.\"
1321.\" FIXME Is there not also an EPERM error case on 'uaddr2' for
1322.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1323.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1324.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EPERM?
c7589177 1325.TP
76f347ba 1326.BR EPERM
87276709 1327.RB ( FUTEX_UNLOCK_PI )
76f347ba
MK
1328The caller does not own the futex.
1329.TP
0b0e4934
MK
1330.BR ESRCH
1331.RB ( FUTEX_LOCK_PI ,
1332.BR FUTEX_TRYLOCK_PI )
1333.\" FIXME I reworded the following sentence a bit differently from
1334.\" tglx's formulation. Is it okay?
1335The thread ID in the futex at
1336.I uaddr
1337does not exist.
61f8c1d1
MK
1338.\"
1339.\" FIXME Is there not also an ESRCH error case on 'uaddr2' for
1340.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1341.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1342.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> ESRCH?
0b0e4934 1343.TP
360f773c
MK
1344.BR ESRCH
1345.RB ( FUTEX_CMP_REQUEUE_PI )
1346.\" FIXME I reworded the following sentence a bit differently from
1347.\" tglx's formulation. Is it okay?
1348The thread ID in the futex at
1349.I uaddr2
1350does not exist.
1351.TP
9f6c40c0 1352.B ETIMEDOUT
4d85047f
MK
1353The operation in
1354.IR futex_op
1355employed the timeout specified in
1356.IR timeout ,
1357and the timeout expired before the operation completed.
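.PP
As a rough illustration of how a caller might react to some of the errors listed above,
the following C sketch retries a
.B FUTEX_WAIT
on EAGAIN and EINTR and reports everything else to the caller.
The names futex_wait() and wait_while_equal() are hypothetical helpers.
The sketch makes one simplifying assumption: after an interruption,
the relative timeout is restarted in full instead of being reduced by the time already spent waiting.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: FUTEX_WAIT via syscall(2); uaddr2 and val3 are unused. */
static long
futex_wait(int *uaddr, int val, const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, timeout, NULL, 0);
}

/* Hypothetical helper: block while *uaddr == val.
   Returns 0 once the value differs from val; returns -1 with errno set
   (for example ETIMEDOUT, EINVAL, or EFAULT) on any other failure. */
static int
wait_while_equal(int *uaddr, int val, const struct timespec *timeout)
{
    for (;;) {
        /* Re-read the futex word before (re)entering the kernel. */
        if (__atomic_load_n(uaddr, __ATOMIC_ACQUIRE) != val)
            return 0;

        if (futex_wait(uaddr, val, timeout) == 0)
            continue;               /* woken: re-check the value */

        if (errno == EAGAIN || errno == EINTR)
            continue;               /* value changed or signal: re-check */

        return -1;                  /* ETIMEDOUT and hard errors */
    }
}
```

Treating a return of 0 as "re-check the value" rather than "the condition holds"
keeps the loop correct even if a wakeup arrives for an unrelated reason.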
70b06b90
MK
1358.\"
1359.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
1360.\"
47297adb 1361.SH VERSIONS
a1d5f77c 1362.PP
81c9d87e
MK
1363Futexes were first made available in a stable kernel release
1364with Linux 2.6.0.
1365
a1d5f77c
MK
1366Initial futex support was merged in Linux 2.5.7 but with different semantics
1367from what is described above.
52dee70e 1368A four-argument system call with the semantics
fd3fa7ef 1369described in this page was introduced in Linux 2.5.40.
11b520ed 1370In Linux 2.5.70, a fifth argument
a1d5f77c 1371was added.
11b520ed 1372In Linux 2.6.7, a sixth argument was added\(emmessy, especially
a1d5f77c 1373on the s390 architecture.
47297adb 1374.SH CONFORMING TO
8382f16d 1375This system call is Linux-specific.
47297adb 1376.SH NOTES
baf0f1f4
MK
1377Glibc does not provide a wrapper for this system call; call it using
1378.BR syscall (2).
1379
fcdad7d6 1380To reiterate, bare futexes are not intended as an easy-to-use abstraction
c13182ef 1381for end-users.
fcdad7d6 1382(There is no wrapper function for this system call in glibc.)
c13182ef 1383Implementors are expected to be assembly literate and to have
7fac88a9 1384read the sources of the futex user-space library referenced below.
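.PP
Because glibc provides no wrapper, a program typically defines its own.
The following C sketch shows one way to do so with
.BR syscall (2);
the function name futex() and its argument types follow the SYNOPSIS of this page,
but the function itself is a local, hypothetical helper, not a library interface.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Local helper (hypothetical): passes all six arguments through unchanged.
   Returns -1 and sets errno on error, as syscall(2) does. */
static long
futex(int *uaddr, int futex_op, int val,
      const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}
```

For example, futex(&word, FUTEX_WAKE, 1, NULL, NULL, 0) should return 0 when no thread is waiting on word,
and futex(&word, FUTEX_WAIT, 1, NULL, NULL, 0) should fail immediately with EAGAIN when word does not contain 1.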
d282bb24 1385.\" .SH AUTHORS
fea681da
MK
1386.\" .PP
1387.\" Futexes were designed and worked on by
1388.\" Hubertus Franke (IBM Thomas J. Watson Research Center),
1389.\" Matthew Kirkwood, Ingo Molnar (Red Hat)
1390.\" and Rusty Russell (IBM Linux Technology Center).
1391.\" This page written by bert hubert.
47297adb 1392.SH SEE ALSO
4c222281 1393.ad l
9913033c 1394.BR get_robust_list (2),
d806bc05 1395.BR restart_syscall (2),
14d8dd3b 1396.BR futex (7)
fea681da 1397.PP
f5ad572f
MK
1398The following kernel source files:
1399.IP * 2
1400.I Documentation/pi-futex.txt
1401.IP *
1402.I Documentation/futex-requeue-pi.txt
1403.IP *
1404.I Documentation/locking/rt-mutex.txt
1405.IP *
1406.I Documentation/locking/rt-mutex-design.txt
8fe019c7
MK
1407.IP *
1408.I Documentation/robust-futex-ABI.txt
43b99089 1409.PP
4c222281 1410Franke, H., Russell, R., and Kirkwood, M., 2002.
52087dd3 1411\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
4c222281 1412(from proceedings of the Ottawa Linux Symposium 2002),
9b936e9e 1413.br
608bf950
SK
1414.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
1415.UE
f42eb21b 1416
4c222281 1417Hart, D., 2009. \fIA futex overview and update\fP,
2ed26199
MK
1418.UR http://lwn.net/Articles/360699/
1419.UE
1420
4c222281 1421Hart, D. and Guniguntala, D., 2009.
0483b6cc 1422\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
4c222281 1423(from proceedings of the 2009 Real-Time Linux Workshop),
0483b6cc
MK
1424.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1425.UE
1426
4c222281 1427Drepper, U., 2011. \fIFutexes Are Tricky\fP,
f42eb21b
MK
1428.UR http://www.akkadia.org/drepper/futex.pdf
1429.UE
9b936e9e
MK
1430.PP
1431Futex example library, futex-*.tar.bz2 at
1432.br
a605264d 1433.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
608bf950 1434.UE
34f14794
MK
1435.\"
1436.\" FIXME Are there any other resources that should be listed
1437.\" in the SEE ALSO section?