8f0aff2a 1.\" Page by b.hubert
1abce893
MK
2.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
3.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
2297bf0e 4.\"
2e46a6e7 5.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
8f0aff2a 6.\" may be freely modified and distributed
8ff7380d 7.\" %%%LICENSE_END
fea681da
MK
8.\"
9.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
10.\" added ERRORS section.
11.\"
12.\" Modified 2004-06-17 mtk
13.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
14.\"
4f58b197
MK
15.\" 2.6.31 adds FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI
16.\" commit 52400ba946759af28442dee6265c5c0180ac7122
17.\" Author: Darren Hart <dvhltc@us.ibm.com>
18.\" Date: Fri Apr 3 13:40:49 2009 -0700
19.\"
20.\" commit ba9c22f2c01cf5c88beed5a6b9e07d42e10bd358
21.\" Author: Darren Hart <dvhltc@us.ibm.com>
22.\" Date: Mon Apr 20 22:22:22 2009 -0700
23.\"
24.\" See Documentation/futex-requeue-pi.txt
34f7665a 25.\"
3d155313 26.TH FUTEX 2 2014-05-21 "Linux" "Linux Programmer's Manual"
fea681da 27.SH NAME
ce154705 28futex \- fast user-space locking
fea681da 29.SH SYNOPSIS
9d9dc1e8 30.nf
fea681da
MK
31.sp
32.B "#include <linux/futex.h>"
fea681da
MK
33.B "#include <sys/time.h>"
34.sp
d33602c4
MK
35.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
36.BI " const struct timespec *" timeout ,
9d9dc1e8 37.BI " int *" uaddr2 ", int " val3 );
fea681da 38.\" int *? void *? u32 *?
9d9dc1e8 39.fi
409f08b0 40
b939d6e4
MK
41.IR Note :
42There is no glibc wrapper for this system call; see NOTES.
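
Because no wrapper exists, callers usually reach the system call through
.BR syscall (2).
The following is a minimal sketch, assuming a Linux system that provides
.IR <linux/futex.h> ;
the name futex_call() is invented for this illustration and is not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a glibc function): forwards its arguments
   unchanged to the raw futex system call.  Returns the system call
   result, or -1 with errno set on error. */
static long
futex_call(uint32_t *uaddr, int futex_op, uint32_t val,
           const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
{
    /* Futex words are four-byte integers on a four-byte boundary. */
    assert(((uintptr_t) uaddr & 3) == 0);

    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}
```

With such a helper, waking one waiter might be written as futex_call(&word, FUTEX_WAKE, 1, NULL, NULL, 0).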
47297adb 43.SH DESCRIPTION
fea681da
MK
44.PP
45The
e511ffb6 46.BR futex ()
fea681da
MK
47system call provides a method for
48a program to wait for a value at a given address to change, and a
49method to wake up anyone waiting on a particular address (while the
50addresses for the same memory in separate processes may not be
51equal, the kernel maps them internally so the same memory mapped in
52different locations will correspond for
e511ffb6 53.BR futex ()
c13182ef 54calls).
fd3fa7ef 55This system call is typically used to
fea681da
MK
56implement the contended case of a lock in shared memory, as
57described in
a8bda636 58.BR futex (7).
fea681da 59.PP
f388ba70
MK
60When a futex operation did not finish uncontended in user space, a
61.BR futex ()
62call needs to be made to the kernel to arbitrate.
c13182ef 63Arbitration can either mean putting the calling
fea681da
MK
64process to sleep or, conversely, waking a waiting process.
65.PP
f388ba70
MK
66Callers of
67.BR futex ()
68are expected to adhere to the semantics described in
a8bda636 69.BR futex (7).
fea681da 70As these
d603cc27 71semantics involve writing nonportable assembly instructions, this in turn
fea681da
MK
72probably means that most users will in fact be library authors and not
73general application developers.
74.PP
75The
76.I uaddr
f388ba70
MK
77argument points to an integer which stores the counter (futex).
78On all platforms, futexes are four-byte integers that
79must be aligned on a four-byte boundary.
80The operation to perform on the futex is specified in the
81.I futex_op
82argument;
83.IR val
84is a value whose meaning and purpose depends on
85.IR futex_op .
36ab2074
MK
86
87The remaining arguments
88.RI ( timeout ,
89.IR uaddr2 ,
90and
91.IR val3 )
92are required only for certain of the futex operations described below.
93Where one of these arguments is not required, it is ignored.
94For several blocking operations, the
95.I timeout
96argument is a pointer to a
97.IR timespec
98structure that specifies a timeout for the operation.
99However, notwithstanding the prototype shown above, for some operations,
100this argument is instead a four-byte integer whose meaning
101is determined by the operation.
102Where it is required,
103.IR uaddr2
104is a pointer to a second futex that is employed by the operation.
105The interpretation of the final integer argument,
106.IR val3 ,
107depends on the operation.
108
6be4bad7 109The
d33602c4 110.I futex_op
6be4bad7
MK
111argument consists of two parts:
112a command that specifies the operation to be performed,
113bit-wise ORed with zero or more options that
114modify the behaviour of the operation.
fc30eb79 115The options that may be included in
d33602c4 116.I futex_op
fc30eb79
TG
117are as follows:
118.TP
119.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
120.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
121This option bit can be employed with all futex operations.
122It tells the kernel that the futex is process private and not shared
123with another process.
124This allows the kernel to choose the fast path for validating
125the user-space address and avoids expensive VMA lookups,
126taking reference counts on file backing store, and so on.
ae2c1774
MK
127
128As a convenience,
129.IR <linux/futex.h>
130defines a set of constants with the suffix
131.BR _PRIVATE
132that are equivalents of all of the operations listed below,
dcdfde26 133.\" except the obsolete FUTEX_FD, for which the "private" flag was
ae2c1774
MK
134.\" meaningless
135but with the
136.BR FUTEX_PRIVATE_FLAG
137ORed into the constant value.
138Thus, there are
139.BR FUTEX_WAIT_PRIVATE ,
140.BR FUTEX_WAKE_PRIVATE ,
141and so on.
2e98bbc2
TG
142.TP
143.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
144.\" commit 1acdac104668a0834cfa267de9946fac7764d486
4a7e5b05 145This option bit can be employed only with the
2e98bbc2
TG
146.BR FUTEX_WAIT_BITSET
147and
148.BR FUTEX_WAIT_REQUEUE_PI
149operations (described below).
150
f2103b26
MK
151If this option is set, the kernel treats
152.I timeout
153as an absolute time based on
2e98bbc2
TG
154.BR CLOCK_REALTIME .
155
f2103b26
MK
156If this option is not set, the kernel treats
157.I timeout
158as relative time,
1c952cf5
MK
159.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
160measured against the
161.BR CLOCK_MONOTONIC
162clock.
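
The split of
.I futex_op
into a command and option bits can be illustrated as follows.
This is a sketch only: the two helper functions are hypothetical names,
and the mask is built from just the two option bits described above.

```c
#include <assert.h>
#include <linux/futex.h>

/* The _PRIVATE constants are the plain commands with the flag ORed in. */
static_assert(FUTEX_WAIT_PRIVATE == (FUTEX_WAIT | FUTEX_PRIVATE_FLAG),
              "FUTEX_WAIT_PRIVATE is FUTEX_WAIT plus the private flag");
static_assert(FUTEX_WAKE_PRIVATE == (FUTEX_WAKE | FUTEX_PRIVATE_FLAG),
              "FUTEX_WAKE_PRIVATE is FUTEX_WAKE plus the private flag");

/* Option bits covered by this sketch. */
#define DEMO_OPTION_BITS (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)

/* Hypothetical helper: the command part of a futex_op value. */
static int
futex_op_command(int futex_op)
{
    return futex_op & ~DEMO_OPTION_BITS;
}

/* Hypothetical helper: the option bits of a futex_op value. */
static int
futex_op_options(int futex_op)
{
    return futex_op & DEMO_OPTION_BITS;
}
```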
6be4bad7
MK
163.PP
164The operation specified in
d33602c4 165.I futex_op
6be4bad7 166is one of the following:
fea681da 167.TP
81c9d87e
MK
168.BR FUTEX_WAIT " (since Linux 2.6.0)"
169.\" Strictly speaking, since some time in 2.5.x
f065673c
MK
170This operation tests that the value at the
171location pointed to by the futex address
fea681da
MK
172.I uaddr
173still contains the value
174.IR val ,
f065673c 175and then sleeps awaiting
682edefb 176.B FUTEX_WAKE
f065673c
MK
177on the futex address.
178The test and sleep steps are performed atomically.
179If the futex value does not match
180.IR val ,
4710334a 181then the call fails immediately with the error
badbf70c 182.BR EAGAIN .
f065673c
MK
183.\" FIXME I added the following sentence. Please confirm that it is correct.
184The purpose of the test step is to detect races where
185another process changes the value of the futex between
186the time it was last checked and the time of the
187.BR FUTEX_WAIT
63d3f911 188operation.
1909e523 189
c13182ef 190If the
fea681da 191.I timeout
1c952cf5
MK
192argument is non-NULL, its contents specify a relative timeout for the wait
193.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
194measured according to the
195.BR CLOCK_MONOTONIC
196clock.
82a6092b
MK
197(This interval will be rounded up to the system clock granularity,
198and kernel scheduling delays mean that the
199blocking interval may overrun by a small amount.)
200If
201.I timeout
202is NULL, the call blocks indefinitely.
4798a7f3 203
c13182ef 204The arguments
fea681da
MK
205.I uaddr2
206and
207.I val3
208are ignored.
209
210For
a8bda636 211.BR futex (7),
fea681da
MK
212this call is executed if decrementing the count gave a negative value
213(indicating contention), and will sleep until another process releases
682edefb
MK
214the futex and executes the
215.B FUTEX_WAKE
216operation.
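
As an illustration of the test-and-sleep behavior, the sketch below waits
with a relative timeout.
The helper name wait_on_value() and the millisecond parameter are
assumptions made for this example; ETIMEDOUT is the error reported when
the timeout expires before a wake-up arrives.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *uaddr == expected, for at most
   timeout_ms milliseconds (a relative interval).  Returns 0 if woken,
   or -1 with errno set: EAGAIN if the value differed at the time of
   the call, ETIMEDOUT if the interval expired, EINTR if a signal
   interrupted the wait. */
static int
wait_on_value(uint32_t *uaddr, uint32_t expected, long timeout_ms)
{
    struct timespec ts;

    assert(timeout_ms >= 0);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    /* uaddr2 and val3 are ignored by FUTEX_WAIT. */
    return (int) syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                         &ts, NULL, 0);
}
```

A caller would normally reload the futex value and retry after EAGAIN or EINTR.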
fea681da 217.TP
81c9d87e
MK
218.BR FUTEX_WAKE " (since Linux 2.6.0)"
219.\" Strictly speaking, since Linux 2.5.x
f065673c
MK
220This operation wakes at most
221.I val
222processes waiting (i.e., inside
223.BR FUTEX_WAIT )
224on the futex at the address
225.IR uaddr .
226Most commonly,
227.I val
228is specified as either 1 (wake up a single waiter) or
229.BR INT_MAX
230(wake up all waiters).
730bfbda
MK
231.\" FIXME Please confirm that the following is correct:
232No guarantee is provided about which waiters are awoken
233(e.g., a waiter with a higher scheduling priority is not guaranteed
234to be awoken in preference to a waiter with a lower priority).
4798a7f3 235
fea681da
MK
236The arguments
237.IR timeout ,
238.I uaddr2
239and
240.I val3
241are ignored.
242
243For
a8bda636 244.BR futex (7),
fea681da
MK
245this is executed if incrementing
246the count showed that there were waiters, once the futex value has been set
247to 1 (indicating that it is available).
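
To show how
.B FUTEX_WAIT
and
.B FUTEX_WAKE
cooperate in the contended case of a lock, here is a minimal sketch.
Note the assumptions: it uses a three-state scheme (0 unlocked, 1 locked,
2 locked with possible waiters) rather than the counter protocol of
.BR futex (7),
it relies on C11 atomics, and the demo_ names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(atomic_uint) == 4, "futex word must be 32 bits");

/* 0 = unlocked, 1 = locked, 2 = locked and there may be waiters */
typedef struct {
    atomic_uint word;
} demo_mutex;

static void
demo_lock(demo_mutex *m)
{
    unsigned int c = 0;

    /* Uncontended case: 0 -> 1 entirely in user space. */
    if (atomic_compare_exchange_strong(&m->word, &c, 1))
        return;

    /* Contended case: mark the lock as having waiters, then sleep
       until the word changes from 2.  FUTEX_WAIT rechecks the value
       atomically, so a concurrent unlock is not missed. */
    if (c != 2)
        c = atomic_exchange(&m->word, 2);
    while (c != 0) {
        syscall(SYS_futex, &m->word, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
        c = atomic_exchange(&m->word, 2);
    }
}

static void
demo_unlock(demo_mutex *m)
{
    /* Only enter the kernel if someone may be sleeping. */
    if (atomic_exchange(&m->word, 0) == 2)
        syscall(SYS_futex, &m->word, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
```

The private variants are used here on the assumption that the lock is
shared only between threads of one process.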
a7c2bf45
MK
248.TP
249.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
250.\" Strictly speaking, from Linux 2.5.x to 2.6.25
251This operation creates a file descriptor that is associated with the futex at
252.IR uaddr .
253.\" , suitable for .BR poll (2).
254The calling process must close the returned file descriptor after use.
255When another process performs a
256.BR FUTEX_WAKE
257on the futex, the file descriptor is reported as readable by
258.BR select (2),
259.BR poll (2),
260and
261.BR epoll (7).
262
263The file descriptor can be used to obtain asynchronous notifications:
264if
265.I val
266is nonzero, then when another process executes a
267.BR FUTEX_WAKE ,
268the caller will receive the signal number that was passed in
269.IR val .
270
271The arguments
272.IR timeout ,
273.I uaddr2
274and
275.I val3
276are ignored.
277
278To prevent race conditions, the caller should test if the futex has
279been upped after
280.B FUTEX_FD
281returns.
282
283Because it was inherently racy,
284.B FUTEX_FD
285has been removed
286.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
287from Linux 2.6.26 onward.
288.TP
289.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
290.\" Strictly speaking: from Linux 2.5.70
291.\"
292.\" FIXME I added this warning. Okay?
293.IR "Avoid using this operation" .
294It is broken (unavoidably racy) for its intended purpose.
295Use
296.BR FUTEX_CMP_REQUEUE
297instead.
298
299This operation performs the same task as
300.BR FUTEX_CMP_REQUEUE ,
301except that no check is made using the value in
302.IR val3 .
303(The argument
304.I val3
305is ignored.)
306.TP
307.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
308This operation was added as a replacement for the earlier
309.BR FUTEX_REQUEUE ,
310because that operation was racy for its intended use.
311
312As with
313.BR FUTEX_REQUEUE ,
314the
315.BR FUTEX_CMP_REQUEUE
316operation is used to avoid a "thundering herd" effect when
317.B FUTEX_WAKE
318is used and all processes woken up need to acquire another futex.
319It differs from
320.BR FUTEX_REQUEUE
321in that it first checks whether the location
322.I uaddr
323still contains the value
324.IR val3 .
325If not, the operation fails with the error
326.BR EAGAIN .
327.\" FIXME I added the following sentence on rational for FUTEX_CMP_REQUEUE.
328.\" Is it correct? SHould it be expanded?
329This additional feature of
330.BR FUTEX_CMP_REQUEUE
331can be used by the caller to (atomically) detect changes
332in the value of the target futex at
333.IR uaddr2 .
334
335The operation wakes up a maximum of
336.I val
337waiters that are waiting on the futex at
338.IR uaddr .
339If there are more than
340.I val
341waiters, then the remaining waiters are removed
342from the wait queue of the source futex at
343.I uaddr
344and added to the wait queue of the target futex at
345.IR uaddr2 .
346The
347.I timeout
348argument is (ab)used to specify a cap on the number of waiters
349that are requeued to the futex at
350.IR uaddr2 ;
351the kernel casts the
352.I timeout
353value to
354.IR u32 .
355
356.\" FIXME Please review the following new paragraph to see if it is
357.\" accurate.
358Typical values to specify for
359.I val
360are 0 or 1.
361(Specifying
362.BR INT_MAX
363is not useful, because it would make the
364.BR FUTEX_CMP_REQUEUE
365operation equivalent to
366.BR FUTEX_WAKE .)
367The cap value specified via the (abused)
368.I timeout
369argument is typically either 1 or
370.BR INT_MAX .
371(Specifying the argument as 0 is not useful, because it would make the
372.BR FUTEX_CMP_REQUEUE
373operation equivalent to
374.BR FUTEX_WAKE .)
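
The argument passing for
.B FUTEX_CMP_REQUEUE
can be sketched as follows.
The helper name cmp_requeue() is hypothetical;
the point of the example is that the requeue cap is passed in the position of the
.I timeout
pointer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *uaddr still equals expected, wake up to
   nr_wake waiters on uaddr and move up to nr_requeue of the remaining
   waiters to uaddr2.  Returns the number of waiters woken or requeued,
   or -1 with errno set (EAGAIN if *uaddr != expected). */
static long
cmp_requeue(uint32_t *uaddr, uint32_t expected, uint32_t nr_wake,
            uint32_t nr_requeue, uint32_t *uaddr2)
{
    /* Counts above INT_MAX would be negative when read as int. */
    assert(nr_wake <= INT_MAX && nr_requeue <= INT_MAX);

    /* The requeue cap travels in the slot of the timeout pointer;
       expected is passed as val3. */
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, nr_wake,
                   (void *) (uintptr_t) nr_requeue, uaddr2, expected);
}
```

A condition-variable broadcast might, for example, call
cmp_requeue(&cond, seen, 1, INT_MAX, &mutex).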
6bac3b85
MK
375.\"
376.\" FIXME I added some FUTEX_WAKE_OP text, and I'd be happy if someone
377.\" checked it.
fea681da 378.TP
d67e21f5
MK
379.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
380.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
6bac3b85
MK
381.\" Author: Jakub Jelinek <jakub@redhat.com>
382.\" Date: Tue Sep 6 15:16:25 2005 -0700
383This operation was added to support some user-space use cases
384where more than one futex must be handled at the same time.
385The most notable example is the implementation of
386.BR pthread_cond_signal (3),
387which requires operations on two futexes,
388the one used to implement the mutex and the one used in the implementation
389of the wait queue associated with the condition variable.
390.BR FUTEX_WAKE_OP
391allows such cases to be implemented without leading to
392high rates of contention and context switching.
393
394The
395.BR FUTEX_WAKE_OP
396operation is equivalent to atomically executing the following code:
397
398.in +4n
399.nf
400int oldval = *(int *) uaddr2;
401*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
402futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
403if (oldval \fIcmp\fP \fIcmparg\fP)
404 futex(uaddr2, FUTEX_WAKE, nr_wake2, 0, 0, 0);
405.fi
406.in
407
408In other words,
409.BR FUTEX_WAKE_OP
410does the following:
411.RS
412.IP * 3
413saves the original value of the futex at
414.IR uaddr2 ;
415.IP *
416performs an operation to modify the value of the futex at
417.IR uaddr2 ;
418.IP *
419wakes up a maximum of
420.I val
421waiters on the futex
422.IR uaddr ;
423and
424.IP *
425dependent on the results of a test of the original value of the futex at
426.IR uaddr2 ,
427wakes up a maximum of
428.I nr_wake2
429waiters on the futex
430.IR uaddr2 .
431.RE
432.IP
433The
434.I nr_wake2
435value is actually the
436.BR futex ()
437.I timeout
438argument (ab)used to specify how many of the waiters on the futex at
439.IR uaddr2
440are to be woken up;
441the kernel casts the
442.I timeout
443value to
444.IR u32 .
445
446The operation and comparison that are to be performed are encoded
447in the bits of the argument
448.IR val3 .
449Pictorially, the encoding is:
450
451.in +4n
452.nf
453           +-----+-----+---------------+---------------+
454           | op  | cmp |     oparg     |    cmparg     |
455           +-----+-----+---------------+---------------+
456# of bits:    4     4          12              12
457
458.fi
459.in
460
461Expressed in code, the encoding is:
462
463.in +4n
464.nf
465#define FUTEX_OP(op, oparg, cmp, cmparg) \\
466 (((op & 0xf) << 28) | \\
467 ((cmp & 0xf) << 24) | \\
468 ((oparg & 0xfff) << 12) | \\
469 (cmparg & 0xfff))
470.fi
471.in
472
473In the above,
474.I op
475and
476.I cmp
477are each one of the codes listed below.
478The
479.I oparg
480and
481.I cmparg
482components are literal numeric values, except as noted below.
483
484The
485.I op
486component has one of the following values:
487
488.in +4n
489.nf
490FUTEX_OP_SET 0 /* uaddr2 = oparg; */
491FUTEX_OP_ADD 1 /* uaddr2 += oparg; */
492FUTEX_OP_OR 2 /* uaddr2 |= oparg; */
493FUTEX_OP_ANDN 3 /* uaddr2 &= ~oparg; */
494FUTEX_OP_XOR 4 /* uaddr2 ^= oparg; */
495.fi
496.in
497
498In addition, bit-wise ORing the following value into
499.I op
500causes
501.IR "(1\ <<\ oparg)"
502to be used as the operand:
503
504.in +4n
505.nf
506FUTEX_OP_ARG_SHIFT 8 /* Use (1 << oparg) as operand */
507.fi
508.in
509
510The
511.I cmp
512field is one of the following:
513
514.in +4n
515.nf
516FUTEX_OP_CMP_EQ 0 /* if (oldval == cmparg) wake */
517FUTEX_OP_CMP_NE 1 /* if (oldval != cmparg) wake */
518FUTEX_OP_CMP_LT 2 /* if (oldval < cmparg) wake */
519FUTEX_OP_CMP_LE 3 /* if (oldval <= cmparg) wake */
520FUTEX_OP_CMP_GT 4 /* if (oldval > cmparg) wake */
521FUTEX_OP_CMP_GE 5 /* if (oldval >= cmparg) wake */
522.fi
523.in
524
525The return value of
526.BR FUTEX_WAKE_OP
527is the sum of the number of waiters woken on the futex
528.IR uaddr
529plus the number of waiters woken on the futex
530.IR uaddr2 .
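
The encoding and the call can be tried out with the following sketch.
The function names encode_wake_op() and wake_op() are hypothetical;
the bit layout is the one given above, and
.I nr_wake2
is passed in the position of the
.I timeout
pointer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: pack op, cmp, oparg and cmparg into val3
   using the 4/4/12/12-bit layout. */
static uint32_t
encode_wake_op(uint32_t op, uint32_t oparg, uint32_t cmp, uint32_t cmparg)
{
    assert(op <= 0xf && cmp <= 0xf);

    return ((op & 0xf) << 28) | ((cmp & 0xf) << 24) |
           ((oparg & 0xfff) << 12) | (cmparg & 0xfff);
}

/* Hypothetical helper: modify *uaddr2 as described by encoded, wake up
   to nr_wake waiters on uaddr and, if the comparison on the old value
   of *uaddr2 holds, up to nr_wake2 waiters on uaddr2. */
static long
wake_op(uint32_t *uaddr, uint32_t nr_wake, uint32_t *uaddr2,
        uint32_t nr_wake2, uint32_t encoded)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_OP, nr_wake,
                   (void *) (uintptr_t) nr_wake2, uaddr2, encoded);
}
```

For instance, encode_wake_op(FUTEX_OP_ADD, 2, FUTEX_OP_CMP_EQ, 3) requests
"add 2 to the futex at uaddr2, and wake its waiters if the old value was 3".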
d67e21f5 531.TP
79c9b436
TG
532.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
533.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
fd9e59d4 534This operation is like
79c9b436
TG
535.BR FUTEX_WAIT
536except that
537.I val3
538is used to provide a 32-bit bitset to the kernel.
539This bitset is stored in the kernel-internal state of the waiter.
540See the description of
541.BR FUTEX_WAKE_BITSET
542for further details.
543
fd9e59d4
MK
544The
545.BR FUTEX_WAIT_BITSET
546operation also interprets the
547.I timeout
548argument differently from
549.BR FUTEX_WAIT .
550See the discussion of
551.BR FUTEX_CLOCK_REALTIME ,
552above.
553
79c9b436
TG
554The
555.I uaddr2
556argument is ignored.
557.TP
d67e21f5
MK
558.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
559.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
55cc422d
TG
560This operation is the same as
561.BR FUTEX_WAKE
562except that the
563.I val3
564argument is used to provide a 32-bit bitset to the kernel.
98d769c0
MK
565This bitset is used to select which waiters should be woken up.
566The selection is done by a bit-wise AND of the "wake" bitset
567(i.e., the value in
568.IR val3 )
569and the bitset which is stored in the kernel-internal
09cb4ce7 570state of the waiter (the "wait" bitset that is set using
98d769c0
MK
571.BR FUTEX_WAIT_BITSET ).
572All of the waiters for which the result of the AND is nonzero are woken up;
573the remaining waiters are left sleeping.
574
e9d4496b
MK
575.\" FIXME please review this paragraph that I added
576The effect of
577.BR FUTEX_WAIT_BITSET
578and
579.BR FUTEX_WAKE_BITSET
580is to allow selective wake-ups among multiple waiters that are waiting
581on the same futex;
582since a futex has a size of 32 bits,
583these operations provide 32 wakeup "channels".
584(The
585.BR FUTEX_WAIT
586and
587.BR FUTEX_WAKE
588operations correspond to
589.BR FUTEX_WAIT_BITSET
590and
591.BR FUTEX_WAKE_BITSET
592operations where the bitsets are all ones.)
09cb4ce7 593Note, however, that using this bitset multiplexing feature on a
e9d4496b
MK
594futex is less efficient than simply using multiple futexes,
595because employing bitset multiplexing requires the kernel
596to check all waiters on a futex,
597including those that are not interested in being woken up
598(i.e., they do not have the relevant bit set in their "wait" bitset).
599.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
600.\"
601.\" "The original reason for the addition of these extensions
602.\" was to improve the performance of pthread read-write locks
603.\" in glibc. However, the pthreads library no longer uses the
604.\" same locking algorithm, and these extensions are not used
605.\" without the bitset parameter being all ones.
606.\"
607.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
608.\" is nevertheless used (with a bitset of all ones) in order to
609.\" obtain the absolute timeout functionality that is useful
610.\" for efficiently implementing Pthreads APIs (which use absolute
611.\" timeouts); FUTEX_WAIT provides only relative timeouts.
612
98d769c0
MK
613The
614.I uaddr2
615and
616.I timeout
617arguments are ignored.
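
The two bitset operations might be used as in the sketch below, which also
shows the absolute
.B CLOCK_REALTIME
timeout available with
.BR FUTEX_WAIT_BITSET .
Both helper names are hypothetical.
The sketch assumes that a bitset of zero is never passed,
since the kernel rejects an empty bitset with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *uaddr == expected, listening on the
   wake-up channels in bitset, until the absolute CLOCK_REALTIME time
   *deadline (NULL means no deadline).  Returns 0 if woken, or -1 with
   errno set (EAGAIN, ETIMEDOUT, EINTR). */
static int
wait_bitset_until(uint32_t *uaddr, uint32_t expected, uint32_t bitset,
                  const struct timespec *deadline)
{
    assert(bitset != 0);

    return (int) syscall(SYS_futex, uaddr,
                         FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                         expected, deadline, NULL, bitset);
}

/* Hypothetical helper: wake up to nr_wake waiters whose "wait" bitset
   has at least one bit in common with bitset.  Returns the number of
   waiters woken. */
static long
wake_bitset(uint32_t *uaddr, uint32_t nr_wake, uint32_t bitset)
{
    assert(bitset != 0);

    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, nr_wake,
                   NULL, NULL, bitset);
}
```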
bd90a5f9
MK
618.\"
619.\"
620.SS Priority-inheritance futexes
b52e1cd4
MK
621Linux supports priority-inheritance (PI) futexes in order to handle
622priority-inversion problems that can be encountered with
623normal futex locks.
79d918c7
MK
624.\"
625.\" FIXME ===== Start of adapted Hart/Guniguntala text =====
626.\" The following text is drawn from the Hart/Guniguntala paper,
627.\" but I have reworded some pieces significantly. Please check it.
628.\"
629The PI futex operations described below differ from the other
630futex operations in that they impose policy on the use of the futex value:
631.IP * 3
7c16fbff 632If the lock is unowned, the futex value shall be 0.
79d918c7
MK
633.IP *
634If the lock is owned, the futex value shall be the thread ID (TID; see
635.BR gettid (2))
636of the owning thread.
637.IP *
638.\" FIXME In the following line, I added "the lock is owned and". Okay?
639If the lock is owned and there are threads contending for the lock,
640then the
641.B FUTEX_WAITERS
642bit shall be set in the futex value; in other words, the futex value is:
643
644 FUTEX_WAITERS | TID
645.PP
646With this policy in place,
647a user-space application can acquire an unowned
b52e1cd4 648lock or release an uncontended lock using atomic
79d918c7 649.\" FIXME In the following line, I added "user-space". Okay?
b52e1cd4
MK
650user-space instructions (e.g.,
651.I cmpxchg
652on the x86 architecture).
653Locking an unowned lock simply consists of setting
654the futex value to the caller's TID.
655Releasing an uncontended lock simply requires setting the futex value to 0.
656
657If a futex is currently owned (i.e., has a nonzero value),
658waiters must employ the
79d918c7
MK
659.B FUTEX_LOCK_PI
660operation to acquire the lock.
b52e1cd4 661If a lock is contended (i.e., the
79d918c7 662.B FUTEX_WAITERS
b52e1cd4 663bit is set in the futex value), the lock owner must employ the
79d918c7 664.B FUTEX_UNLOCK_PI
b52e1cd4
MK
665operation to release the lock.
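
The user-space fast paths and the kernel fallbacks described above can be
sketched as follows.
This is an illustration under stated assumptions, not a complete lock:
it uses C11 atomics in place of a hand-written
.I cmpxchg
instruction, the pi_lock() and pi_unlock() names are hypothetical,
and it does not handle owner death or retry after errors.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(atomic_uint) == 4, "futex word must be 32 bits");

/* Hypothetical helper: acquire a PI futex.  Returns 0 on success,
   or -1 with errno set. */
static int
pi_lock(atomic_uint *word)
{
    unsigned int unowned = 0;
    unsigned int tid = (unsigned int) syscall(SYS_gettid);

    /* Fast path: the lock is unowned, so set the value from 0 to our
       TID without entering the kernel. */
    if (atomic_compare_exchange_strong(word, &unowned, tid))
        return 0;

    /* The value was nonzero: let the kernel set FUTEX_WAITERS, queue
       the caller and apply priority inheritance. */
    return (int) syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex owned by the caller. */
static int
pi_unlock(atomic_uint *word)
{
    unsigned int owner = (unsigned int) syscall(SYS_gettid);

    /* Fast path: TID -> 0 succeeds only while FUTEX_WAITERS is clear,
       that is, while the lock is uncontended. */
    if (atomic_compare_exchange_strong(word, &owner, 0))
        return 0;

    /* Contended: the kernel must hand the lock to a waiter. */
    return (int) syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```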
666
79d918c7
MK
667In the cases where callers are forced into the kernel
668(i.e., required to perform a
669.BR futex ()
670operation),
671they then deal directly with a so-called RT-mutex,
672a kernel locking mechanism which implements the required
673priority-inheritance semantics.
674After the RT-mutex is acquired, the futex value is updated accordingly,
675before the calling thread returns to user space.
676.\" FIXME ===== End of adapted Hart/Guniguntala text =====
677
678It is important
679.\" FIXME We need some explanation here of why it is important to note this
680to note that the kernel will update the futex value prior
681to returning to user space.
682Unlike the other futex operations described above,
683the PI futex operations are designed
7c16fbff 684for the implementation of very specific IPC mechanisms.
fc57e6bb
MK
685.\"
686.\" FIXME We don't quite have a definition anywhere of what a PI futex
687.\" is (vs a non-PI futex). Below, we have the information of
688.\" FUTEX_CMP_REQUEUE_PI requeues from a non-PI futex to a
689.\" PI futex, but what determines whether the futex is of one
690.\" kind of the other? We should have such a definition somewhere
691.\" about here.
bd90a5f9
MK
692
693PI futexes are operated on by specifying one of the following values in
694.IR futex_op :
d67e21f5
MK
695.TP
696.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
697.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
67833bec
MK
698.\"
699.\" FIXME I did some significant rewording of tglx's text.
700.\" Please check, in case I injected errors.
701.\"
702This operation is used after an attempt to acquire
703the futex lock via an atomic user-space instruction failed
704because the futex has a nonzero value\(emspecifically,
705because it contained the namespace-specific TID of the lock owner.
67259526 706.\" FIXME In the preceding line, what does "namespace-specific" mean?
67833bec 707.\" (I kept those words from tglx.)
67259526 708.\" That is, what kind of namespace are we talking about?
67833bec
MK
709.\" (I suppose we are talking PID namespaces here, but I want to
710.\" be sure.)
711
712The operation checks the value of the futex at the address
713.IR uaddr .
714If the value is 0, then the kernel tries to atomically set the futex value to
715the caller's TID.
716If that fails,
717.\" FIXME What would be the cause of failure?
718or the futex value is nonzero,
719the kernel atomically sets the
e0547e70 720.B FUTEX_WAITERS
67833bec
MK
721bit, which signals the futex owner that it cannot unlock the futex in
722user space atomically by setting the futex value to 0.
723After that, the kernel tries to find the thread which is
724associated with the owner TID,
725.\" FIXME Could I get a bit more detail on the next two lines?
726.\" What is "creates or reuses kernel state" about?
727creates or reuses kernel state on behalf of the owner
728and attaches the waiter to it.
67259526
MK
729.\" FIXME In the next line, what type of "priority" are we talking about?
730.\" Realtime priorities for SCHED_FIFO and SCHED_RR?
731.\" Or something else?
e0547e70
TG
732The enqueueing of the waiter is in descending priority order if more
733than one waiter exists.
67259526 734.\" FIXME What does "bandwidth" refer to in the next line?
e0547e70 735The owner inherits either the priority or the bandwidth of the waiter.
67259526
MK
736.\" FIXME In the preceding line, what determines whether the
737.\" owner inherits the priority versus the bandwidth?
67833bec
MK
738.\"
739.\" FIXME Could I get some help translating the next sentence into
740.\" something that user-space developers (and I) can understand?
741.\" In particular, what are "nexted locks" in this context?
e0547e70
TG
742This inheritance follows the lock chain in the case of
743nested locking and performs deadlock detection.
744
9ce19cf1
MK
745.\" FIXME tglx says "The timeout argument is handled as described in
746.\" FUTEX_WAIT." However, it appears to me that this is not right.
747.\" Is the following formulation correct.
e0547e70
TG
748The
749.I timeout
9ce19cf1
MK
750argument provides a timeout for the lock attempt.
751It is interpreted as an absolute time, measured against the
752.BR CLOCK_REALTIME
753clock.
754If
755.I timeout
756is NULL, the operation will block indefinitely.
e0547e70 757
a449c634 758The
e0547e70
TG
759.IR uaddr2 ,
760.IR val ,
761and
762.IR val3
a449c634 763arguments are ignored.
fedaeaf3 764.\" FIXME
a9dcb4d1
MK
765.\" tglx noted the following "ERROR" case for FUTEX_LOCK_PI and
766.\" FUTEX_TRYLOCK_PI
767.\" > [EOWNERDIED] The owner of the futex died and the kernel made the
768.\" > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit
769.\" > in the futex userspace value. Caller is responsible for cleanup
fedaeaf3 770.\"
a9dcb4d1 771.\" However, there is no such thing as an EOWNERDIED error. I had a look
fedaeaf3
MK
772.\" through the kernel source for the FUTEX_OWNER_DIED cases and didn't
773.\" see an obvious error associated with them. Can you clarify? (I think
774.\" the point is that this condition, which is described in
775.\" Documentation/robust-futexes.txt, is not an error as such. However,
776.\" I'm not yet sure of how to describe it in the man page.)
67833bec 777.\"
d67e21f5 778.TP
12fdbe23 779.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
d67e21f5 780.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
12fdbe23
MK
781This operation tries to acquire the futex at
782.IR uaddr .
0b761826
MK
783.\" FIXME I think it would be helpful here to say a few more words about
784.\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI
fa0388c3 785It deals with the situation where the TID value at
12fdbe23
MK
786.I uaddr
787is 0, but the
b52e1cd4 788.B FUTEX_WAITERS
12fdbe23 789bit is set.
fa0388c3
MK
790.\" FIXME How does the situation in the previous sentence come about?
791.\" Probably it would be helpful to say something about that in
792.\" the man page.
badbf70c 793.\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
12fdbe23 794User space cannot handle this situation in a race-free manner.
084744ef
MK
795
796The
797.IR uaddr2 ,
798.IR val ,
799.IR timeout ,
800and
801.IR val3
802arguments are ignored.
d67e21f5 803.TP
12fdbe23 804.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
d67e21f5 805.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
ecae2099
TG
806This operation wakes the highest-priority waiter that is waiting in
807.B FUTEX_LOCK_PI
808on the futex address provided by the
809.I uaddr
810argument.
811
812This is called when the user space value at
813.I uaddr
814cannot be changed atomically from a TID (of the owner) to 0.
815
816The
817.IR uaddr2 ,
818.IR val ,
819.IR timeout ,
820and
821.IR val3
11a194bf 822arguments are ignored.
d67e21f5 823.TP
d67e21f5
MK
824.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
825.\" commit 52400ba946759af28442dee6265c5c0180ac7122
826.\" FIXME to complete
f812a08b
DH
827This operation is a PI-aware variant of
828.BR FUTEX_CMP_REQUEUE .
829It requeues waiters that are blocked via
830.B FUTEX_WAIT_REQUEUE_PI
831on
832.I uaddr
833from a non-PI source futex
834.RI ( uaddr )
835to a PI target futex
836.RI ( uaddr2 ).
837
9e54d26d
MK
838As with
839.BR FUTEX_CMP_REQUEUE ,
840this operation wakes up a maximum of
841.I val
842waiters that are waiting on the futex at
843.IR uaddr .
844However, for
845.BR FUTEX_CMP_REQUEUE_PI ,
846.I val
847is required to be 1.
848The remaining waiters are removed from the wait queue of the source futex at
849.I uaddr
850and added to the wait queue of the target futex at
851.IR uaddr2 .
f812a08b 852
9e54d26d
MK
853The
854.I val3
855and
856.I timeout
857arguments serve the same purposes as for
858.BR FUTEX_CMP_REQUEUE .
d67e21f5
MK
859.TP
860.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
861.\" commit 52400ba946759af28442dee6265c5c0180ac7122
862.\" FIXME to complete
dd218aaa
MK
863.\"
864.\" FIXME Employs 'timeout' argument, supports FUTEX_CLOCK_REALTIME
865.\" 'timeout' can be NULL
866.\"
d67e21f5 867[As yet undocumented]
47297adb 868.SH RETURN VALUE
fea681da 869.PP
6f147f79 870In the event of an error, all operations return \-1 and set
e808bba0 871.I errno
6f147f79 872to indicate the cause of the error.
e808bba0
MK
873The return value on success depends on the operation,
874as described in the following list:
fea681da
MK
875.TP
876.B FUTEX_WAIT
682edefb
MK
877Returns 0 if the process was woken by a
878.B FUTEX_WAKE
7446a837
MK
879or
880.B FUTEX_WAKE_BITSET
682edefb 881call.
fea681da
MK
882.TP
883.B FUTEX_WAKE
884Returns the number of processes woken up.
885.TP
886.B FUTEX_FD
887Returns the new file descriptor associated with the futex.
888.TP
889.B FUTEX_REQUEUE
890Returns the number of processes woken up.
891.TP
892.B FUTEX_CMP_REQUEUE
3dfcc11d
MK
893Returns the total number of processes woken up or requeued to the futex at
894.IR uaddr2 .
895If this value is greater than
896.IR val ,
897then the difference is the number of waiters requeued to the futex at
898.IR uaddr2 .
519f2c3d
MK
899.\"
900.\" FIXME Add success returns for other operations
dcad19c0
MK
901.TP
902.B FUTEX_WAKE_OP
a8b5b324
MK
903.\" FIXME Is the following correct?
904Returns the total number of waiters that were woken up.
905This is the sum of the woken waiters on the two futexes at
906.I uaddr
907and
908.IR uaddr2 .
dcad19c0
MK
909.TP
910.B FUTEX_WAIT_BITSET
7bcc5351
MK
911.\" FIXME Is the following correct?
912Returns 0 if the process was woken by a
913.B FUTEX_WAKE
914or
915.B FUTEX_WAKE_BITSET
916call.
dcad19c0
MK
917.TP
918.B FUTEX_WAKE_BITSET
b884566b
MK
919.\" FIXME Is the following correct?
920Returns the number of processes woken up.
dcad19c0
MK
921.TP
922.B FUTEX_LOCK_PI
bf02a260
MK
923.\" FIXME Is the following correct?
924Returns 0 if the futex was successfully locked.
dcad19c0
MK
925.TP
926.B FUTEX_TRYLOCK_PI
5c716eef
MK
927.\" FIXME Is the following correct?
928Returns 0 if the futex was successfully locked.
dcad19c0
MK
929.TP
930.B FUTEX_UNLOCK_PI
52bb928f
MK
931.\" FIXME Is the following correct?
932Returns 0 if the futex was successfully unlocked.
dcad19c0
MK
933.TP
934.B FUTEX_CMP_REQUEUE_PI
dddd395a
MK
935.\" FIXME Is the following correct?
936Returns the total number of processes woken up or requeued to the futex at
937.IR uaddr2 .
938If this value is greater than
939.IR val ,
940then the difference is the number of waiters requeued to the futex at
941.IR uaddr2 .
dcad19c0
MK
942.TP
943.B FUTEX_WAIT_REQUEUE_PI
22c15de9
MK
944.\" FIXME Is the following correct?
945Returns 0 if the caller was successfully requeued to the futex at
946.IR uaddr2 .
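.PP
As a rough illustration of the wake-count return values listed above,
the following sketch issues a
.B FUTEX_WAKE
through
.BR syscall (2).
The helper name wake_all() is hypothetical and is not provided by any library;
the sketch assumes a Linux system on which SYS_futex is defined.

```c
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): wake every waiter
   blocked on the futex word at uaddr.
   Returns the number of waiters woken, or -1 with errno set. */
static long
wake_all(int *uaddr)
{
    /* INT_MAX as 'val' asks the kernel to wake all waiters. */
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

If no thread is blocked on the futex word, such a call should return 0.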
fea681da
MK
947.SH ERRORS
948.TP
949.B EACCES
950No read access to futex memory.
951.TP
952.B EAGAIN
badbf70c
MK
953.RB ( FUTEX_WAIT )
954The value pointed to by
955.I uaddr
956was not equal to the expected value
957.I val
958at the time of the call.
959.TP
960.B EAGAIN
682edefb 961.B FUTEX_CMP_REQUEUE
e808bba0 962detected that the value pointed to by
9f6c40c0
MK
963.I uaddr
964is not equal to the expected value
965.IR val3 .
fd1dc4c2 966.\" FIXME: Is the following sentence correct?
fea681da 967(This probably indicates a race;
682edefb
MK
968use the safe
969.B FUTEX_WAKE
970now.)
c0091dd3
MK
971.\"
972.\" FIXME Should there be an EAGAIN case for FUTEX_TRYLOCK_PI?
973.\" It seems so, looking at the handling of the rt_mutex_trylock()
974.\" call in futex_lock_pi()
975.\"
fea681da 976.TP
5662f56a
MK
977.BR EAGAIN
978.RB ( FUTEX_LOCK_PI ,
979.BR FUTEX_TRYLOCK_PI )
980The thread that owns the futex is about to exit,
981but has not yet handled the internal state cleanup.
982Try again.
61f8c1d1
MK
983.\"
984.\" FIXME Is there not also an EAGAIN error case on 'uaddr2' for
985.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
986.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
987.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EAGAIN?
5662f56a 988.TP
7a39e745
MK
989.BR EDEADLK
990.RB ( FUTEX_LOCK_PI ,
991.BR FUTEX_TRYLOCK_PI )
992The futex at
993.I uaddr
994is already locked by the caller.
d08ce5dd
MK
995.\"
996.\" FIXME Is there not also an EDEADLK error case on 'uaddr2' for
997.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
998.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
999.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EDEADLK?
7a39e745 1000.TP
fea681da 1001.B EFAULT
1ea901e8
MK
1002A required pointer argument (i.e.,
1003.IR uaddr ,
1004.IR uaddr2 ,
1005or
1006.IR timeout )
496df304 1007did not point to a valid user-space address.
fea681da 1008.TP
9f6c40c0 1009.B EINTR
e808bba0 1010A
9f6c40c0 1011.B FUTEX_WAIT
2674f781
MK
1012or
1013.B FUTEX_WAIT_BITSET
e808bba0
MK
1014operation was interrupted by a signal (see
1015.BR signal (7))
1016or a spurious wakeup.
9f6c40c0 1017.TP
fea681da 1018.B EINVAL
180f97b7
MK
1019The operation in
1020.IR futex_op
1021is one of those that employs a timeout, but the supplied
fb2f4c27
MK
1022.I timeout
1023argument was invalid
1024.RI ( tv_sec
1025was less than zero, or
1026.IR tv_nsec
1027was not less than 1,000,000,000).
1028.TP
1029.B EINVAL
0c74df0b
MK
1030The operation specified in
1031.IR futex_op
1032employs one or both of the pointers
51ee94be 1033.I uaddr
a1f47699 1034and
0c74df0b
MK
1035.IR uaddr2 ,
1036but one of these does not point to a valid object\(emthat is,
1037the address is not four-byte-aligned.
51ee94be
MK
1038.TP
1039.B EINVAL
bae14b6c 1040.RB ( FUTEX_WAKE ,
5447735d 1041.BR FUTEX_WAKE_OP ,
98d769c0 1042.BR FUTEX_WAKE_BITSET ,
e169277f
MK
1043.BR FUTEX_REQUEUE ,
1044.BR FUTEX_CMP_REQUEUE )
496df304 1045The kernel detected an inconsistency between the user-space state at
9534086b
TG
1046.I uaddr
1047and the kernel state\(emthat is, it detected a waiter which waits in
5447735d
MK
1048.BR FUTEX_LOCK_PI
1049on
1050.IR uaddr .
9534086b
TG
1051.TP
1052.B EINVAL
55cc422d
TG
1053.RB ( FUTEX_WAIT_BITSET ,
1054.BR FUTEX_WAKE_BITSET )
79c9b436
TG
1055The bitset supplied in
1056.IR val3
1057is zero.
1058.TP
1059.B EINVAL
add875c0
MK
1060.RB ( FUTEX_REQUEUE )
1061.\" FIXME tglx suggested adding this, but does this error really
1062.\" occur for FUTEX_REQUEUE?
1063.I uaddr
1064equals
1065.IR uaddr2
1066(i.e., an attempt was made to requeue to the same futex).
1067.TP
ff597681
MK
1068.BR EINVAL
1069.RB ( FUTEX_FD )
1070The signal number supplied in
1071.I val
1072is invalid.
1073.TP
6bac3b85 1074.B EINVAL
a218ef20 1075.RB ( FUTEX_LOCK_PI ,
ce022f18
MK
1076.BR FUTEX_TRYLOCK_PI ,
1077.BR FUTEX_UNLOCK_PI )
a218ef20
MK
1078The kernel detected an inconsistency between the user-space state at
1079.I uaddr
1080and the kernel state.
ce022f18
MK
1081This indicates either state corruption
1082.\" FIXME tglx did not mention the "state corruption" for FUTEX_UNLOCK_PI.
1083.\" Does that case also apply for FUTEX_UNLOCK_PI?
1084or that the kernel found a waiter on
a218ef20
MK
1085.I uaddr
1086which is waiting via
1087.BR FUTEX_WAIT
1088or
1089.BR FUTEX_WAIT_BITSET .
1090.TP
1091.B EINVAL
4832b48a 1092Invalid argument.
fea681da 1093.TP
a449c634
MK
1094.BR ENOMEM
1095.RB ( FUTEX_LOCK_PI ,
e34a8fb6
MK
1096.BR FUTEX_TRYLOCK_PI ,
1097.BR FUTEX_CMP_REQUEUE_PI )
a449c634
MK
1098The kernel could not allocate memory to hold state information.
1099.TP
fea681da 1100.B ENFILE
ff597681 1101.RB ( FUTEX_FD )
fea681da 1102The system limit on the total number of open files has been reached.
4701fc28
MK
1103.TP
1104.B ENOSYS
1105Invalid operation specified in
d33602c4 1106.IR futex_op .
9f6c40c0 1107.TP
4a7e5b05
MK
1108.B ENOSYS
1109The
1110.BR FUTEX_CLOCK_REALTIME
1111option was specified in
1afcee7c 1112.IR futex_op ,
4a7e5b05
MK
1113but the accompanying operation was neither
1114.BR FUTEX_WAIT_BITSET
1115nor
1116.BR FUTEX_WAIT_REQUEUE_PI .
1117.TP
a9dcb4d1
MK
1118.BR ENOSYS
1119.RB ( FUTEX_LOCK_PI ,
f2424fae
MK
1120.BR FUTEX_TRYLOCK_PI ,
1121.BR FUTEX_UNLOCK_PI )
a9dcb4d1
MK
1122A run-time check determined that the operation is not available.
1123.BR FUTEX_LOCK_PI
1124and
1125.BR FUTEX_TRYLOCK_PI
1126are not implemented on all architectures and
1127are not supported on some CPU variants.
1128.TP
c7589177
MK
1129.BR EPERM
1130.RB ( FUTEX_LOCK_PI ,
1131.BR FUTEX_TRYLOCK_PI )
1132The caller is not allowed to attach itself to the futex.
1133(This may be caused by a state corruption in user space.)
61f8c1d1
MK
1134.\"
1135.\" FIXME Is there not also an EPERM error case on 'uaddr2' for
1136.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1137.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1138.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EPERM?
c7589177 1139.TP
76f347ba
MK
1140.BR EPERM
1141.RB ( FUTEX_UNLOCK_PI )
1142The caller does not own the futex.
1143.TP
0b0e4934
MK
1144.BR ESRCH
1145.RB ( FUTEX_LOCK_PI ,
1146.BR FUTEX_TRYLOCK_PI )
1147.\" FIXME I reworded the following sentence a bit differently from
1148.\" tglx's formulation. Is it okay?
1149The thread ID in the futex at
1150.I uaddr
1151does not exist.
61f8c1d1
MK
1152.\"
1153.\" FIXME Is there not also an ESRCH error case on 'uaddr2' for
1154.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
1155.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
1156.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> ESRCH?
0b0e4934 1157.TP
9f6c40c0 1158.B ETIMEDOUT
4d85047f
MK
1159The operation in
1160.IR futex_op
1161employed the timeout specified in
1162.IR timeout ,
1163and the timeout expired before the operation completed.
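.PP
The following sketch suggests one way a caller might react to the
.BR EAGAIN ,
.BR EINTR ,
and
.B ETIMEDOUT
errors of
.BR FUTEX_WAIT .
The helper name wait_while_equal() is hypothetical, and the
__atomic_load_n() built-in assumes GCC or Clang.
As a simplification, the relative timeout is restarted in full
after each retry.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: block for as long as *uaddr == expected.
   'timeout' is a relative interval, or NULL to wait indefinitely.
   Returns 0 once the value differs from 'expected',
   or -1 with errno set (for example, ETIMEDOUT). */
static int
wait_while_equal(int *uaddr, int expected, const struct timespec *timeout)
{
    for (;;) {
        /* Recheck the word first: a wakeup does not by itself
           guarantee that the value has changed. */
        if (__atomic_load_n(uaddr, __ATOMIC_ACQUIRE) != expected)
            return 0;

        if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                    timeout, NULL, 0) == 0)
            continue;           /* woken: loop to recheck the value */

        if (errno == EAGAIN || errno == EINTR)
            continue;           /* value changed, or call interrupted */

        return -1;              /* e.g., ETIMEDOUT, EFAULT, EINVAL */
    }
}
```

A caller that needs a strict overall deadline would have to recompute
the remaining time before each retry.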
47297adb 1164.SH VERSIONS
a1d5f77c 1165.PP
81c9d87e
MK
1166Futexes were first made available in a stable kernel release
1167with Linux 2.6.0.
1168
a1d5f77c
MK
1169Initial futex support was merged in Linux 2.5.7 but with different semantics
1170from what was described above.
52dee70e 1171A four-argument system call with the semantics
fd3fa7ef 1172described in this page was introduced in Linux 2.5.40.
11b520ed 1173In Linux 2.5.70, a fifth argument
a1d5f77c 1174was added.
11b520ed 1175In Linux 2.6.7, a sixth argument was added\(emmessy, especially
a1d5f77c 1176on the s390 architecture.
47297adb 1177.SH CONFORMING TO
8382f16d 1178This system call is Linux-specific.
47297adb 1179.SH NOTES
fea681da 1180.PP
fcdad7d6 1181To reiterate, bare futexes are not intended as an easy-to-use abstraction
c13182ef 1182for end-users.
fcdad7d6 1183(There is no wrapper function for this system call in glibc.)
c13182ef 1184Implementors are expected to be assembly literate and to have
7fac88a9 1185read the sources of the futex user-space library referenced below.
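.PP
Because glibc provides no wrapper, callers typically reach the system call via
.BR syscall (2).
A minimal sketch follows; the name futex_call() is hypothetical,
and the argument types simply mirror the SYNOPSIS of this page.

```c
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper (not part of glibc): forwards its arguments
   unchanged to the raw futex system call.
   Returns the system call result, or -1 with errno set. */
static long
futex_call(int *uaddr, int futex_op, int val,
           const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr2, val3);
}
```

For example, futex_call(&word, FUTEX_WAKE, 1, NULL, NULL, 0)
asks the kernel to wake at most one waiter on the futex word.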
d282bb24 1186.\" .SH AUTHORS
fea681da
MK
1187.\" .PP
1188.\" Futexes were designed and worked on by
1189.\" Hubertus Franke (IBM Thomas J. Watson Research Center),
1190.\" Matthew Kirkwood, Ingo Molnar (Red Hat)
1191.\" and Rusty Russell (IBM Linux Technology Center).
1192.\" This page written by bert hubert.
47297adb 1193.SH SEE ALSO
9913033c 1194.BR get_robust_list (2),
d806bc05 1195.BR restart_syscall (2),
14d8dd3b 1196.BR futex (7)
fea681da 1197.PP
f5ad572f
MK
1198The following kernel source files:
1199.IP * 2
1200.I Documentation/pi-futex.txt
1201.IP *
1202.I Documentation/futex-requeue-pi.txt
1203.IP *
1204.I Documentation/locking/rt-mutex.txt
1205.IP *
1206.I Documentation/locking/rt-mutex-design.txt
43b99089 1207.PP
52087dd3 1208\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
9b936e9e
MK
1209(proceedings of the Ottawa Linux Symposium 2002), online at
1210.br
608bf950
SK
1211.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
1212.UE
f42eb21b 1213
0483b6cc
MK
1214\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
1215(2009 Real-Time Linux Workshop)
1216.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
1217.UE
1218
f42eb21b
MK
1219\fIFutexes Are Tricky\fP (updated in 2011), Ulrich Drepper
1220.UR http://www.akkadia.org/drepper/futex.pdf
1221.UE
9b936e9e
MK
1222.PP
1223Futex example library, futex-*.tar.bz2 at
1224.br
a605264d 1225.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
608bf950 1226.UE
34f14794
MK
1227.\"
1228.\" FIXME Are there any other resources that should be listed
1229.\" in the SEE ALSO section?