.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
.\" and Copyright (C) 2008 Greg Banks
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1994-08-21 by Michael Haardt
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1996-05-13 by Thomas Koenig
.\" Modified 1996-12-20 by Michael Haardt
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
.\" Modified 1999-06-03 by Michael Haardt
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk, reordered flags list alphabetically
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
.\" 2008-01-03, mtk, with input from Trond Myklebust
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
.\" Rewrite description of O_EXCL.
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
.\" on O_DIRECT.
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
.\"
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
.\"
.TH OPEN 2 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
open, openat, creat \- open and possibly create a file
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.sp
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );

.BI "int creat(const char *" pathname ", mode_t " mode );
.sp
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.sp
.BR openat ():
.PD 0
.ad l
.RS 4
.TP 4
Since glibc 2.10:
_POSIX_C_SOURCE\ >=\ 200809L
.TP
Before glibc 2.10:
_ATFILE_SOURCE
.RE
.ad
.PD
.SH DESCRIPTION
Given a
.I pathname
for a file,
.BR open ()
returns a file descriptor, a small, nonnegative integer
for use in subsequent system calls
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
The file descriptor returned by a successful call will be
the lowest-numbered file descriptor not currently open for the process.
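.PP
The lowest-numbered rule can be observed directly.
The following minimal C sketch is not part of any API:
the function name \fIlowest_fd_is_reused\fP is hypothetical, and the
sketch assumes a single-threaded caller, because another thread opening or
closing descriptors at the same time could take the freed slot first.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if, after one descriptor is closed, the
 * next open() hands back that same number; 0 if not; -1 on error.
 * 'path' is any readable file. Assumes no other thread touches the
 * descriptor table meanwhile. */
int lowest_fd_is_reused(const char *path)
{
    int a = open(path, O_RDONLY);
    if (a == -1)
        return -1;
    int b = open(path, O_RDONLY);       /* receives a number above 'a' */
    if (b == -1) {
        close(a);
        return -1;
    }

    close(a);                           /* 'a' is now the lowest free slot */
    int c = open(path, O_RDONLY);       /* expected to receive 'a' again */
    int reused = (c == a);

    close(b);
    if (c != -1)
        close(c);
    return c == -1 ? -1 : reused;
}
```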
.PP
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled); the
.B O_CLOEXEC
flag, described below, can be used to change this default.
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.PP
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
For further details on open file descriptions, see NOTES.
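.PP
The difference between a file descriptor and an open file description
shows up in the file offset.
The minimal C sketch below uses a hypothetical function name.
It opens the same file twice, which creates two independent open file
descriptions, and also duplicates one descriptor with \fBdup\fP(2), which
creates a second reference to the same description.
It assumes \fIpath\fP names a readable file that contains at least one byte.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if separately opened descriptors keep
 * independent offsets while a dup()'d descriptor shares its offset,
 * 0 if not, -1 on error. */
int offsets_behave_as_documented(const char *path)
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);          /* second, separate description */
    int fd3 = (fd1 == -1) ? -1 : dup(fd1);   /* same description as fd1 */
    int result = -1;
    char c;

    if (fd1 != -1 && fd2 != -1 && fd3 != -1 && read(fd1, &c, 1) == 1) {
        off_t off2 = lseek(fd2, 0, SEEK_CUR);   /* own offset: still 0 */
        off_t off3 = lseek(fd3, 0, SEEK_CUR);   /* shared with fd1: now 1 */
        result = (off2 == 0 && off3 == 1);
    }

    if (fd1 != -1) close(fd1);
    if (fd2 != -1) close(fd2);
    if (fd3 != -1) close(fd3);
    return result;
}
```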
.PP
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.

In addition, zero or more file creation flags and file status flags
can be
.RI bitwise- or 'd
in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
and
.BR O_TRUNC .
The
.I file status flags
are all of the remaining flags listed below.
.\" SUSv4 divides the flags into:
.\" * Access mode
.\" * File creation
.\" * File status
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
.\" though it's not clear what the difference between "other" and
.\" "File creation" flags is. I raised an Aardvark to see if this
.\" can be clarified in SUSv4; 10 Oct 2008.
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
.\" TC1 (balloted in 2013) resolved this, so that those three constants
.\" are also categorized as file status flags.
.\"
The distinction between these two groups of flags is that
the file creation flags affect the semantics of the open operation itself,
while the file status flags affect the semantics of subsequent I/O operations.
The file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.
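.PP
The following minimal C sketch shows that retrieval and modification.
The function name is hypothetical.
It reads the file status flags with the
.BR fcntl (2)
\fBF_GETFL\fP operation, extracts the access mode with the \fBO_ACCMODE\fP
mask, and turns on \fBO_NONBLOCK\fP with \fBF_SETFL\fP.
The access mode itself cannot be changed this way.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the access mode of 'fd' (O_RDONLY,
 * O_WRONLY, or O_RDWR) after enabling O_NONBLOCK on it, or -1 on error. */
int access_mode_then_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);       /* status flags plus access mode */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return flags & O_ACCMODE;
}
```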

The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
.\" For more background, see
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
.\" http://nfs.sourceforge.net/
This is because NFS does not support
appending to a file, so the client kernel has to simulate it, which
can't be done without a race condition.
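
The minimal C sketch below illustrates append mode on a local filesystem.
The function name is hypothetical, and \fIpath\fP must name an existing file.
Even after an explicit
.BR lseek (2)
back to offset 0, the data that is written lands at the end of the file
rather than overwriting its start.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append 'len' bytes from 'buf' to the existing file
 * 'path' after deliberately rewinding to offset 0, and return the new file
 * size (or -1 on error). With O_APPEND the rewind does not change where
 * the data goes. */
off_t append_after_rewind(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1)
        return -1;

    off_t size = -1;
    if (lseek(fd, 0, SEEK_SET) == 0                 /* rewind ... */
        && write(fd, buf, len) == (ssize_t) len)    /* ... data still goes to EOF */
        size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```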
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
.TP
.BR O_CLOEXEC " (since Linux 2.6.23)"
.\" NOTE! several other man pages refer to this text
Enable the close-on-exec flag for the new file descriptor.
.\" FIXME . for later review when Issue 8 is one day released...
.\" POSIX proposes to fix many APIs that provide hidden FDs
.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
.\" http://austingroupbugs.net/view.php?id=368
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.

Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.BR O_CLOEXEC
flag to deal with this problem.)
.\" This flag fixes only one form of the race condition;
.\" The race can also occur with, for example, file descriptors
.\" returned by accept(), pipe(), etc.
.TP
.B O_CREAT
If the file does not exist, it will be created.

The owner (user ID) of the new file is set to the effective user ID
of the process.

The group ownership (group ID) of the new file is set either to
the effective group ID of the process (System V semantics)
or to the group ID of the parent directory (BSD semantics).
On Linux, the behavior depends on whether the
set-group-ID mode bit is set on the parent directory:
if that bit is set, then BSD semantics apply;
otherwise, System V semantics apply.
For some filesystems, the behavior also depends on the
.I bsdgroups
and
.I sysvgroups
mount options described in
.BR mount (8).
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
.\" XFS (since 2.6.14).
.RS
.PP
The
.I mode
argument specifies the file mode bits to be applied when a new file is created.
This argument must be supplied when
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if neither
.B O_CREAT
nor
.B O_TMPFILE
is specified, then
.I mode
is ignored.
The effective mode is modified by the process's
.I umask
in the usual way: in the absence of a default ACL, the mode of the
created file is
.IR "(mode\ &\ ~umask)" .
Note that this mode applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
.PP
The following symbolic constants are provided for
.IR mode :
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write, and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write, and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write, and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
.IP
According to POSIX, the effect when other bits are set in
.I mode
is unspecified.
On Linux, the following bits are also honored in
.IR mode :
.RS
.TP 9
.B S_ISUID
0004000 set-user-ID bit
.TP
.B S_ISGID
0002000 set-group-ID bit (see
.BR stat (2))
.TP
.B S_ISVTX
0001000 sticky bit (see
.BR stat (2))
.RE
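.IP
A short C sketch can check both the umask rule and the note that the new
descriptor is not restricted by \fImode\fP.
The helper name is hypothetical.
The expected results assume that the parent directory has no default ACL;
when such an ACL exists, it determines the mode instead of the umask.
With a umask of 022, a request for mode 0666 should yield 0644, and a file
created with mode 0444 can still be written through the descriptor that
created it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create 'path' (which must not exist) with the
 * requested 'mode', write one byte through the new descriptor, and return
 * the permission bits the file actually received, or (mode_t) -1 on error.
 * The write succeeds even if 'mode' grants no write permission, because
 * 'mode' governs only later opens of the file. */
mode_t create_and_get_mode(const char *path, mode_t mode)
{
    struct stat sb;
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, mode);
    if (fd == -1)
        return (mode_t) -1;

    int ok = write(fd, "x", 1) == 1 && fstat(fd, &sb) == 0;
    close(fd);
    return ok ? (sb.st_mode & 07777) : (mode_t) -1;
}
```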
.TP
.BR O_DIRECT " (since Linux 2.4.10)"
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own caching.
File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.sp
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
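
The following hedged C sketch shows an \fBO_DIRECT\fP write.
The alignment requirements depend on the filesystem and kernel (see NOTES).
The 4096-byte figure used here is an assumption that covers many, but not
all, configurations.
The buffer address, the transfer size, and the file offset are all kept
aligned, and the caller should be prepared for \fBEINVAL\fP from
filesystems that do not support the flag.
The helper name is hypothetical.

```c
#define _GNU_SOURCE                 /* O_DIRECT is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DIRECT_ALIGN 4096           /* assumed alignment; see NOTES */

/* Hypothetical helper: write one aligned block of 'x' bytes to 'path',
 * bypassing the page cache where supported. Returns the number of bytes
 * written, or -1 with errno set (EINVAL if O_DIRECT is not supported). */
ssize_t write_block_direct(const char *path)
{
    void *buf;
    int err = posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN);
    if (err != 0) {
        errno = err;
        return -1;
    }
    memset(buf, 'x', DIRECT_ALIGN);

    ssize_t n = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT,
                  S_IRUSR | S_IWUSR);
    if (fd != -1)
        n = write(fd, buf, DIRECT_ALIGN);   /* address, size, offset aligned */

    int saved_errno = errno;                /* error from open() or write() */
    if (fd != -1)
        close(fd);
    free(buf);
    errno = saved_errno;
    return n;
}
```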
.TP
.B O_DIRECTORY
If \fIpathname\fP is not a directory, cause the open to fail.
.\" But see the following and its replies:
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in kernel version 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a
FIFO or tape device.
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.

By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
will fail.

When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points.

In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .

On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
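
The
.BR link (2)
technique just described can be sketched in C as follows.
The function name is hypothetical, and the caller chooses \fIunique\fP
(for example, from the hostname and PID).
Both paths must be on the same filesystem.
The lock is released later by unlinking \fIlockfile\fP.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the lock was acquired, 0 if another
 * process already holds it, -1 on error. */
int acquire_link_lock(const char *unique, const char *lockfile)
{
    struct stat sb;
    int fd = open(unique, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = 0;
    if (link(unique, lockfile) == 0)
        locked = 1;                  /* link() reported success */
    else if (stat(unique, &sb) == 0 && sb.st_nlink == 2)
        locked = 1;                  /* failure reported, yet the link exists */

    unlink(unique);                  /* 'lockfile', if ours, keeps the inode */
    return locked;
}
```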
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
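
The following minimal C sketch shows the preferred approach.
It defines \fB_FILE_OFFSET_BITS\fP before any header, so that \fIoff_t\fP
is 64 bits wide even on 32-bit systems and no explicit \fBO_LARGEFILE\fP
is needed.
On such systems, glibc then maps \fBopen\fP() onto \fBopen64\fP(), which
supplies that flag.
The helper name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64        /* must precede every #include */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

_Static_assert(sizeof(off_t) == 8, "expected a 64-bit off_t");

/* Hypothetical helper: size of the file at 'path' (may exceed 2 GiB),
 * or -1 on error. */
off_t file_size(const char *path)
{
    int fd = open(path, O_RDONLY);      /* no explicit O_LARGEFILE */
    if (fd == -1)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```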
.TP
.BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).

This flag can be employed only if one of the following conditions is true:
.RS
.IP * 3
The effective UID of the process
.\" Strictly speaking: the filesystem UID
matches the owner UID of the file.
.IP *
The calling process has the
.BR CAP_FOWNER
capability in its user namespace and
the owner UID of the file has a mapping in the namespace.
.RE
.IP
This flag is intended for use by indexing or backup programs,
where its use can significantly reduce the amount of disk activity.
This flag may not be effective on all filesystems.
One example is NFS, where the server maintains the access time.
.\" The O_NOATIME flag also affects the treatment of st_atime
.\" by mmap() and readdir(2), MTK, Dec 04.
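
A backup program might try \fBO_NOATIME\fP first and fall back to a plain
open when neither of the conditions above is met.
In that case the kernel refuses the first open with \fBEPERM\fP.
The following is a minimal C sketch with a hypothetical function name.

```c
#define _GNU_SOURCE                 /* O_NOATIME is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open 'path' for reading without updating its last
 * access time when permitted; otherwise open it normally. */
int open_for_backup(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd == -1 && errno == EPERM)  /* not owner and lacking CAP_FOWNER */
        fd = open(path, O_RDONLY);
    return fd;
}
```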
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\(emsee
.BR tty (4)\(emit
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If \fIpathname\fP is a symbolic link, then the open fails, with the error
.BR ELOOP .
Symbolic links in earlier components of the pathname will still be
followed.
(Note that the
.B ELOOP
error that can occur in this case is indistinguishable from the case where
an open fails because there are too many symbolic links found
while resolving components in the prefix part of the pathname.)

This flag is a FreeBSD extension, which was added to Linux in version 2.1.126,
and has subsequently been standardized in POSIX.1-2008.

See also
.BR O_PATH
below.
.\" The headers from glibc 2.0.100 and later include a
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
.\" used\fP.
.TP
.BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent operations on the file descriptor which is
returned will cause the calling process to wait.

Note that this flag has no effect for regular files and block devices;
that is, I/O operations will (briefly) block when device activity
is required, regardless of whether
.B O_NONBLOCK
is set.
Since
.B O_NONBLOCK
semantics might eventually be implemented,
applications should not depend upon blocking behavior
when specifying this flag for regular files and block devices.

For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
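
The following minimal C sketch uses a FIFO.
The function and path names are hypothetical.
A nonblocking read-only open succeeds at once even though no writer
exists.
A nonblocking write-only open, by contrast, fails with \fBENXIO\fP while
no process has the FIFO open for reading.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a nonblocking write-only open of a
 * reader-less FIFO fails with ENXIO and a nonblocking read-only open
 * succeeds at once; 0 otherwise; -1 if the FIFO cannot be created. */
int fifo_nonblock_demo(const char *fifo_path)
{
    if (mkfifo(fifo_path, S_IRUSR | S_IWUSR) == -1 && errno != EEXIST)
        return -1;

    int wfd = open(fifo_path, O_WRONLY | O_NONBLOCK);  /* no reader yet */
    int writer_refused = (wfd == -1 && errno == ENXIO);
    if (wfd != -1)
        close(wfd);

    int rfd = open(fifo_path, O_RDONLY | O_NONBLOCK);  /* no writer: no wait */
    int reader_ok = (rfd != -1);
    if (rfd != -1)
        close(rfd);

    unlink(fifo_path);
    return writer_refused && reader_ok;
}
```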
.TP
.BR O_PATH " (since Linux 2.6.39)"
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
.\" commit 326be7b484843988afe57566b627fb7a70beac56
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
.\"
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
.\" Subject: Re: [PATCH] open(2): document O_PATH
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and
to perform operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations (e.g.,
.BR read (2),
.BR write (2),
.BR fchmod (2),
.BR fchown (2),
.BR fgetxattr (2),
.BR mmap (2))
fail with the error
.BR EBADF .

The following operations
.I can
be performed on the resulting file descriptor:
.RS
.IP * 3
.BR close (2);
.BR fchdir (2)
(since Linux 3.5);
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
.BR fstat (2)
(since Linux 3.6).
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
.IP *
Duplicating the file descriptor
.RB ( dup (2),
.BR fcntl (2)
.BR F_DUPFD ,
etc.).
.IP *
Getting and setting file descriptor flags
.RB ( fcntl (2)
.BR F_GETFD
and
.BR F_SETFD ).
.IP *
Retrieving open file status flags using the
.BR fcntl (2)
.BR F_GETFL
operation: the returned flags will include the bit
.BR O_PATH .
.IP *
Passing the file descriptor as the
.IR dirfd
argument of
.BR openat ()
and the other "*at()" system calls.
This includes
.BR linkat (2)
with
.BR AT_EMPTY_PATH
(or via procfs using
.BR AT_SYMLINK_FOLLOW )
even if the file is not a directory.
.IP *
Passing the file descriptor to another process via a UNIX domain socket
(see
.BR SCM_RIGHTS
in
.BR unix (7)).
.RE
.IP
When
.B O_PATH
is specified in
.IR flags ,
flag bits other than
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
are ignored.

If
.I pathname
is a symbolic link and the
.BR O_NOFOLLOW
flag is also specified,
then the call returns a file descriptor referring to the symbolic link.
This file descriptor can be used as the
.I dirfd
argument in calls to
.BR fchownat (2),
.BR fstatat (2),
.BR linkat (2),
and
.BR readlinkat (2)
with an empty pathname to have the calls operate on the symbolic link.
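.IP
The following is a minimal C sketch.
The function name is hypothetical, and the
.BR fstat (2)
call assumes Linux 3.6 or later.
It opens a directory with \fBO_PATH\fP and shows that I/O on that
descriptor is refused with \fBEBADF\fP.
The same descriptor still serves as the \fIdirfd\fP argument of
\fBopenat\fP().

```c
#define _GNU_SOURCE                 /* O_PATH is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if read() on an O_PATH descriptor for
 * 'dir' fails with EBADF, fstat() works on it, and openat() can use it
 * as dirfd to open 'name'; 0 if not; -1 on error. */
int path_fd_demo(const char *dir, const char *name)
{
    int dfd = open(dir, O_PATH | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    char c;
    int io_refused = (read(dfd, &c, 1) == -1 && errno == EBADF);

    struct stat sb;                                 /* Linux 3.6 and later */
    int fstat_ok = (fstat(dfd, &sb) == 0 && S_ISDIR(sb.st_mode));

    int fd = openat(dfd, name, O_RDONLY);           /* dfd used as dirfd */
    int openat_ok = (fd != -1);
    if (fd != -1)
        close(fd);

    close(dfd);
    return io_refused && fstat_ok && openat_ok;
}
```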
.TP
.B O_SYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I file
integrity completion
(by contrast with the
synchronized I/O
.I data
integrity completion
provided by
.BR O_DSYNC .)

By the time
.BR write (2)
(and similar)
return, the output data and associated file metadata
have been transferred to the underlying hardware
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fsync (2)).
.IR "See NOTES below" .
.TP
.BR O_TMPFILE " (since Linux 3.11)"
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
Create an unnamed temporary file.
The
.I pathname
argument specifies a directory;
an unnamed inode will be created in that directory's filesystem.
Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.

.B O_TMPFILE
must be specified with one of
.B O_RDWR
or
.B O_WRONLY
and, optionally,
.BR O_EXCL .
If
.B O_EXCL
is not specified, then
.BR linkat (2)
can be used to link the temporary file into the filesystem, making it
permanent, using code like the following:

.in +4n
.nf
char path[PATH_MAX];
int fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
.fi
.in

In this case,
the
.BR open ()
.I mode
argument determines the file permission mode, as with
.BR O_CREAT .

Specifying
.B O_EXCL
in conjunction with
.B O_TMPFILE
prevents a temporary file from being linked into the filesystem
in the above manner.
(Note that the meaning of
.B O_EXCL
in this case is different from the meaning of
.B O_EXCL
otherwise.)

There are two main use cases for
.\" Inspired by http://lwn.net/Articles/559147/
.BR O_TMPFILE :
.RS
.IP * 3
Improved
.BR tmpfile (3)
functionality: race-free creation of temporary files that
(1) are automatically deleted when closed;
(2) can never be reached via any pathname;
(3) are not subject to symlink attacks; and
(4) do not require the caller to devise unique names.
.IP *
Creating a file that is initially invisible, which is then populated
with data and adjusted to have appropriate filesystem attributes
.RB ( fchown (2),
.BR fchmod (2),
.BR fsetxattr (2),
etc.)
before being atomically linked into the filesystem
in a fully formed state (using
.BR linkat (2)
as described above).
.RE
.IP
.B O_TMPFILE
requires support by the underlying filesystem;
only a subset of Linux filesystems provides that support.
In the initial implementation, support was provided in
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
.\" To check for support, grep for "tmpfile" in kernel sources
Support for other filesystems has subsequently been added as follows:
XFS (Linux 3.15);
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
Btrfs (Linux 3.16);
.\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c
F2FS (Linux 3.16);
.\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c
and ubifs (Linux 4.9).
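.IP
A fuller C version of the linking fragment shown for this flag, with error
handling, might look like the following sketch.
The function name and its arguments are hypothetical.
The sketch assumes that \fI/proc\fP is mounted.
Callers should expect \fBEOPNOTSUPP\fP or \fBEISDIR\fP where the
filesystem or kernel lacks \fBO_TMPFILE\fP support.
The name appears only in the final
.BR linkat (2)
call, so other processes never see a partially written file.

```c
#define _GNU_SOURCE                 /* O_TMPFILE is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write 'len' bytes from 'buf' into an unnamed file
 * created in 'dir', then give it the name 'final_path' in one step.
 * Returns 0 on success, -1 on error with errno set. */
int publish_via_tmpfile(const char *dir, const char *final_path,
                        const void *buf, size_t len)
{
    int fd = open(dir, O_TMPFILE | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    char proc_path[PATH_MAX];
    int ok = write(fd, buf, len) == (ssize_t) len
             && snprintf(proc_path, sizeof(proc_path),
                         "/proc/self/fd/%d", fd) > 0
             && linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
                       AT_SYMLINK_FOLLOW) == 0;

    int saved_errno = errno;
    close(fd);                      /* if linkat() failed, the data vanishes */
    errno = saved_errno;
    return ok ? 0 : -1;
}
```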
.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY ),
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise, the effect of
.B O_TRUNC
is unspecified.
.SS creat()
A call to
.BR creat ()
is equivalent to calling
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .
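.PP
The following short C sketch shows that equivalence and the truncation it
implies; the function name is hypothetical.
When \fIpath\fP already names a regular file, that file is truncated to
length 0, just as it would be by an \fBopen\fP() with
\fBO_CREAT|O_WRONLY|O_TRUNC\fP.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: call creat() on 'path' and return the file size it
 * leaves behind (0 for a truncated regular file), or -1 on error. */
off_t size_after_creat(const char *path)
{
    /* Same as open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR) */
    int fd = creat(path, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    return size;
}
```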
.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.

If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).

If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).

If
.I pathname
is absolute, then
.I dirfd
is ignored.
.SH RETURN VALUE
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor, or \-1 if an error occurred
(in which case,
.I errno
is set appropriately).
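.PP
A common idiom retries the call only when a blocking open, such as that of
a FIFO, was interrupted by a signal handler (\fBEINTR\fP).
Every other error is passed back to the caller unchanged.
The following C sketch of that idiom uses a hypothetical function name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: like open(), but retried while it fails with EINTR.
 * 'mode' matters only when 'flags' requests file creation. */
int open_retry_eintr(const char *path, int flags, mode_t mode)
{
    int fd;
    do {
        fd = open(path, flags, mode);
    } while (fd == -1 && errno == EINTR);
    return fd;
}
```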
fea681da 855.SH ERRORS
7b8ba76c
MK
856.BR open (),
857.BR openat (),
858and
859.BR creat ()
860can fail with the following errors:
fea681da
MK
861.TP
862.B EACCES
863The requested access to the file is not allowed, or search permission
864is denied for one of the directories in the path prefix of
865.IR pathname ,
866or the file did not exist yet and write access to the parent directory
867is not allowed.
868(See also
ad7cc990 869.BR path_resolution (7).)
fea681da 870.TP
a1f01685
MH
871.B EDQUOT
872Where
873.B O_CREAT
874is specified, the file does not exist, and the user's quota of disk
9ee4a2b6 875blocks or inodes on the filesystem has been exhausted.
a1f01685 876.TP
fea681da
MK
877.B EEXIST
878.I pathname
879already exists and
880.BR O_CREAT " and " O_EXCL
881were used.
882.TP
883.B EFAULT
0daa9e92 884.I pathname
e1d6264d 885points outside your accessible address space.
fea681da 886.TP
9f5773f7 887.B EFBIG
7c7fb552
MK
888See
889.BR EOVERFLOW .
9f5773f7 890.TP
e51412ea
MK
891.B EINTR
892While blocked waiting to complete an open of a slow device
893(e.g., a FIFO; see
894.BR fifo (7)),
895the call was interrupted by a signal handler; see
896.BR signal (7).
897.TP
ef490193
DG
898.B EINVAL
899The filesystem does not support the
900.BR O_DIRECT
e6f89ed2
MK
901flag.
902See
ef490193
DG
903.BR NOTES
904for more information.
905.TP
8e335391
MK
906.B EINVAL
907Invalid value in
908.\" In particular, __O_TMPFILE instead of O_TMPFILE
909.IR flags .
910.TP
911.B EINVAL
912.B O_TMPFILE
913was specified in
914.IR flags ,
915but neither
916.B O_WRONLY
917nor
918.B O_RDWR
919was specified.
920.TP
fea681da
MK
921.B EISDIR
922.I pathname
923refers to a directory and the access requested involved writing
924(that is,
925.B O_WRONLY
926or
927.B O_RDWR
928is set).
929.TP
8e335391 930.B EISDIR
843068bd
MK
931.I pathname
932refers to an existing directory,
8e335391
MK
933.B O_TMPFILE
934and one of
935.B O_WRONLY
936or
937.B O_RDWR
938were specified in
939.IR flags ,
940but this kernel version does not provide the
941.B O_TMPFILE
942functionality.
943.TP
fea681da
MK
944.B ELOOP
945Too many symbolic links were encountered in resolving
289f7907
MK
946.IR pathname .
947.TP
948.B ELOOP
fea681da 949.I pathname
289f7907
MK
950was a symbolic link, and
951.I flags
952specified
953.BR O_NOFOLLOW
954but not
955.BR O_PATH .
fea681da
MK
956.TP
957.B EMFILE
26c32fab 958The per-process limit on the number of open file descriptors has been reached
12c21590
MK
959(see the description of
960.BR RLIMIT_NOFILE
961in
962.BR getrlimit (2)).
.TP
.B ENAMETOOLONG
.I pathname
was too long.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENODEV
.I pathname
refers to a device special file and no corresponding device exists.
(This is a Linux kernel bug; in this situation
.B ENXIO
must be returned.)
.TP
.B ENOENT
.B O_CREAT
is not set and the named file does not exist.
Or, a directory component in
.I pathname
does not exist or is a dangling symbolic link.
.TP
.B ENOENT
.I pathname
refers to a nonexistent directory,
.B O_TMPFILE
and one of
.B O_WRONLY
or
.B O_RDWR
were specified in
.IR flags ,
but this kernel version does not provide the
.B O_TMPFILE
functionality.
.TP
.B ENOMEM
The named file is a FIFO,
but memory for the FIFO buffer can't be allocated because
the per-user hard limit on memory allocation for pipes has been reached
and the caller is not privileged; see
.BR pipe (7).
.TP
.B ENOMEM
Insufficient kernel memory was available.
.TP
.B ENOSPC
.I pathname
was to be created but the device containing
.I pathname
has no room for the new file.
.TP
.B ENOTDIR
A component used as a directory in
.I pathname
is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
.I pathname
was not a directory.
.TP
.B ENXIO
.BR O_NONBLOCK " | " O_WRONLY
is set, the named file is a FIFO, and
no process has the FIFO open for reading.
.TP
.B ENXIO
The file is a device special file and no corresponding device exists.
.TP
.B EOPNOTSUPP
The filesystem containing
.I pathname
does not support
.BR O_TMPFILE .
.TP
.B EOVERFLOW
.I pathname
refers to a regular file that is too large to be opened.
The usual scenario here is that an application compiled
on a 32-bit platform without
.I \-D_FILE_OFFSET_BITS=64
tried to open a file whose size exceeds
.I (1<<31)\-1
bytes;
see also
.B O_LARGEFILE
above.
This is the error specified by POSIX.1;
in kernels before 2.6.24, Linux gave the error
.B EFBIG
for this case.
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
.\" Reported 2006-10-03
.TP
.B EPERM
The
.B O_NOATIME
flag was specified, but the effective user ID of the caller
.\" Strictly speaking, it's the filesystem UID... (MTK)
did not match the owner of the file and the caller was not privileged.
.TP
.B EPERM
The operation was prevented by a file seal; see
.BR fcntl (2).
.TP
.B EROFS
.I pathname
refers to a file on a read-only filesystem and write access was
requested.
.TP
.B ETXTBSY
.I pathname
refers to an executable image which is currently being executed and
write access was requested.
.TP
.B EWOULDBLOCK
The
.B O_NONBLOCK
flag was specified, and an incompatible lease was held on the file
(see
.BR fcntl (2)).
.PP
The following additional errors can occur for
.BR openat ():
.TP
.B EBADF
.I dirfd
is not a valid file descriptor.
.TP
.B ENOTDIR
.I pathname
is a relative pathname and
.I dirfd
is a file descriptor referring to a file other than a directory.
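.PP
The
.B EEXIST
and
.B ENOENT
cases above can be told apart by inspecting
.I errno
after a failed call.
The following minimal C sketch assumes a POSIX.1-2008 system;
the helper names
.I create_exclusive
and
.I open_errno
are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create PATH only if it does not already exist.
   Returns a new file descriptor, or -1 with errno set; errno is
   EEXIST if PATH already existed (O_CREAT and O_EXCL were both given),
   in which case the existing file is left untouched. */
int create_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
}

/* Hypothetical helper: attempt open(PATH, FLAGS) without O_CREAT.
   Returns 0 on success (the descriptor is closed again), otherwise
   the errno value, e.g. ENOENT if the named file does not exist. */
int open_errno(const char *path, int flags)
{
    int fd = open(path, flags | O_CLOEXEC);

    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

.PP
Calling
.I create_exclusive
a second time on the same path fails with
.BR EEXIST ,
which is what makes the create step safe against clobbering a file
created concurrently by another process.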
.SH VERSIONS
.BR openat ()
was added to Linux in kernel 2.6.16;
library support was added to glibc in version 2.4.
.SH CONFORMING TO
.BR open (),
.BR creat ():
SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.

.BR openat ():
POSIX.1-2008.

The
.BR O_DIRECT ,
.BR O_NOATIME ,
.BR O_PATH ,
and
.B O_TMPFILE
flags are Linux-specific.
One must define
.B _GNU_SOURCE
to obtain their definitions.

The
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.B O_NOFOLLOW
flags are not specified in POSIX.1-2001,
but are specified in POSIX.1-2008.
Since glibc 2.12, one can obtain their definitions by defining either
.B _POSIX_C_SOURCE
with a value greater than or equal to 200809L or
.B _XOPEN_SOURCE
with a value greater than or equal to 700.
In glibc 2.11 and earlier, one obtains the definitions by defining
.BR _GNU_SOURCE .

As noted in
.BR feature_test_macros (7),
feature test macros such as
.BR _POSIX_C_SOURCE ,
.BR _XOPEN_SOURCE ,
and
.B _GNU_SOURCE
must be defined before including
.I any
header files.
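.PP
The ordering requirement matters in practice: a Linux-specific flag
such as
.B O_PATH
is visible only if
.B _GNU_SOURCE
is defined at the very top of the source file.
The following minimal sketch assumes glibc and Linux 3.6 or later
(where
.BR fstat (2)
accepts an
.B O_PATH
descriptor); the function name
.I is_directory
is hypothetical.

```c
#define _GNU_SOURCE            /* must come before ANY #include */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if PATH is a directory, 0 if it is
   something else, or -1 on error.  O_PATH (Linux-specific, exposed by
   _GNU_SOURCE) yields a descriptor that can be passed to fstat()
   without needing read permission on PATH itself. */
int is_directory(const char *path)
{
    struct stat st;
    int fd, rc;

    fd = open(path, O_PATH | O_CLOEXEC);
    if (fd == -1)
        return -1;
    rc = fstat(fd, &st);
    close(fd);
    if (rc == -1)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```
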
.SH NOTES
Under Linux, the
.B O_NONBLOCK
flag is sometimes used in cases where one wants to open
but does not necessarily have the intention to read or write.
For example, this may be used to open a device in order to get a file descriptor
for use with
.BR ioctl (2).
.LP
The (undefined) effect of
.B O_RDONLY | O_TRUNC
varies among implementations.
On many systems the file is actually truncated.
.\" Linux 2.0, 2.5: truncate
.\" Solaris 5.7, 5.8: truncate
.\" Irix 6.5: truncate
.\" Tru64 5.1B: truncate
.\" HP-UX 11.22: truncate
.\" FreeBSD 4.7: truncate

Note that
.BR open ()
can open device special files, but
.BR creat ()
cannot create them; use
.BR mknod (2)
instead.

If the file is newly created, its
.IR st_atime ,
.IR st_ctime ,
and
.I st_mtime
fields
(respectively, time of last access, time of last status change, and
time of last modification; see
.BR stat (2))
are set
to the current time, and so are the
.I st_ctime
and
.I st_mtime
fields of the
parent directory.
Otherwise, if the file is modified because of the
.B O_TRUNC
flag, its
.I st_ctime
and
.I st_mtime
fields are set to the current time.
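.PP
The
.B O_TRUNC
rule can be observed with a small C sketch: it backdates a file's
timestamps with
.BR utimensat (2),
reopens the file with
.BR O_TRUNC ,
and reads back
.I st_mtime
with
.BR fstat (2).
The helper name
.I mtime_after_trunc
is hypothetical, and the file is assumed to exist, be writable,
and be nonempty.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: set both timestamps of PATH to the long-past
   time OLD (seconds since the Epoch), reopen PATH with O_TRUNC, and
   return the resulting st_mtime, or -1 on error.  If O_TRUNC updates
   the timestamps as described above, the result is the current time
   rather than OLD. */
time_t mtime_after_trunc(const char *path, time_t old)
{
    struct timespec ts[2] = {
        { .tv_sec = old, .tv_nsec = 0 },   /* access time */
        { .tv_sec = old, .tv_nsec = 0 },   /* modification time */
    };
    struct stat st;
    int fd, rc;

    if (utimensat(AT_FDCWD, path, ts, 0) == -1)
        return -1;

    fd = open(path, O_WRONLY | O_TRUNC | O_CLOEXEC);
    if (fd == -1)
        return -1;
    rc = fstat(fd, &st);
    close(fd);
    return rc == -1 ? -1 : st.st_mtime;
}
```
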

The files in the
.I /proc/[pid]/fd
directory show the open file descriptors of the process with the PID
.IR pid .
The files in the
.I /proc/[pid]/fdinfo
directory show even more information about these file descriptors.
See
.BR proc (5)
for further details of both of these directories.
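.PP
For example, each entry in
.I /proc/self/fd
is a symbolic link whose target names the open file, so
.BR readlink (2)
can recover a pathname for a descriptor.
The following Linux-specific sketch assumes that
.I /proc
is mounted; the helper name
.I fd_to_path
is hypothetical.
The link target is informational only: the file may since have been
renamed or deleted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (Linux-specific): store in BUF, of size LEN, the
   target of the symbolic link /proc/self/fd/FD, i.e., the pathname the
   kernel reports for the open file descriptor FD.  Returns the length of
   the target (the result is NUL-terminated), or -1 on error or if BUF
   may be too small. */
ssize_t fd_to_path(int fd, char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    if (len < 2)
        return -1;
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);

    /* readlink() does not NUL-terminate; leave room for the terminator. */
    n = readlink(link, buf, len - 1);
    if (n == -1 || (size_t) n == len - 1)   /* error, or possibly truncated */
        return -1;
    buf[n] = 0;
    return n;
}
```
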
.\"
.\"
.SS Open file descriptions
The term open file description is the one used by POSIX to refer to the
entries in the system-wide table of open files.
In other contexts, this object is also variously called an "open file object",
a "file handle", an "open file table entry",
or, in kernel-developer parlance, a
.IR "struct file" .

When a file descriptor is duplicated (using
.BR dup (2)
or similar),
the duplicate refers to the same open file description
as the original file descriptor,
and the two file descriptors consequently share
the file offset and file status flags.
Such sharing can also occur between processes:
a child process created via
.BR fork (2)
inherits duplicates of its parent's file descriptors,
and those duplicates refer to the same open file descriptions.

Each
.BR open ()
of a file creates a new open file description;
thus, there may be multiple open file descriptions
corresponding to a file inode.

On Linux, one can use the
.BR kcmp (2)
.B KCMP_FILE
operation to test whether two file descriptors
(in the same process or in two different processes)
refer to the same open file description.
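.PP
The difference between a duplicated descriptor and a second
.BR open ()
shows up in the shared file offset.
The following minimal C sketch (the helper name
.I offsets_after_read
is hypothetical) reads one byte through one descriptor and then queries,
with
.BR lseek (2),
the offset of a
.BR dup (2)
duplicate and of an independently opened descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open PATH twice, dup() the first descriptor,
   read one byte through the first descriptor, and report the offsets
   seen by the duplicate and by the second open().  Returns 0 on
   success, -1 on error (or if the file is empty). */
int offsets_after_read(const char *path, off_t *dup_off, off_t *other_off)
{
    char c;
    int fd1 = open(path, O_RDONLY | O_CLOEXEC);
    int fd2 = open(path, O_RDONLY | O_CLOEXEC);   /* new open file description */
    int fd3 = (fd1 == -1) ? -1 : dup(fd1);        /* same description as fd1 */
    int rc = -1;

    if (fd1 != -1 && fd2 != -1 && fd3 != -1 && read(fd1, &c, 1) == 1) {
        *dup_off = lseek(fd3, 0, SEEK_CUR);    /* shared with fd1: now 1 */
        *other_off = lseek(fd2, 0, SEEK_CUR);  /* fd2's own offset: still 0 */
        rc = 0;
    }
    if (fd3 != -1)
        close(fd3);
    if (fd2 != -1)
        close(fd2);
    if (fd1 != -1)
        close(fd1);
    return rc;
}
```
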
.\"
.\"
.SS Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option
specifies different variants of synchronized I/O,
and specifies the
.BR open ()
flags
.BR O_SYNC ,
.BR O_DSYNC ,
and
.B O_RSYNC
for controlling the behavior.
Regardless of whether an implementation supports this option,
it must at least support the use of
.B O_SYNC
for regular files.

Linux implements
.B O_SYNC
and
.BR O_DSYNC ,
but not
.BR O_RSYNC .
(Somewhat incorrectly, glibc defines
.B O_RSYNC
to have the same value as
.BR O_SYNC .)

.B O_SYNC
provides synchronized I/O
.I file
integrity completion,
meaning write operations will flush data and all associated metadata
to the underlying hardware.
.B O_DSYNC
provides synchronized I/O
.I data
integrity completion,
meaning write operations will flush data
to the underlying hardware,
but will only flush metadata updates that are required
to allow a subsequent read operation to complete successfully.
Data integrity completion can reduce the number of disk operations
that are required for applications that don't need the guarantees
of file integrity completion.
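.PP
As a sketch of choosing between the two, the following C function
appends a record to a log file opened with
.BR O_DSYNC ,
so each
.BR write (2)
should return only after the data, and the file-length update it
implies, have been flushed.
The function name
.I append_durably
is hypothetical, and short writes are simply treated as errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append MSG to PATH (created with mode 0644 if
   needed).  O_DSYNC requests synchronized I/O data integrity
   completion: write() should return only once the data, and metadata
   needed to read it back such as the new file length, has been
   flushed.  Use O_SYNC instead if timestamps must be flushed too.
   Returns 0 on success, -1 on error (a short write counts as an error). */
int append_durably(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    ssize_t n;
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC | O_CLOEXEC,
              0644);
    if (fd == -1)
        return -1;
    n = write(fd, msg, len);
    if (close(fd) == -1 || n != (ssize_t) len)
        return -1;
    return 0;
}
```
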

To understand the difference between the two types of completion,
consider two pieces of file metadata:
the file's last modification timestamp
.RI ( st_mtime )
and the file length.
All write operations will update the last file modification timestamp,
but only writes that add data to the end of the
file will change the file length.
The last modification timestamp is not needed to ensure that
a read completes successfully, but the file length is.
Thus,
.B O_DSYNC
would only guarantee to flush updates to the file length metadata
(whereas
.B O_SYNC
would also always flush the last modification timestamp metadata).

Before Linux 2.6.33, Linux implemented only the
.B O_SYNC
flag for
.BR open ().
However, when that flag was specified,
most filesystems actually provided the equivalent of synchronized I/O
.I data
integrity completion (i.e.,
.B O_SYNC
was actually implemented as the equivalent of
.BR O_DSYNC ).

Since Linux 2.6.33, proper
.B O_SYNC
support is provided.
However, to ensure backward binary compatibility,
.B O_DSYNC
was defined with the same value as the historical
.BR O_SYNC ,
and
.B O_SYNC
was defined as a new (two-bit) flag value that includes the
.B O_DSYNC
flag value.
This ensures that applications compiled against
new headers get at least
.B O_DSYNC
semantics on pre-2.6.33 kernels.
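.PP
The flag-value relationship described above can be checked directly.
This minimal C sketch assumes glibc with Linux headers from 2.6.33 or
later; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Hypothetical check: nonzero if O_SYNC is a distinct value that
   contains all the bits of O_DSYNC, so that a binary built with
   these headers still gets at least O_DSYNC semantics on a
   pre-2.6.33 kernel that knows only the historical O_SYNC bit. */
int sync_includes_dsync(void)
{
    return O_SYNC != O_DSYNC && (O_SYNC & O_DSYNC) == O_DSYNC;
}

/* Hypothetical check: nonzero if O_RSYNC is defined with the same
   value as O_SYNC, as glibc does. */
int rsync_same_as_sync(void)
{
#ifdef O_RSYNC
    return O_RSYNC == O_SYNC;
#else
    return 0;
#endif
}
```
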
.\"
.\"
.SS NFS
There are many infelicities in the protocol underlying NFS, affecting,
among others,
.BR O_SYNC " and " O_NDELAY .

On NFS filesystems with UID mapping enabled,
.BR open ()
may
return a file descriptor but, for example,
.BR read (2)
requests are denied
with \fBEACCES\fP.
This is because the client performs
.BR open ()
by checking the
permissions, but UID mapping is performed by the server upon
read and write requests.
.\"
.\"
.SS FIFOs
Opening the read or write end of a FIFO blocks until the other
end is also opened (by another process or thread).
See
.BR fifo (7)
for further details.
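.PP
Opening with
.B O_NONBLOCK
changes this: the read end can be opened without waiting for a writer,
while opening the write end fails with
.B ENXIO
if no process has the FIFO open for reading (see ERRORS above).
The following minimal C sketch, using the hypothetical helper
.I probe_fifo
and a caller-supplied path, checks both behaviors.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a FIFO at PATH (replacing any existing
   file) and check the nonblocking open behavior described above.
   Returns 0 if all checks pass, -1 otherwise. */
int probe_fifo(const char *path)
{
    int rd, wr, ok;

    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* No reader yet: a nonblocking open of the write end fails. */
    wr = open(path, O_WRONLY | O_NONBLOCK);
    ok = (wr == -1 && errno == ENXIO);
    if (wr != -1)
        close(wr);

    /* A nonblocking open of the read end succeeds without a writer. */
    rd = open(path, O_RDONLY | O_NONBLOCK);
    ok = ok && rd != -1;

    /* With the read end open, the write end can now be opened. */
    wr = open(path, O_WRONLY | O_NONBLOCK);
    ok = ok && wr != -1;

    if (wr != -1)
        close(wr);
    if (rd != -1)
        close(rd);
    unlink(path);
    return ok ? 0 : -1;
}
```
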
.\"
.\"
.SS File access mode
Unlike the other values that can be specified in
.IR flags ,
the
.I "access mode"
values
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
do not specify individual bits.
Rather, they define the low-order two bits of
.IR flags ,
and are defined respectively as 0, 1, and 2.
In other words, the combination
.B "O_RDONLY | O_WRONLY"
is a logical error, and certainly does not have the same meaning as
.BR O_RDWR .
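.PP
Consequently, the access mode must be extracted by masking with
.B O_ACCMODE
and comparing the result, rather than by testing individual bits.
The following C sketch (helper names hypothetical) applies this to a
.I flags
value and to the flags returned by the
.BR fcntl (2)
.B F_GETFL
operation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Hypothetical helpers.  O_RDONLY, O_WRONLY, and O_RDWR are values of
   the two-bit access-mode field, not independent bits, so the field is
   isolated with O_ACCMODE and compared.  (A test such as
   "flags & O_RDONLY" is always 0, because O_RDONLY is 0.)  The
   nonstandard Linux mode 3 is reported as allowing neither. */
int mode_allows_read(int flags)
{
    int mode = flags & O_ACCMODE;

    return mode == O_RDONLY || mode == O_RDWR;
}

int mode_allows_write(int flags)
{
    int mode = flags & O_ACCMODE;

    return mode == O_WRONLY || mode == O_RDWR;
}

/* Same test for an open descriptor, using the flags reported by
   fcntl(F_GETFL).  Returns 1 or 0, or -1 on error. */
int fd_allows_write(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags == -1 ? -1 : mode_allows_write(flags);
}
```
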

Linux reserves the special, nonstandard access mode 3 (binary 11) in
.I flags
to mean:
check for read and write permission on the file and return a file descriptor
that can't be used for reading or writing.
This nonstandard access mode is used by some Linux drivers to return a
file descriptor that is to be used only for device-specific
.BR ioctl (2)
operations.
.\" See for example util-linux's disk-utils/setfdprm.c
.\" For some background on access mode 3, see
.\" http://thread.gmane.org/gmane.linux.kernel/653123
.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
.\" LKML, 12 Mar 2008
.\"
.\"
.SS Rationale for openat() and other "directory file descriptor" APIs
.BR openat ()
and the other system calls and library functions that take
a directory file descriptor argument
(i.e.,
.BR execveat (2),
.BR faccessat (2),
.BR fanotify_mark (2),
.BR fchmodat (2),
.BR fchownat (2),
.BR fstatat (2),
.BR futimesat (2),
.BR linkat (2),
.BR mkdirat (2),
.BR mknodat (2),
.BR name_to_handle_at (2),
.BR readlinkat (2),
.BR renameat (2),
.BR symlinkat (2),
.BR unlinkat (2),
.BR utimensat (2),
.BR mkfifoat (3),
and
.BR scandirat (3))
are supported
for two reasons.
Here, the explanation is in terms of the
.BR openat ()
call, but the rationale is analogous for the other interfaces.

First,
.BR openat ()
allows an application to avoid race conditions that could
occur when using
.BR open ()
to open files in directories other than the current working directory.
These race conditions result from the fact that some component
of the directory prefix given to
.BR open ()
could be changed in parallel with the call to
.BR open ().
Suppose, for example, that we wish to create the file
.I path/to/xxx.dep
if the file
.I path/to/xxx
exists.
The problem is that between the existence check and the file creation step,
.I path
or
.I to
(which might be symbolic links)
could be modified to point to a different location.
Such races can be avoided by
opening a file descriptor for the target directory,
and then specifying that file descriptor as the
.I dirfd
argument of (say)
.BR fstatat (2)
and
.BR openat ().
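.PP
A minimal C sketch of that approach for the
.I path/to/xxx.dep
example follows.
The helper name
.I create_dep_file
is hypothetical; it takes the directory, the existing name, and the
name to create as separate arguments only to avoid string manipulation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create DIR/NAME_DEP if DIR/NAME exists.
   DIR is resolved exactly once; both the existence check (fstatat)
   and the creation (openat) are relative to the same dirfd, so
   changing DIR or a symbolic link in its prefix between the two
   steps cannot redirect them to different directories.
   Returns 0 on success, -1 on error with errno set. */
int create_dep_file(const char *dir, const char *name, const char *name_dep)
{
    struct stat st;
    int dirfd, fd, saved;

    dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1)
        return -1;

    if (fstatat(dirfd, name, &st, 0) == -1)
        goto fail;
    fd = openat(dirfd, name_dep, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC,
                0644);
    if (fd == -1)
        goto fail;
    close(fd);
    close(dirfd);
    return 0;

fail:
    saved = errno;          /* close() must not clobber the real error */
    close(dirfd);
    errno = saved;
    return -1;
}
```
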

Second,
.BR openat ()
allows the implementation of a per-thread "current working
directory", via file descriptor(s) maintained by the application.
(This functionality can also be obtained by tricks based
on the use of
.IR /proc/self/fd/ dirfd,
but less efficiently.)
.\"
.\"
.SS O_DIRECT
.LP
The
.B O_DIRECT
flag may impose alignment restrictions on the length and address
of user-space buffers and the file offset of I/Os.
In Linux, alignment
restrictions vary by filesystem and kernel version and might be
absent entirely.
However, there is currently no filesystem-independent
interface for an application to discover these restrictions for a given
file or filesystem.
Some filesystems provide their own interfaces
for doing so, for example the
.B XFS_IOC_DIOINFO
operation in
.BR xfsctl (3).
.LP
Under Linux 2.4, transfer sizes, the alignment of the user buffer,
and the file offset must all be multiples of the logical block size
of the filesystem.
Since Linux 2.6.0, alignment to the logical block size of the
underlying storage (typically 512 bytes) suffices.
The logical block size can be determined using the
.BR ioctl (2)
.B BLKSSZGET
operation or from the shell using the command:

    blockdev \-\-getss
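.PP
A minimal C sketch of an aligned
.B O_DIRECT
write follows.
It assumes a 4096-byte alignment as a conservative choice (an
assumption; real code should determine the actual requirement, for
example with
.BR BLKSSZGET ),
allocates the buffer with
.BR posix_memalign (3),
and treats
.B EINVAL
as "not supported here" so that the caller can fall back to buffered I/O.
The function name
.I direct_write_block
is hypothetical.

```c
#define _GNU_SOURCE            /* O_DIRECT; must precede every #include */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* ASSUMPTION: 4096 bytes is used as a conservative alignment for the
   buffer address, transfer size, and file offset.  Real code should
   determine the actual requirement for the device and filesystem. */
#define DIO_ALIGN 4096

/* Hypothetical helper: write one aligned block at offset 0 of PATH
   using O_DIRECT.  Returns 1 if the direct write succeeded, 0 if
   O_DIRECT (or this alignment) was rejected with EINVAL, in which
   case the caller may fall back to buffered I/O, or -1 on other errors. */
int direct_write_block(const char *path)
{
    void *buf;
    ssize_t n;
    int fd, rc;

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_CLOEXEC, 0600);
    if (fd == -1)
        return errno == EINVAL ? 0 : -1;

    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', DIO_ALIGN);

    n = pwrite(fd, buf, DIO_ALIGN, 0);
    if (n == DIO_ALIGN)
        rc = 1;
    else if (n == -1 && errno == EINVAL)
        rc = 0;
    else
        rc = -1;

    free(buf);
    close(fd);
    return rc;
}
```
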
.LP
.B O_DIRECT
I/Os should never be run concurrently with the
.BR fork (2)
system call,
if the memory buffer is a private mapping
(i.e., any mapping created with the
.BR mmap (2)
.B MAP_PRIVATE
flag;
this includes memory allocated on the heap and statically allocated buffers).
Any such I/Os, whether submitted via an asynchronous I/O interface or from
another thread in the process,
should be completed before
.BR fork (2)
is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.
This restriction does not apply when the memory buffer for the
.B O_DIRECT
I/Os was created using
.BR shmat (2)
or
.BR mmap (2)
with the
.B MAP_SHARED
flag.
Nor does this restriction apply when the memory buffer has been advised as
.B MADV_DONTFORK
with
.BR madvise (2),
ensuring that it will not be available
to the child after
.BR fork (2).
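.PP
One way to obtain such a buffer is sketched below: memory from an
anonymous
.BR mmap (2)
mapping is page-aligned (which typically also satisfies
.B O_DIRECT
alignment needs) and can be passed to
.BR madvise (2)
with
.BR MADV_DONTFORK .
The function name
.I alloc_dio_buffer_dontfork
is hypothetical.
A child must not touch the buffer, since the range is not mapped in the
child after
.BR fork (2).

```c
#define _DEFAULT_SOURCE        /* MAP_ANONYMOUS, MADV_DONTFORK */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate LEN bytes of private anonymous memory
   for O_DIRECT buffers and mark it MADV_DONTFORK, so the range is not
   present in children created by fork(2).  The mapping is page-aligned.
   Returns NULL on failure; release with munmap(buf, len). */
void *alloc_dio_buffer_dontfork(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) == -1) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```
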
.LP
The
.B O_DIRECT
flag was introduced in SGI IRIX, where it has alignment
restrictions similar to those of Linux 2.4.
IRIX also has an
.BR fcntl (2)
call to query appropriate alignments and sizes.
FreeBSD 4.x introduced
a flag of the same name, but without alignment restrictions.
.LP
.B O_DIRECT
support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
Some filesystems may not implement the flag and
.BR open ()
will fail with
.B EINVAL
if it is used.
.LP
Applications should avoid mixing
.B O_DIRECT
and normal I/O to the same file,
and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Likewise, applications should avoid mixing
.BR mmap (2)
of files with direct I/O to the same files.
.LP
The behavior of
.B O_DIRECT
with NFS will differ from its behavior with local filesystems.
Older kernels, or
kernels configured in certain ways, may not support this combination.
The NFS protocol does not support passing the flag to the server, so
.B O_DIRECT
I/O will bypass the page cache only on the client; the server may
still cache the I/O.
The client asks the server to make the I/O
synchronous to preserve the synchronous semantics of
.BR O_DIRECT .
Some servers will perform poorly under these circumstances, especially
if the I/O size is small.
Some servers may also be configured to
lie to clients about the I/O having reached stable storage; this
will avoid the performance penalty at some risk to data integrity
in the event of server power failure.
The Linux NFS client places no alignment restrictions on
.B O_DIRECT
I/O.
.PP
In summary,
.B O_DIRECT
is a potentially powerful tool that should be used with caution.
It is recommended that applications treat use of
.B O_DIRECT
as a performance option which is disabled by default.
.PP
.RS
"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."\(emLinus
.RE
.SH BUGS
Currently, it is not possible to enable signal-driven
I/O by specifying
.B O_ASYNC
when calling
.BR open ();
use
.BR fcntl (2)
to enable this flag.
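.PP
A minimal C sketch of that workaround follows (the function name
.I enable_sigio
is hypothetical).
It sets the descriptor's owner with
.B F_SETOWN
so that
.B SIGIO
is delivered to the calling process, then adds
.B O_ASYNC
to the existing file status flags.
Because the default action of
.B SIGIO
terminates the process, a handler should be installed with
.BR sigaction (2)
before the flag is enabled.

```c
#define _DEFAULT_SOURCE        /* O_ASYNC and F_SETOWN */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: enable signal-driven I/O on FD.  SIGIO is
   directed to this process with F_SETOWN, then O_ASYNC is added with
   a read-modify-write of the file status flags so that the other
   flags are preserved.  Returns 0 on success, -1 on error. */
int enable_sigio(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```
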
.\" FIXME . Check bugzilla report on open(O_ASYNC)
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993

One must check for two different error codes,
.B EISDIR
and
.BR ENOENT ,
when trying to determine whether the kernel supports
.B O_TMPFILE
functionality.
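.PP
A minimal C sketch of such a check follows (the function name
.I tmpfile_supported
is hypothetical).
It also treats
.B EOPNOTSUPP
as "not supported", since that is the error for a filesystem that lacks
.B O_TMPFILE
support.
Because
.B ENOENT
is also the error for a nonexistent directory, the caller should pass a
directory known to exist.

```c
#define _GNU_SOURCE            /* O_TMPFILE; must precede every #include */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: probe O_TMPFILE support in the existing
   directory DIR.  Returns 1 if an unnamed temporary file could be
   created, 0 if O_TMPFILE is unsupported (EISDIR or ENOENT from a
   kernel without it, EOPNOTSUPP from a filesystem without it), or
   -1 on any other error. */
int tmpfile_supported(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);

    if (fd != -1) {
        close(fd);        /* the unnamed file is freed on last close */
        return 1;
    }
    if (errno == EISDIR || errno == ENOENT || errno == EOPNOTSUPP)
        return 0;
    return -1;
}
```
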

When both
.B O_CREAT
and
.B O_DIRECTORY
are specified in
.I flags
and the file specified by
.I pathname
does not exist,
.BR open ()
will create a regular file (i.e.,
.B O_DIRECTORY
is ignored).
.SH SEE ALSO
.BR chmod (2),
.BR chown (2),
.BR close (2),
.BR dup (2),
.BR fcntl (2),
.BR link (2),
.BR lseek (2),
.BR mknod (2),
.BR mmap (2),
.BR mount (2),
.BR open_by_handle_at (2),
.BR read (2),
.BR socket (2),
.BR stat (2),
.BR umask (2),
.BR unlink (2),
.BR write (2),
.BR fopen (3),
.BR acl (5),
.BR fifo (7),
.BR path_resolution (7),
.BR symlink (7)