.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
.\" and Copyright (C) 2008 Greg Banks
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1994-08-21 by Michael Haardt
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1996-05-13 by Thomas Koenig
.\" Modified 1996-12-20 by Michael Haardt
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
.\" Modified 1999-06-03 by Michael Haardt
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk, reordered flags list alphabetically
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
.\" 2008-01-03, mtk, with input from Trond Myklebust
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
.\" Rewrite description of O_EXCL.
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
.\" on O_DIRECT.
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
.\"
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
.\"
.TH OPEN 2 2017-09-15 "Linux" "Linux Programmer's Manual"
.SH NAME
open, openat, creat \- open and possibly create a file
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.PP
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );
.PP
.BI "int creat(const char *" pathname ", mode_t " mode );
.PP
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
.PP
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.PP
.BR openat ():
.PD 0
.ad l
.RS 4
.TP 4
Since glibc 2.10:
_POSIX_C_SOURCE\ >=\ 200809L
.TP
Before glibc 2.10:
_ATFILE_SOURCE
.RE
.ad
.PD
.SH DESCRIPTION
Given a
.I pathname
for a file,
.BR open ()
returns a file descriptor, a small, nonnegative integer
for use in subsequent system calls
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
The file descriptor returned by a successful call will be
the lowest-numbered file descriptor not currently open for the process.
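.PP
The following minimal sketch illustrates the lowest-numbered rule.
It closes a descriptor and checks that the next
.BR open ()
returns the same number.
It assumes Linux, glibc, and a single-threaded caller, since another thread could claim the freed number first.
The helper name
.I reuse_lowest_fd
is hypothetical.
```c
#define _GNU_SOURCE          /* expose POSIX definitions under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if open() hands back the number just
   released by close(), 0 if it does not, or -1 on error. */
int reuse_lowest_fd(const char *path)
{
    int a = open(path, O_RDONLY);
    if (a == -1)
        return -1;
    int b = open(path, O_RDONLY);     /* b > a, because a is still open */
    if (b == -1) {
        close(a);
        return -1;
    }

    close(a);                         /* a is now the lowest free number */
    int c = open(path, O_RDONLY);     /* so it is the one handed out */
    int reused = (c == a);

    close(b);
    if (c != -1)
        close(c);
    return reused;
}
```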
.PP
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled); the
.B O_CLOEXEC
flag, described below, can be used to change this default.
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.PP
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
For further details on open file descriptions, see NOTES.
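.PP
The following minimal sketch shows the distinction.
It assumes Linux and a writable scratch path supplied by the caller; the helper name
.I offset_sharing_demo
is hypothetical.
A descriptor obtained with
.BR dup (2)
refers to the same open file description as the original, so it sees the file offset move.
A second, independent
.BR open ()
of the same file creates a new open file description with its own offset.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: writes 4 bytes through fd1, then reports the offset
   seen by a dup() of fd1 and by an independent open() of the same file.
   Returns 0 on success, -1 on error. */
int offset_sharing_demo(const char *path, off_t *dup_off, off_t *indep_off)
{
    int rc = -1;
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd1 == -1)
        return -1;

    int fd2 = dup(fd1);                 /* same open file description */
    int fd3 = open(path, O_RDONLY);     /* new open file description */

    if (fd2 != -1 && fd3 != -1 && write(fd1, "abcd", 4) == 4) {
        *dup_off = lseek(fd2, 0, SEEK_CUR);     /* 4: offset is shared */
        *indep_off = lseek(fd3, 0, SEEK_CUR);   /* 0: offset is separate */
        rc = 0;
    }

    close(fd1);
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    return rc;
}
```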
.PP
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.
.PP
In addition, zero or more file creation flags and file status flags
can be
.RI bitwise- or 'd
in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
and
.BR O_TRUNC .
The
.I file status flags
are all of the remaining flags listed below.
.\" SUSv4 divides the flags into:
.\" * Access mode
.\" * File creation
.\" * File status
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
.\" though it's not clear what the difference between "other" and
.\" "File creation" flags is. I raised an Aardvark to see if this
.\" can be clarified in SUSv4; 10 Oct 2008.
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
.\" TC1 (balloted in 2013) resolved this, so that those three constants
.\" are also categorized as file status flags.
.\"
The distinction between these two groups of flags is that
the file creation flags affect the semantics of the open operation itself,
while the file status flags affect the semantics of subsequent I/O operations.
The file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.
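.PP
The following minimal sketch shows both operations.
It assumes Linux and glibc; the helper names
.I access_mode
and
.I enable_append
are hypothetical.
Because
.BR O_RDONLY ,
.BR O_WRONLY ,
and
.B O_RDWR
are values rather than independent bits, the access mode is extracted from the
.B F_GETFL
result with the
.B O_ACCMODE
mask.
A status flag such as
.B O_APPEND
is then added with a read-modify-write through
.BR F_SETFL .
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the access mode (O_RDONLY, O_WRONLY,
   or O_RDWR) of fd, or -1 on error.  O_ACCMODE masks off every bit
   that is not part of the access mode. */
int access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return (flags == -1) ? -1 : (flags & O_ACCMODE);
}

/* Hypothetical helper: adds O_APPEND to the file status flags of fd,
   keeping the other flags.  Returns 0 on success, -1 on error. */
int enable_append(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_APPEND);
}
```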
.PP
The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
The modification of the file offset and the write operation
are performed as a single atomic step.
.IP
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
.\" For more background, see
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
.\" http://nfs.sourceforge.net/
This is because NFS does not support
appending to a file, so the client kernel has to simulate it, which
can't be done without a race condition.
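.IP
The following minimal sketch shows the positioning rule.
It assumes a local (non-NFS) filesystem; the helper name
.I append_demo
is hypothetical.
Even after an explicit
.BR lseek (2)
back to offset 0, the next
.BR write (2)
lands at the end of the file.
The file therefore grows to 5 bytes instead of being partly overwritten.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: writes "abc", seeks back to offset 0, writes "de"
   on an O_APPEND descriptor, and returns the resulting file size
   (5 with O_APPEND; it would be 3 without it), or -1 on error. */
off_t append_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (fd == -1)
        return -1;

    struct stat sb;
    off_t size = -1;
    if (write(fd, "abc", 3) == 3 &&
        lseek(fd, 0, SEEK_SET) == 0 &&    /* ignored by the next write */
        write(fd, "de", 2) == 2 &&
        fstat(fd, &sb) == 0)
        size = sb.st_size;

    close(fd);
    return size;
}
```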
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
.TP
.BR O_CLOEXEC " (since Linux 2.6.23)"
.\" NOTE! several other man pages refer to this text
Enable the close-on-exec flag for the new file descriptor.
.\" FIXME . for later review when Issue 8 is one day released...
.\" POSIX proposes to fix many APIs that provide hidden FDs
.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
.\" http://austingroupbugs.net/view.php?id=368
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.
.IP
Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.BR O_CLOEXEC
flag to deal with this problem.)
.\" This flag fixes only one form of the race condition;
.\" The race can also occur with, for example, file descriptors
.\" returned by accept(), pipe(), etc.
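.IP
The following minimal sketch assumes Linux 2.6.23 or later and glibc; the helper names
.I open_cloexec
and
.I has_cloexec
are hypothetical.
The flag is set as part of the
.BR open ()
call itself, and
.BR fcntl (2)
.B F_GETFD
can confirm that
.B FD_CLOEXEC
is set on the returned descriptor.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only with close-on-exec already
   set, so there is no window between open() and a later
   fcntl(F_SETFD) in which another thread's fork() plus execve()
   could inherit the descriptor. */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
   0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    return (fdflags == -1) ? -1 : ((fdflags & FD_CLOEXEC) != 0);
}
```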
.TP
.B O_CREAT
If the file does not exist, it will be created.
.IP
The owner (user ID) of the new file is set to the effective user ID
of the process.
.IP
The group ownership (group ID) of the new file is set either to
the effective group ID of the process (System V semantics)
or to the group ID of the parent directory (BSD semantics).
On Linux, the behavior depends on whether the
set-group-ID mode bit is set on the parent directory:
if that bit is set, then BSD semantics apply;
otherwise, System V semantics apply.
For some filesystems, the behavior also depends on the
.I bsdgroups
and
.I sysvgroups
mount options described in
.BR mount (8).
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
.\" XFS (since 2.6.14).
.RS
.PP
The
.I mode
argument specifies the file mode bits to be applied when a new file is created.
This argument must be supplied when
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if neither
.B O_CREAT
nor
.B O_TMPFILE
is specified, then
.I mode
is ignored.
The effective mode is modified by the process's
.I umask
in the usual way: in the absence of a default ACL, the mode of the
created file is
.IR "(mode\ &\ ~umask)" .
Note that this mode applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
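.PP
The following minimal sketch shows both points.
It assumes that the target directory has no default ACL and that the named file does not yet exist; the helper name
.I create_with_umask
is hypothetical.
With a umask of 022, a requested mode of 0666 yields 0644.
A file created with mode 0444 can still be written through the descriptor returned by the call that created it.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: creates path with the requested mode under a umask
   of 022, writes one byte through the creating descriptor, and returns
   the resulting permission bits, or -1 on error. */
int create_with_umask(const char *path, mode_t mode)
{
    mode_t old = umask(S_IWGRP | S_IWOTH);     /* 022 */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, mode);
    umask(old);                                /* restore the caller's umask */
    if (fd == -1)
        return -1;

    struct stat sb;
    int result = -1;
    /* mode governs only later opens: this write succeeds even when
       mode grants no write permission. */
    if (write(fd, "x", 1) == 1 && fstat(fd, &sb) == 0)
        result = sb.st_mode & 07777;
    close(fd);
    return result;
}
```
Because
.BR umask (2)
is process-wide, this sketch is not safe to run while other threads are creating files.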
.PP
The following symbolic constants are provided for
.IR mode :
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write, and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write, and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write, and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
.IP
According to POSIX, the effect when other bits are set in
.I mode
is unspecified.
On Linux, the following bits are also honored in
.IR mode :
.RS
.TP 9
.B S_ISUID
0004000 set-user-ID bit
.TP
.B S_ISGID
0002000 set-group-ID bit (see
.BR inode (7)).
.TP
.B S_ISVTX
0001000 sticky bit (see
.BR inode (7)).
.RE
.TP
.BR O_DIRECT " (since Linux 2.4.10)"
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own caching.
File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.IP
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
.TP
.B O_DIRECTORY
If \fIpathname\fP is not a directory, cause the open to fail.
.\" But see the following and its replies:
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in kernel version 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a
FIFO or tape device.
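.IP
The following minimal sketch assumes Linux and glibc; the helper name
.I open_dir_only
is hypothetical.
When
.I pathname
exists but is not a directory, the error reported is
.BR ENOTDIR .
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path only if it is a directory.
   Returns a descriptor, or -1 with errno set (ENOTDIR when path
   exists but is, for example, a regular file, FIFO, or device). */
int open_dir_only(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY);
}
```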
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.
.IP
By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
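.IP
The following minimal sketch contrasts the two ways of getting this guarantee.
It assumes Linux and glibc; the helper names
.I open_dsync
and
.I write_then_fdatasync
are hypothetical.
Opening with
.B O_DSYNC
makes every write synchronous.
A descriptor opened without it reaches a similar point only where
.BR fdatasync (2)
is called explicitly.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: each write() on the returned descriptor
   completes with synchronized I/O data integrity. */
int open_dsync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
}

/* Hypothetical alternative for a descriptor opened without O_DSYNC:
   write, then flush the data explicitly.  Returns 0 on success,
   -1 on error or on a short write. */
int write_then_fdatasync(int fd, const char *s)
{
    size_t len = strlen(s);
    if (write(fd, s, len) != (ssize_t) len)
        return -1;
    return fdatasync(fd);
}
```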
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
will fail.
.IP
When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points to.
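.IP
The following minimal sketch shows both rules.
It assumes Linux; the helper name
.I create_exclusive
is hypothetical.
A second exclusive create of the same name fails with
.BR EEXIST .
So does an exclusive create through a dangling symbolic link, even though the link's target does not exist.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates path only if nothing (not even a
   symbolic link) already exists under that name.  Returns a
   descriptor, or -1 with errno set to EEXIST if the name is taken. */
int create_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```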
.IP
In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .
.IP
On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
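.IP
The lockfile procedure above can be sketched as follows.
This minimal sketch assumes that the unique file is created in the same directory as the lockfile.
The helper name
.IR acquire_lock ,
the
.I lock.HOST.PID
naming scheme, and removing the unique file afterward are illustrative choices, not requirements stated here.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: tries to take lockfile using link(2).
   Returns 1 if the lock was acquired, 0 if another process holds it,
   -1 on error.  dir must be on the same filesystem as lockfile. */
int acquire_lock(const char *dir, const char *lockfile)
{
    char host[HOST_NAME_MAX + 1] = "";
    char unique[PATH_MAX];
    gethostname(host, sizeof(host) - 1);      /* array stays terminated */
    snprintf(unique, sizeof unique, "%s/lock.%s.%ld",
             dir, host, (long) getpid());

    int fd = open(unique, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    int acquired = 0;
    struct stat sb;
    if (link(unique, lockfile) == 0)
        acquired = 1;
    else if (stat(unique, &sb) == 0 && sb.st_nlink == 2)
        acquired = 1;     /* link() reported failure, but the link exists */

    unlink(unique);       /* the lockfile name, if linked, keeps the inode */
    return acquired;
}
```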
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
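.IP
The following minimal sketch shows the preferred approach and assumes glibc.
The macro is defined before the first header is included.
After that,
.I off_t
is 64 bits wide even on 32-bit systems, and no explicit
.B O_LARGEFILE
is needed.
The helper name
.I file_size
is hypothetical.
```c
/* Must come before the first #include (see feature_test_macros(7)). */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of path as a 64-bit off_t,
   or -1 on error, by seeking to the end of the file. */
off_t file_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    close(fd);
    return end;
}
```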
.TP
.BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).
.IP
This flag can be employed only if one of the following conditions is true:
.RS
.IP * 3
The effective UID of the process
.\" Strictly speaking: the filesystem UID
matches the owner UID of the file.
.IP *
The calling process has the
.BR CAP_FOWNER
capability in its user namespace and
the owner UID of the file has a mapping in the namespace.
.RE
.IP
This flag is intended for use by indexing or backup programs,
where its use can significantly reduce the amount of disk activity.
This flag may not be effective on all filesystems.
One example is NFS, where the server maintains the access time.
.\" The O_NOATIME flag also affects the treatment of st_atime
.\" by mmap() and readdir(2), MTK, Dec 04.
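.IP
The call fails with
.B EPERM
when neither condition holds.
A backup program might therefore try the flag first and fall back to an ordinary open.
The following minimal sketch assumes glibc, which exposes
.B O_NOATIME
only when
.B _GNU_SOURCE
is defined.
The helper name
.I open_for_backup
is hypothetical.
```c
#define _GNU_SOURCE          /* glibc exposes O_NOATIME only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path for reading without updating its
   last access time when permitted, otherwise with an ordinary
   read-only open.  Returns a descriptor or -1. */
int open_for_backup(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd == -1 && errno == EPERM)    /* caller may not use O_NOATIME */
        fd = open(path, O_RDONLY);
    return fd;
}
```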
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\(emsee
.BR tty (4)\(emit
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If \fIpathname\fP is a symbolic link, then the open fails, with the error
.BR ELOOP .
Symbolic links in earlier components of the pathname will still be
followed.
(Note that the
.B ELOOP
error that can occur in this case is indistinguishable from the case where
an open fails because there are too many symbolic links found
while resolving components in the prefix part of the pathname.)
.IP
This flag is a FreeBSD extension, which was added to Linux in version 2.1.126,
and has subsequently been standardized in POSIX.1-2008.
.IP
See also
.BR O_PATH
below.
.\" The headers from glibc 2.0.100 and later include a
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
.\" used\fP.
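.IP
The following minimal sketch assumes Linux and glibc; the helper name
.I open_no_symlink
is hypothetical.
A caller can recognize a refused symbolic link from the
.B ELOOP
error, bearing in mind the ambiguity noted above.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only, refusing to follow a
   symbolic link in the last component (links in earlier components
   are still followed).  Returns a descriptor, or -1 with errno set
   (ELOOP if path itself is a symbolic link). */
int open_no_symlink(const char *path)
{
    return open(path, O_RDONLY | O_NOFOLLOW);
}
```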
.TP
.BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent operations on the file descriptor which is
returned will cause the calling process to wait.
.IP
Note that this flag has no effect for regular files and block devices;
that is, I/O operations will (briefly) block when device activity
is required, regardless of whether
.B O_NONBLOCK
is set.
Since
.B O_NONBLOCK
semantics might eventually be implemented,
applications should not depend upon blocking behavior
when specifying this flag for regular files and block devices.
.IP
For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
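.IP
The following minimal sketch shows the FIFO case described in
.BR fifo (7).
It assumes Linux; the helper name
.I fifo_nonblock_demo
is hypothetical.
A nonblocking open for reading succeeds at once even though no writer exists.
A nonblocking open for writing fails with
.B ENXIO
while there is no reader.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo on a freshly created FIFO with no other users:
   *wr_errno receives the errno of a nonblocking write-only open made
   while no reader exists, and *rd_fd the result of a nonblocking
   read-only open.  Returns 0, or -1 if mkfifo() fails. */
int fifo_nonblock_demo(const char *path, int *rd_fd, int *wr_errno)
{
    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return -1;

    errno = 0;
    int wr = open(path, O_WRONLY | O_NONBLOCK);    /* no reader yet */
    *wr_errno = (wr == -1) ? errno : 0;
    if (wr != -1)
        close(wr);

    *rd_fd = open(path, O_RDONLY | O_NONBLOCK);    /* does not wait */
    return 0;
}
```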
.TP
.BR O_PATH " (since Linux 2.6.39)"
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
.\" commit 326be7b484843988afe57566b627fb7a70beac56
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
.\"
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
.\" Subject: Re: [PATCH] open(2): document O_PATH
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and
to perform operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations (e.g.,
.BR read (2),
.BR write (2),
.BR fchmod (2),
.BR fchown (2),
.BR fgetxattr (2),
.BR ioctl (2),
.BR mmap (2))
fail with the error
.BR EBADF .
.IP
The following operations
.I can
be performed on the resulting file descriptor:
.RS
.IP * 3
.BR close (2).
.IP *
.BR fchdir (2)
(since Linux 3.5).
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
.IP *
.BR fstat (2)
(since Linux 3.6).
.IP *
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
.BR fstatfs (2)
(since Linux 3.12).
.\" fstatfs(): commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf
.IP *
Duplicating the file descriptor
.RB ( dup (2),
.BR fcntl (2)
.BR F_DUPFD ,
etc.).
.IP *
Getting and setting file descriptor flags
.RB ( fcntl (2)
.BR F_GETFD
and
.BR F_SETFD ).
.IP *
Retrieving open file status flags using the
.BR fcntl (2)
.BR F_GETFL
operation: the returned flags will include the bit
.BR O_PATH .
.IP *
Passing the file descriptor as the
.IR dirfd
argument of
.BR openat ()
and the other "*at()" system calls.
This includes
.BR linkat (2)
with
.BR AT_EMPTY_PATH
(or via procfs using
.BR AT_SYMLINK_FOLLOW )
even if the file is not a directory.
.IP *
Passing the file descriptor to another process via a UNIX domain socket
(see
.BR SCM_RIGHTS
in
.BR unix (7)).
.RE
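.IP
The following minimal sketch checks these rules.
It assumes Linux 3.6 or later (for
.BR fstat (2))
and glibc, which exposes
.B O_PATH
only when
.B _GNU_SOURCE
is defined.
The helper name
.I o_path_demo
is hypothetical.
It checks that
.BR fstat (2)
and
.B F_GETFL
work on the descriptor, while
.BR read (2)
fails with
.BR EBADF .
```c
#define _GNU_SOURCE          /* glibc exposes O_PATH only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: opens path with O_PATH and returns 0 if fstat()
   succeeds, F_GETFL reports O_PATH, and read() fails with EBADF;
   returns -1 otherwise. */
int o_path_demo(const char *path)
{
    int fd = open(path, O_PATH);
    if (fd == -1)
        return -1;

    struct stat sb;
    char c;
    int ok = fstat(fd, &sb) == 0                     /* since Linux 3.6 */
             && (fcntl(fd, F_GETFL) & O_PATH) != 0   /* O_PATH is reported */
             && read(fd, &c, 1) == -1 && errno == EBADF;

    close(fd);
    return ok ? 0 : -1;
}
```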
.IP
When
.B O_PATH
is specified in
.IR flags ,
flag bits other than
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
are ignored.
.IP
If
.I pathname
is a symbolic link and the
.BR O_NOFOLLOW
flag is also specified,
then the call returns a file descriptor referring to the symbolic link.
This file descriptor can be used as the
.I dirfd
argument in calls to
.BR fchownat (2),
.BR fstatat (2),
.BR linkat (2),
and
.BR readlinkat (2)
with an empty pathname to have the calls operate on the symbolic link.
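.IP
The following minimal sketch reads a link's target through such a descriptor by passing an empty pathname to
.BR readlinkat (2).
It assumes Linux 2.6.39 or later and glibc; the helper name
.I read_link_via_fd
is hypothetical.
```c
#define _GNU_SOURCE          /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores the target of the symbolic link
   linkpath in buf as a terminated string, going through an
   O_PATH | O_NOFOLLOW descriptor that refers to the link itself.
   Returns the target length, or -1 on error or truncation. */
ssize_t read_link_via_fd(const char *linkpath, char *buf, size_t size)
{
    int fd = open(linkpath, O_PATH | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    ssize_t n = readlinkat(fd, "", buf, size);   /* empty name: the link */
    close(fd);
    if (n == -1 || (size_t) n >= size)
        return -1;
    buf[n] = 0;          /* readlinkat() does not add a terminator */
    return n;
}
```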
.IP
If
.I pathname
refers to an automount point that has not yet been triggered, so no
other filesystem is mounted on it, then the call returns a file
descriptor referring to the automount directory without triggering a mount.
.BR fstatfs (2)
can then be used to determine if it is, in fact, an untriggered
automount point
.RB ( ".f_type == AUTOFS_SUPER_MAGIC" ).
.IP
One use of
.B O_PATH
for regular files is to provide the equivalent of POSIX.1's
.B O_EXEC
functionality.
This permits us to open a file for which we have execute
permission but not read permission, and then execute that file,
with steps something like the following:
.IP
.in +4n
.EX
char buf[PATH_MAX];
fd = open("some_prog", O_PATH);
snprintf(buf, PATH_MAX, "/proc/self/fd/%d", fd);
execl(buf, "some_prog", (char *) NULL);
.EE
.in
.IP
An
.B O_PATH
file descriptor can also be passed as the argument of
.BR fexecve (3).
.TP
.B O_SYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I file
integrity completion
(by contrast with the
synchronized I/O
.I data
integrity completion
provided by
.BR O_DSYNC .)
.IP
By the time
.BR write (2)
(and similar)
return, the output data and associated file metadata
have been transferred to the underlying hardware
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fsync (2)).
.IR "See NOTES below" .
.TP
.BR O_TMPFILE " (since Linux 3.11)"
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
Create an unnamed temporary file.
The
.I pathname
argument specifies a directory;
an unnamed inode will be created in that directory's filesystem.
Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.
.IP
.B O_TMPFILE
must be specified with one of
.B O_RDWR
or
.B O_WRONLY
and, optionally,
.BR O_EXCL .
If
.B O_EXCL
is not specified, then
.BR linkat (2)
can be used to link the temporary file into the filesystem, making it
permanent, using code like the following:
.IP
.in +4n
.EX
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
.EE
.in
.IP
In this case,
the
.BR open ()
.I mode
argument determines the file permission mode, as with
.BR O_CREAT .
.IP
Specifying
.B O_EXCL
in conjunction with
.B O_TMPFILE
prevents a temporary file from being linked into the filesystem
in the above manner.
(Note that the meaning of
.B O_EXCL
in this case is different from the meaning of
.B O_EXCL
otherwise.)
.IP
There are two main use cases for
.\" Inspired by http://lwn.net/Articles/559147/
.BR O_TMPFILE :
.RS
.IP * 3
Improved
.BR tmpfile (3)
functionality: race-free creation of temporary files that
(1) are automatically deleted when closed;
(2) can never be reached via any pathname;
(3) are not subject to symlink attacks; and
(4) do not require the caller to devise unique names.
.IP *
Creating a file that is initially invisible, which is then populated
with data and adjusted to have appropriate filesystem attributes
.RB ( fchown (2),
.BR fchmod (2),
.BR fsetxattr (2),
etc.)
before being atomically linked into the filesystem
in a fully formed state (using
.BR linkat (2)
as described above).
.RE
.IP
.B O_TMPFILE
requires support by the underlying filesystem;
only a subset of Linux filesystems provide that support.
In the initial implementation, support was provided in
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
.\" To check for support, grep for "tmpfile" in kernel sources
Support for other filesystems has subsequently been added as follows:
XFS (Linux 3.15);
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
Btrfs (Linux 3.16);
.\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c
F2FS (Linux 3.16);
.\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c
and ubifs (Linux 4.9).
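.IP
Support depends on both the kernel version and the filesystem, so a program may need a fallback.
The following minimal sketch assumes glibc, which exposes
.B O_TMPFILE
only when
.B _GNU_SOURCE
is defined; the helper name
.I unnamed_tmpfile
is hypothetical.
It falls back in two cases:
.B EISDIR
(the kernel lacks
.BR O_TMPFILE ;
see ERRORS) and
.B EOPNOTSUPP
(the filesystem lacks support).
The fallback is
.BR mkstemp (3)
plus
.BR unlink (2),
which also yields an unnamed file.
Unlike
.BR O_TMPFILE ,
however, it leaves the name briefly visible in the directory.
```c
#define _GNU_SOURCE          /* glibc exposes O_TMPFILE only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns a read/write descriptor for an unnamed
   temporary file in dir, or -1 on error. */
int unnamed_tmpfile(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd != -1 || (errno != EISDIR && errno != EOPNOTSUPP))
        return fd;

    /* Fallback: a named file that is removed at once.  Unlike
       O_TMPFILE, the name is briefly visible in dir. */
    char path[PATH_MAX];
    snprintf(path, sizeof path, "%s/tmpXXXXXX", dir);
    fd = mkstemp(path);
    if (fd != -1)
        unlink(path);
    return fd;
}
```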
.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY )
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise, the effect of
.B O_TRUNC
is unspecified.
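.IP
The following minimal sketch assumes an existing regular file on a local filesystem; the helper name
.I truncate_by_open
is hypothetical.
It requests write access, since the effect in other combinations is unspecified.
It also omits
.B O_CREAT
so that a missing file is reported as an error rather than created.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: truncates an existing regular file to length 0
   by reopening it with O_WRONLY | O_TRUNC.  Returns the new size
   (expected to be 0), or -1 on error. */
off_t truncate_by_open(const char *path)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    struct stat sb;
    off_t size = (fstat(fd, &sb) == 0) ? sb.st_size : -1;
    close(fd);
    return size;
}
```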
.SS creat()
A call to
.BR creat ()
is equivalent to calling
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .
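.PP
The following minimal sketch shows the equivalence.
It assumes Linux; the helper names
.I make_file_creat
and
.I make_file_open
are hypothetical.
Both calls create or truncate the file and return a write-only descriptor, so a
.BR read (2)
on either fails with
.BR EBADF .
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* These two hypothetical helpers behave identically. */
int make_file_creat(const char *path, mode_t mode)
{
    return creat(path, mode);
}

int make_file_open(const char *path, mode_t mode)
{
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}
```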
.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.
.IP
If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).
.IP
If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).
.IP
If
.I pathname
is absolute, then
.I dirfd
is ignored.
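.PP
The following minimal sketch assumes Linux and glibc; the helper name
.I open_in_dir
is hypothetical.
The relative name is looked up in the directory referred to by
.IR dirfd ,
not in the current working directory.
The returned descriptor remains valid after
.I dirfd
is closed.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens name relative to the directory dir
   rather than relative to the current working directory.
   The mode argument matters only if flags includes O_CREAT. */
int open_in_dir(const char *dir, const char *name, int flags)
{
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;
    int fd = openat(dirfd, name, flags, 0600);
    close(dirfd);          /* fd stays valid after dirfd is closed */
    return fd;
}
```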
.SH RETURN VALUE
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor, or \-1 if an error occurred
(in which case,
.I errno
is set appropriately).
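.PP
The following minimal sketch shows the usual check and assumes glibc; the helper name
.I open_or_report
is hypothetical.
Failure is detected from the \-1 return value, and
.I errno
is examined only after such a return.
```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only and reports a failure on
   stderr.  Returns the descriptor, or -1 with errno preserved. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;   /* meaningful only because open() returned -1 */
        perror(path);        /* prints "path: <message>" on stderr */
        errno = saved;       /* perror() itself may modify errno */
    }
    return fd;
}
```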
.SH ERRORS
.BR open (),
.BR openat (),
and
.BR creat ()
can fail with the following errors:
.TP
.B EACCES
The requested access to the file is not allowed, or search permission
is denied for one of the directories in the path prefix of
.IR pathname ,
or the file did not exist yet and write access to the parent directory
is not allowed.
(See also
.BR path_resolution (7).)
.TP
.B EDQUOT
Where
.B O_CREAT
is specified, the file does not exist, and the user's quota of disk
blocks or inodes on the filesystem has been exhausted.
.TP
.B EEXIST
.I pathname
already exists and
.BR O_CREAT " and " O_EXCL
were used.
.TP
.B EFAULT
.I pathname
points outside your accessible address space.
.TP
.B EFBIG
See
.BR EOVERFLOW .
.TP
.B EINTR
While blocked waiting to complete an open of a slow device
(e.g., a FIFO; see
.BR fifo (7)),
the call was interrupted by a signal handler; see
.BR signal (7).
940.TP
ef490193
DG
941.B EINVAL
942The filesystem does not support the
943.BR O_DIRECT
e6f89ed2
MK
944flag.
945See
ef490193
DG
946.BR NOTES
947for more information.
948.TP
8e335391
MK
949.B EINVAL
950Invalid value in
951.\" In particular, __O_TMPFILE instead of O_TMPFILE
952.IR flags .
953.TP
954.B EINVAL
955.B O_TMPFILE
956was specified in
957.IR flags ,
958but neither
959.B O_WRONLY
960nor
961.B O_RDWR
962was specified.
963.TP
fea681da
MK
964.B EISDIR
965.I pathname
966refers to a directory and the access requested involved writing
967(that is,
968.B O_WRONLY
969or
970.B O_RDWR
971is set).
972.TP
8e335391 973.B EISDIR
843068bd
MK
974.I pathname
975refers to an existing directory,
8e335391
MK
976.B O_TMPFILE
977and one of
978.B O_WRONLY
979or
980.B O_RDWR
981were specified in
982.IR flags ,
983but this kernel version does not provide the
984.B O_TMPFILE
985functionality.
986.TP
fea681da
MK
987.B ELOOP
988Too many symbolic links were encountered in resolving
289f7907
MK
989.IR pathname .
990.TP
991.B ELOOP
fea681da 992.I pathname
289f7907
MK
993was a symbolic link, and
994.I flags
995specified
996.BR O_NOFOLLOW
997but not
998.BR O_PATH .
fea681da
MK
999.TP
1000.B EMFILE
26c32fab 1001The per-process limit on the number of open file descriptors has been reached
12c21590
MK
1002(see the description of
1003.BR RLIMIT_NOFILE
1004in
1005.BR getrlimit (2)).
fea681da
MK
1006.TP
1007.B ENAMETOOLONG
0daa9e92 1008.I pathname
e1d6264d 1009was too long.
fea681da
MK
1010.TP
1011.B ENFILE
e258766b 1012The system-wide limit on the total number of open files has been reached.
fea681da
MK
1013.TP
1014.B ENODEV
1015.I pathname
1016refers to a device special file and no corresponding device exists.
682edefb
MK
1017(This is a Linux kernel bug; in this situation
1018.B ENXIO
1019must be returned.)
fea681da
MK
1020.TP
1021.B ENOENT
682edefb
MK
1022.B O_CREAT
1023is not set and the named file does not exist.
fea681da
MK
1024Or, a directory component in
1025.I pathname
1026does not exist or is a dangling symbolic link.
1027.TP
ba03011f
MK
1028.B ENOENT
1029.I pathname
1030refers to a nonexistent directory,
1031.B O_TMPFILE
1032and one of
1033.B O_WRONLY
1034or
1035.B O_RDWR
1036were specified in
1037.IR flags ,
1038but this kernel version does not provide the
1039.B O_TMPFILE
1040functionality.
1041.TP
fea681da 1042.B ENOMEM
8ef529f9
MK
1043The named file is a FIFO,
1044but memory for the FIFO buffer can't be allocated because
1045the per-user hard limit on memory allocation for pipes has been reached
1046and the caller is not privileged; see
1047.BR pipe (7).
1048.TP
1049.B ENOMEM
fea681da
MK
1050Insufficient kernel memory was available.
1051.TP
1052.B ENOSPC
1053.I pathname
1054was to be created but the device containing
1055.I pathname
1056has no room for the new file.
1057.TP
1058.B ENOTDIR
1059A component used as a directory in
1060.I pathname
a8d55537 1061is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
fea681da
MK
1062.I pathname
1063was not a directory.
1064.TP
1065.B ENXIO
682edefb 1066.BR O_NONBLOCK " | " O_WRONLY
103ea4f6
MK
1067is set, the named file is a FIFO, and
1068no process has the FIFO open for reading.
7b032b23
MK
1069.TP
1070.B ENXIO
1071The file is a device special file and no corresponding device exists.
fea681da 1072.TP
bbe02b45
MK
1073.BR EOPNOTSUPP
1074The filesystem containing
1075.I pathname
1076does not support
1077.BR O_TMPFILE .
1078.TP
7c7fb552
MK
1079.B EOVERFLOW
1080.I pathname
1081refers to a regular file that is too large to be opened.
1082The usual scenario here is that an application compiled
1083on a 32-bit platform without
5e4dc269 1084.I -D_FILE_OFFSET_BITS=64
7c7fb552 1085tried to open a file whose size exceeds
4e1a4d72
MK
1086.I (1<<31)-1
1087bytes;
7c7fb552
MK
1088see also
1089.B O_LARGEFILE
1090above.
c84d3aa3 1091This is the error specified by POSIX.1;
7c7fb552
MK
1092in kernels before 2.6.24, Linux gave the error
1093.B EFBIG
1094for this case.
1095.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
1096.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
1097.\" Reported 2006-10-03
1098.TP
1c1e15ed
MK
1099.B EPERM
1100The
1101.B O_NOATIME
1102flag was specified, but the effective user ID of the caller
9ee4a2b6 1103.\" Strictly speaking, it's the filesystem UID... (MTK)
47c906e5 1104did not match the owner of the file and the caller was not privileged.
1c1e15ed 1105.TP
fbab10e5
MK
1106.B EPERM
1107The operation was prevented by a file seal; see
1108.BR fcntl (2).
1109.TP
fea681da
MK
1110.B EROFS
1111.I pathname
9ee4a2b6 1112refers to a file on a read-only filesystem and write access was
fea681da
MK
1113requested.
1114.TP
1115.B ETXTBSY
1116.I pathname
1117refers to an executable image which is currently being executed and
1118write access was requested.
d3952311
MK
1119.TP
1120.B EWOULDBLOCK
1121The
1122.B O_NONBLOCK
1123flag was specified, and an incompatible lease was held on the file
1124(see
1125.BR fcntl (2)).
.PP
The following additional errors can occur for
.BR openat ():
.TP
.B EBADF
.I dirfd
is not a valid file descriptor.
.TP
.B ENOTDIR
.I pathname
is a relative pathname and
.I dirfd
is a file descriptor referring to a file other than a directory.
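.PP
Both of these errors concern
.I dirfd
and so arise only for relative pathnames, because an absolute
.I pathname
causes
.I dirfd
to be ignored.
A minimal sketch (the helper name
.I openat_errno
is hypothetical):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Return the errno value openat() failed with, or 0 on success
   (the new descriptor is closed). */
int openat_errno(int dirfd, const char *path, int flags)
{
    int fd = openat(dirfd, path, flags);

    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}

/* With FILEFD referring to a regular file:
     openat_errno(-1, "x", O_RDONLY)       EBADF    (-1 is not AT_FDCWD)
     openat_errno(filefd, "x", O_RDONLY)   ENOTDIR
     openat_errno(-1, "/", O_RDONLY)       0        (dirfd ignored)    */
```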
.SH VERSIONS
.BR openat ()
was added to Linux in kernel 2.6.16;
library support was added to glibc in version 2.4.
.SH CONFORMING TO
.BR open (),
.BR creat ():
SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.
.PP
.BR openat ():
POSIX.1-2008.
.PP
The
.BR O_DIRECT ,
.BR O_NOATIME ,
.BR O_PATH ,
and
.BR O_TMPFILE
flags are Linux-specific.
One must define
.B _GNU_SOURCE
to obtain their definitions.
.PP
The
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
flags are not specified in POSIX.1-2001,
but are specified in POSIX.1-2008.
Since glibc 2.12, one can obtain their definitions by defining either
.B _POSIX_C_SOURCE
with a value greater than or equal to 200809L or
.BR _XOPEN_SOURCE
with a value greater than or equal to 700.
In glibc 2.11 and earlier, one obtains the definitions by defining
.BR _GNU_SOURCE .
.PP
As noted in
.BR feature_test_macros (7),
feature test macros such as
.BR _POSIX_C_SOURCE ,
.BR _XOPEN_SOURCE ,
and
.B _GNU_SOURCE
must be defined before including
.I any
header files.
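.PP
For example, a program that needs only the POSIX.1-2008 flags might begin
as in the following sketch (the helper name
.I open_dir_cloexec
is hypothetical); a program that also needs a Linux-specific flag such as
.B O_PATH
would define
.B _GNU_SOURCE
in the same position instead.

```c
/* Must appear before any #include directive. */
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <unistd.h>

/* O_DIRECTORY and O_CLOEXEC are POSIX.1-2008 flags, so with glibc 2.12
   or later the definition above is enough to make them visible. */
int open_dir_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}
```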
.SH NOTES
Under Linux, the
.B O_NONBLOCK
flag is sometimes used in cases where one wants to open
but does not necessarily have the intention to read or write.
A typical case is opening a device solely to obtain a file descriptor
for use with
.BR ioctl (2).
.PP
The (undefined) effect of
.B O_RDONLY | O_TRUNC
varies among implementations.
On many systems the file is actually truncated.
.\" Linux 2.0, 2.5: truncate
.\" Solaris 5.7, 5.8: truncate
.\" Irix 6.5: truncate
.\" Tru64 5.1B: truncate
.\" HP-UX 11.22: truncate
.\" FreeBSD 4.7: truncate
.PP
Note that
.BR open ()
can open device special files, but
.BR creat ()
cannot create them; use
.BR mknod (2)
instead.
.PP
If the file is newly created, its
.IR st_atime ,
.IR st_ctime ,
and
.I st_mtime
fields
(respectively, time of last access, time of last status change, and
time of last modification; see
.BR stat (2))
are set
to the current time, and so are the
.I st_ctime
and
.I st_mtime
fields of the
parent directory.
Otherwise, if the file is modified because of the
.B O_TRUNC
flag, its
.I st_ctime
and
.I st_mtime
fields are set to the current time.
.PP
The files in the
.I /proc/[pid]/fd
directory show the open file descriptors of the process with the PID
.IR pid .
The files in the
.I /proc/[pid]/fdinfo
directory show even more information about these file descriptors.
See
.BR proc (5)
for further details of both of these directories.
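.PP
For instance, a process can find out which file one of its own descriptors
refers to by reading the corresponding symbolic link under
.IR /proc/self/fd .
The following Linux-specific sketch does this; the helper name
.I fd_path
is hypothetical, and the result is only a snapshot (the file may since
have been renamed or deleted).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Store in BUF the target of the /proc/self/fd/FD symbolic link.
   Returns the length of the string, or -1 on error.
   Example: for fd = open("/", O_RDONLY), BUF receives "/". */
ssize_t fd_path(int fd, char *buf, size_t size)
{
    char link[64];

    if (size == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, buf, size - 1);
    if (n == -1)
        return -1;
    buf[n] = 0;               /* readlink() does not add a terminator */
    return n;
}
```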
.\"
.\"
.SS Open file descriptions
The term open file description is the one used by POSIX to refer to the
entries in the system-wide table of open files.
In other contexts, this object is
variously also called an "open file object",
a "file handle", an "open file table entry",
or\(emin kernel-developer parlance\(ema
.IR "struct file" .
.PP
When a file descriptor is duplicated (using
.BR dup (2)
or similar),
the duplicate refers to the same open file description
as the original file descriptor,
and the two file descriptors consequently share
the file offset and file status flags.
Such sharing can also occur between processes:
a child process created via
.BR fork (2)
inherits duplicates of its parent's file descriptors,
and those duplicates refer to the same open file descriptions.
.PP
Each
.BR open ()
of a file creates a new open file description;
thus, there may be multiple open file descriptions
corresponding to a file inode.
.PP
On Linux, one can use the
.BR kcmp (2)
.B KCMP_FILE
operation to test whether two file descriptors
(in the same process or in two different processes)
refer to the same open file description.
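.PP
The difference between a duplicated descriptor and a second
.BR open ()
of the same file can be observed through the file offset,
as in the following minimal sketch (the helper name
.I compare_offsets
is hypothetical):

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 5 bytes through one descriptor, then report the offset seen
   through a dup() of it and through a separate open() of the same file.
   Returns 0 on success, -1 on error. */
int compare_offsets(const char *path, off_t *dup_off, off_t *open_off)
{
    int rc = -1;
    int fd1 = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

    if (fd1 == -1)
        return -1;

    int fd2 = dup(fd1);               /* same open file description */
    int fd3 = open(path, O_RDWR);     /* a new open file description */

    if (fd2 != -1 && fd3 != -1 && write(fd1, "hello", 5) == 5) {
        *dup_off = lseek(fd2, 0, SEEK_CUR);   /* 5: the offset is shared */
        *open_off = lseek(fd3, 0, SEEK_CUR);  /* 0: a separate offset */
        rc = 0;
    }
    if (fd3 != -1)
        close(fd3);
    if (fd2 != -1)
        close(fd2);
    close(fd1);
    return rc;
}
```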
.\"
.\"
.SS Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option
specifies different variants of synchronized I/O,
and specifies the
.BR open ()
flags
.BR O_SYNC ,
.BR O_DSYNC ,
and
.BR O_RSYNC
for controlling the behavior.
Regardless of whether an implementation supports this option,
it must at least support the use of
.BR O_SYNC
for regular files.
.PP
Linux implements
.BR O_SYNC
and
.BR O_DSYNC ,
but not
.BR O_RSYNC .
(Somewhat incorrectly, glibc defines
.BR O_RSYNC
to have the same value as
.BR O_SYNC .)
.PP
.BR O_SYNC
provides synchronized I/O
.I file
integrity completion,
meaning write operations will flush data and all associated metadata
to the underlying hardware.
.BR O_DSYNC
provides synchronized I/O
.I data
integrity completion,
meaning write operations will flush data
to the underlying hardware,
but will only flush metadata updates that are required
to allow a subsequent read operation to complete successfully.
Data integrity completion can reduce the number of disk operations
that are required for applications that don't need the guarantees
of file integrity completion.
.PP
To understand the difference between the two types of completion,
consider two pieces of file metadata:
the file's last modification timestamp
.RI ( st_mtime )
and the file's length.
All write operations will update the last file modification timestamp,
but only writes that add data to the end of the
file will change the file length.
The last modification timestamp is not needed to ensure that
a read completes successfully, but the file length is.
Thus,
.BR O_DSYNC
would only guarantee to flush updates to the file length metadata
(whereas
.BR O_SYNC
would also always flush the last modification timestamp metadata).
.PP
Before Linux 2.6.33, Linux implemented only the
.BR O_SYNC
flag for
.BR open ().
However, when that flag was specified,
most filesystems actually provided the equivalent of synchronized I/O
.I data
integrity completion (i.e.,
.BR O_SYNC
was actually implemented as the equivalent of
.BR O_DSYNC ).
.PP
Since Linux 2.6.33, proper
.BR O_SYNC
support is provided.
However, to ensure backward binary compatibility,
.BR O_DSYNC
was defined with the same value as the historical
.BR O_SYNC ,
and
.BR O_SYNC
was defined as a new (two-bit) flag value that includes the
.BR O_DSYNC
flag value.
This ensures that applications compiled against
new headers get at least
.BR O_DSYNC
semantics on pre-2.6.33 kernels.
.\"
.\"
.SS NFS
There are many infelicities in the protocol underlying NFS, affecting
amongst others
.BR O_SYNC " and " O_NDELAY .
.PP
On NFS filesystems with UID mapping enabled,
.BR open ()
may
return a file descriptor but, for example,
.BR read (2)
requests are denied
with \fBEACCES\fP.
This is because the client performs
.BR open ()
by checking the
permissions, but UID mapping is performed by the server upon
read and write requests.
.\"
.\"
.SS FIFOs
Opening the read or write end of a FIFO blocks until the other
end is also opened (by another process or thread).
See
.BR fifo (7)
for further details.
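.PP
A process that must not block can instead open the FIFO with
.BR O_NONBLOCK :
opening the read end then succeeds immediately,
while opening the write end fails with
.B ENXIO
if no process has the FIFO open for reading (see ERRORS above).
A minimal sketch, with hypothetical helper names:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open the read end without waiting for a writer; this succeeds
   even if no process has the FIFO open for writing. */
int fifo_reader_nonblock(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Open the write end without waiting for a reader; if there is no
   reader, this fails at once with ENXIO instead of blocking. */
int fifo_writer_nonblock(const char *path)
{
    return open(path, O_WRONLY | O_NONBLOCK);
}
```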
.\"
.\"
.SS File access mode
Unlike the other values that can be specified in
.IR flags ,
the
.I "access mode"
values
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
do not specify individual bits.
Rather, they define the low-order two bits of
.IR flags ,
and are defined respectively as 0, 1, and 2.
In other words, the combination
.B "O_RDONLY | O_WRONLY"
is a logical error, and certainly does not have the same meaning as
.BR O_RDWR .
.PP
Linux reserves the special, nonstandard access mode 3 (binary 11) in
.I flags
to mean:
check for read and write permission on the file and return a file descriptor
that can't be used for reading or writing.
This nonstandard access mode is used by some Linux drivers to return a
file descriptor that is to be used only for device-specific
.BR ioctl (2)
operations.
.\" See for example util-linux's disk-utils/setfdprm.c
.\" For some background on access mode 3, see
.\" http://thread.gmane.org/gmane.linux.kernel/653123
.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
.\" LKML, 12 Mar 2008
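.PP
Consequently, code that inspects a flags value (for example, one returned by the
.BR fcntl (2)
.B F_GETFL
operation) should mask it with
.B O_ACCMODE
and compare the result, rather than test individual bits.
A minimal sketch, with hypothetical helper names:

```c
#include <fcntl.h>

/* Correct: isolate the access mode, then compare it as a value. */
int can_write(int flags)
{
    int mode = flags & O_ACCMODE;

    return mode == O_WRONLY || mode == O_RDWR;
}

/* Wrong: O_RDWR (2) does not contain the O_WRONLY bit (1), so this
   reports a read-write descriptor as not writable. */
int can_write_buggy(int flags)
{
    return (flags & O_WRONLY) != 0;
}
```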
.\"
.\"
.SS Rationale for openat() and other "directory file descriptor" APIs
.BR openat ()
and the other system calls and library functions that take
a directory file descriptor argument
(i.e.,
.BR execveat (2),
.BR faccessat (2),
.BR fanotify_mark (2),
.BR fchmodat (2),
.BR fchownat (2),
.BR fstatat (2),
.BR futimesat (2),
.BR linkat (2),
.BR mkdirat (2),
.BR mknodat (2),
.BR name_to_handle_at (2),
.BR readlinkat (2),
.BR renameat (2),
.BR statx (2),
.BR symlinkat (2),
.BR unlinkat (2),
.BR utimensat (2),
.BR mkfifoat (3),
and
.BR scandirat (3))
address two problems with the older interfaces that preceded them.
Here, the explanation is in terms of the
.BR openat ()
call, but the rationale is analogous for the other interfaces.
.PP
First,
.BR openat ()
allows an application to avoid race conditions that could
occur when using
.BR open ()
to open files in directories other than the current working directory.
These race conditions result from the fact that some component
of the directory prefix given to
.BR open ()
could be changed in parallel with the call to
.BR open ().
Suppose, for example, that we wish to create the file
.I dir1/dir2/xxx.dep
if the file
.I dir1/dir2/xxx
exists.
The problem is that between the existence check and the file creation step,
.I dir1
or
.I dir2
(which might be symbolic links)
could be modified to point to a different location.
Such races can be avoided by
opening a file descriptor for the target directory,
and then specifying that file descriptor as the
.I dirfd
argument of (say)
.BR fstatat (2)
and
.BR openat ().
The use of the
.I dirfd
file descriptor also has other benefits:
.IP * 3
the file descriptor is a stable reference to the directory,
even if the directory is renamed; and
.IP *
the open file descriptor prevents the underlying filesystem from
being dismounted,
just as when a process has a current working directory on a filesystem.
.PP
Second,
.BR openat ()
allows the implementation of a per-thread "current working
directory", via file descriptor(s) maintained by the application.
(This functionality can also be obtained by tricks based
on the use of
.IR /proc/self/fd/ dirfd,
but less efficiently.)
.\"
.\"
.SS O_DIRECT
.PP
The
.B O_DIRECT
flag may impose alignment restrictions on the length and address
of user-space buffers and the file offset of I/Os.
Under Linux, alignment
restrictions vary by filesystem and kernel version and might be
absent entirely.
However, there is currently no filesystem\-independent
interface for an application to discover these restrictions for a given
file or filesystem.
Some filesystems provide their own interfaces
for doing so, for example the
.B XFS_IOC_DIOINFO
operation in
.BR xfsctl (3).
.PP
Under Linux 2.4, transfer sizes, the alignment of the user buffer,
and the file offset must all be multiples of the logical block size
of the filesystem.
Since Linux 2.6.0, alignment to the logical block size of the
underlying storage (typically 512 bytes) suffices.
The logical block size can be determined using the
.BR ioctl (2)
.B BLKSSZGET
operation or from the shell using the command:
.PP
.EX
    blockdev \-\-getss
.EE
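.PP
A program that wants
.B O_DIRECT
where available, but must still work on filesystems that reject it, might
allocate a suitably aligned buffer and fall back to buffered I/O when
.BR open ()
fails with
.BR EINVAL .
In the following sketch, the helper names are hypothetical, and the
4096-byte alignment is an assumption chosen because it is a multiple of
the typical 512-byte logical block size; a robust program would query
the actual requirement instead.

```c
#define _GNU_SOURCE             /* O_DIRECT is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed multiple of the logical block size */

/* Allocate NBLOCKS * DIO_ALIGN bytes at a DIO_ALIGN-aligned address,
   so that both the buffer address and the transfer length are aligned.
   Release the buffer with free(). */
void *alloc_dio_buffer(size_t nblocks)
{
    void *buf = NULL;

    if (posix_memalign(&buf, DIO_ALIGN, nblocks * DIO_ALIGN) != 0)
        return NULL;
    return buf;
}

/* Open PATH with O_DIRECT added to FLAGS; if the filesystem rejects
   the flag with EINVAL, retry without it.  *DIRECT is set to 1 if
   direct I/O was obtained, else 0. */
int open_maybe_direct(const char *path, int flags, mode_t mode, int *direct)
{
    int fd = open(path, flags | O_DIRECT, mode);

    *direct = (fd != -1);
    if (fd == -1 && errno == EINVAL)
        fd = open(path, flags, mode);
    return fd;
}
```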
.PP
.B O_DIRECT
I/Os should never be run concurrently with the
.BR fork (2)
system call,
if the memory buffer is a private mapping
(i.e., any mapping created with the
.BR mmap (2)
.BR MAP_PRIVATE
flag;
this includes memory allocated on the heap and statically allocated buffers).
Any such I/Os, whether submitted via an asynchronous I/O interface or from
another thread in the process,
should be completed before
.BR fork (2)
is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.
This restriction does not apply when the memory buffer for the
.B O_DIRECT
I/Os was created using
.BR shmat (2)
or
.BR mmap (2)
with the
.B MAP_SHARED
flag.
Nor does this restriction apply when the memory buffer has been advised as
.B MADV_DONTFORK
with
.BR madvise (2),
ensuring that it will not be available
to the child after
.BR fork (2).
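.PP
The following sketch (the helper name
.I alloc_dontfork_buffer
is hypothetical) allocates a page-aligned private buffer with
.BR mmap (2)
and advises it as
.BR MADV_DONTFORK ,
so that a later
.BR fork (2)
does not make it available to the child.

```c
#define _GNU_SOURCE             /* MAP_ANONYMOUS, MADV_DONTFORK */
#include <sys/mman.h>
#include <unistd.h>

/* Return a LEN-byte buffer that the child of a later fork() will not
   inherit, or NULL on failure.  mmap() returns page-aligned memory,
   which madvise() requires; a multiple of sysconf(_SC_PAGESIZE) is a
   natural choice for LEN.  Release the buffer with munmap(buf, len). */
void *alloc_dontfork_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) == -1) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```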
.PP
The
.B O_DIRECT
flag was introduced in SGI IRIX, where it has alignment
restrictions similar to those of Linux 2.4.
IRIX also has an
.BR fcntl (2)
call to query appropriate alignments and sizes.
FreeBSD 4.x introduced
a flag of the same name, but without alignment restrictions.
.PP
.B O_DIRECT
support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
Some filesystems may not implement the flag and
.BR open ()
will fail with
.B EINVAL
if it is used.
.PP
Applications should avoid mixing
.B O_DIRECT
and normal I/O to the same file,
and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Likewise, applications should avoid mixing
.BR mmap (2)
of files with direct I/O to the same files.
.PP
The behavior of
.B O_DIRECT
with NFS differs from that on local filesystems.
Older kernels, or
kernels configured in certain ways, may not support this combination.
The NFS protocol does not support passing the flag to the server, so
.B O_DIRECT
I/O will bypass the page cache only on the client; the server may
still cache the I/O.
The client asks the server to make the I/O
synchronous to preserve the synchronous semantics of
.BR O_DIRECT .
Some servers will perform poorly under these circumstances, especially
if the I/O size is small.
Some servers may also be configured to
lie to clients about the I/O having reached stable storage; this
will avoid the performance penalty at some risk to data integrity
in the event of server power failure.
The Linux NFS client places no alignment restrictions on
.B O_DIRECT
I/O.
.PP
In summary,
.B O_DIRECT
is a potentially powerful tool that should be used with caution.
It is recommended that applications treat use of
.B O_DIRECT
as a performance option which is disabled by default.
.PP
.RS
"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."\(emLinus
.RE
.SH BUGS
Currently, it is not possible to enable signal-driven
I/O by specifying
.B O_ASYNC
when calling
.BR open ();
use
.BR fcntl (2)
to enable this flag.
.\" FIXME . Check bugzilla report on open(O_ASYNC)
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993
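.PP
A minimal sketch of this workaround follows (the helper name
.I enable_sigio
is hypothetical).
It first uses the
.B F_SETOWN
operation to direct the signal to the calling process and then sets
.B O_ASYNC
with
.BR F_SETFL ;
a real program must install a
.B SIGIO
handler beforehand, because the default action of
.B SIGIO
is to terminate the process.

```c
#define _GNU_SOURCE             /* F_SETOWN, O_ASYNC */
#include <fcntl.h>
#include <unistd.h>

/* Enable signal-driven I/O on an already open descriptor.
   Returns 0 on success, -1 on error (errno set by fcntl()). */
int enable_sigio(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)   /* who receives SIGIO */
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
        return -1;
    return 0;
}
```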
.PP
One must check for two different error codes,
.B EISDIR
and
.BR ENOENT ,
when trying to determine whether the kernel supports
.B O_TMPFILE
functionality.
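.PP
The following sketch (helper names hypothetical) attempts to create an
unnamed temporary file and classifies a failure.
Besides the two error codes just mentioned, it also treats
.B EOPNOTSUPP
(the filesystem lacks support; see ERRORS) as meaning that
.B O_TMPFILE
is unavailable, so that the caller can fall back to a named temporary file.
Note that
.B ENOENT
is ambiguous: it is also the error for a nonexistent directory.

```c
#define _GNU_SOURCE             /* O_TMPFILE is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create an unnamed regular file in directory DIR.  Returns a file
   descriptor, or -1 with errno set by open(). */
int open_unnamed_tmp(const char *dir)
{
    return open(dir, O_TMPFILE | O_RDWR, 0600);
}

/* Nonzero if ERR, saved after open_unnamed_tmp() failed, means that
   O_TMPFILE is not available for DIR rather than a genuine error. */
int tmpfile_unavailable(int err)
{
    return err == EISDIR || err == ENOENT || err == EOPNOTSUPP;
}
```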
.PP
When both
.B O_CREAT
and
.B O_DIRECTORY
are specified in
.IR flags
and the file specified by
.I pathname
does not exist,
.BR open ()
will create a regular file (i.e.,
.B O_DIRECTORY
is ignored).
.SH SEE ALSO
.BR chmod (2),
.BR chown (2),
.BR close (2),
.BR dup (2),
.BR fcntl (2),
.BR link (2),
.BR lseek (2),
.BR mknod (2),
.BR mmap (2),
.BR mount (2),
.BR open_by_handle_at (2),
.BR read (2),
.BR socket (2),
.BR stat (2),
.BR umask (2),
.BR unlink (2),
.BR write (2),
.BR fopen (3),
.BR acl (5),
.BR fifo (7),
.BR inode (7),
.BR path_resolution (7),
.BR symlink (7)