.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
.\" and Copyright (C) 2008 Greg Banks
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1994-08-21 by Michael Haardt
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1996-05-13 by Thomas Koenig
.\" Modified 1996-12-20 by Michael Haardt
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
.\" Modified 1999-06-03 by Michael Haardt
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk, reordered flags list alphabetically
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
.\" 2008-01-03, mtk, with input from Trond Myklebust
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
.\" Rewrite description of O_EXCL.
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
.\" on O_DIRECT.
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
.\"
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
.\"
.TH OPEN 2 2017-09-15 "Linux" "Linux Programmer's Manual"
.SH NAME
open, openat, creat \- open and possibly create a file
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.PP
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );
.PP
.BI "int creat(const char *" pathname ", mode_t " mode );
.PP
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
.PP
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.PP
.BR openat ():
.PD 0
.ad l
.RS 4
.TP 4
Since glibc 2.10:
_POSIX_C_SOURCE\ >=\ 200809L
.TP
Before glibc 2.10:
_ATFILE_SOURCE
.RE
.ad
.PD
.SH DESCRIPTION
Given a
.I pathname
for a file,
.BR open ()
returns a file descriptor, a small, nonnegative integer
for use in subsequent system calls
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
The file descriptor returned by a successful call will be
the lowest-numbered file descriptor not currently open for the process.
.PP
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled); the
.B O_CLOEXEC
flag, described below, can be used to change this default.
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.PP
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
For further details on open file descriptions, see NOTES.
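.PP
The independence of a file descriptor from its pathname can be observed with
a minimal sketch like the one below.
The helper name readable_after_unlink() and the message text are
illustrative assumptions, not part of any API.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Create 'path', write a short message, remove the name, then read the
 * message back through the still-open descriptor.
 * Returns 1 if the data was read back intact, 0 otherwise.
 */
int readable_after_unlink(const char *path)
{
    static const char msg[] = "still here";
    char buf[sizeof msg];
    int ok = 0;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return 0;

    if (write(fd, msg, sizeof msg) == (ssize_t) sizeof msg &&
        unlink(path) == 0 &&              /* the name is gone ... */
        lseek(fd, 0, SEEK_SET) == 0 &&    /* ... the open file description is not */
        read(fd, buf, sizeof buf) == (ssize_t) sizeof buf)
        ok = (memcmp(buf, msg, sizeof msg) == 0);

    close(fd);
    return ok;
}
```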
.PP
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.
.PP
In addition, zero or more file creation flags and file status flags
can be
.RI bitwise- or 'd
in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
and
.BR O_TRUNC .
The
.I file status flags
are all of the remaining flags listed below.
.\" SUSv4 divides the flags into:
.\" * Access mode
.\" * File creation
.\" * File status
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
.\" though it's not clear what the difference between "other" and
.\" "File creation" flags is. I raised an Aardvark to see if this
.\" can be clarified in SUSv4; 10 Oct 2008.
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
.\" TC1 (balloted in 2013), resolved this, so that those three constants
.\" are also categorized as file creation flags.
.\"
The distinction between these two groups of flags is that
the file creation flags affect the semantics of the open operation itself,
while the file status flags affect the semantics of subsequent I/O operations.
The file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.
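.PP
As a rough illustration of retrieving and modifying flags on an open
descriptor, the sketch below uses the F_GETFL and F_SETFL operations of
.BR fcntl (2).
The helper names access_mode() and enable_append() are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return the access mode (O_RDONLY, O_WRONLY, or O_RDWR) of 'fd', or -1. */
int access_mode(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return (fl == -1) ? -1 : (fl & O_ACCMODE);
}

/* Turn on O_APPEND for an already open descriptor; 0 on success, -1 on error. */
int enable_append(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    return (fcntl(fd, F_SETFL, fl | O_APPEND) == -1) ? -1 : 0;
}
```

The access mode is extracted with the O_ACCMODE mask because the three
access modes are values, not independent bits.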
.PP
The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
The modification of the file offset and the write operation
are performed as a single atomic step.
.IP
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
.\" For more background, see
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
.\" http://nfs.sourceforge.net/
This is because NFS does not support
appending to a file, so the client kernel has to simulate it, which
can't be done without a race condition.
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
.TP
.BR O_CLOEXEC " (since Linux 2.6.23)"
.\" NOTE! several other man pages refer to this text
Enable the close-on-exec flag for the new file descriptor.
.\" FIXME . for later review when Issue 8 is one day released...
.\" POSIX proposes to fix many APIs that provide hidden FDs
.\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
.\" http://austingroupbugs.net/view.php?id=368
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.
.IP
Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.BR O_CLOEXEC
flag to deal with this problem.)
.\" This flag fixes only one form of the race condition;
.\" The race can also occur with, for example, file descriptors
.\" returned by accept(), pipe(), etc.
.TP
.B O_CREAT
If the file does not exist, it will be created.
.IP
The owner (user ID) of the new file is set to the effective user ID
of the process.
.IP
The group ownership (group ID) of the new file is set either to
the effective group ID of the process (System V semantics)
or to the group ID of the parent directory (BSD semantics).
On Linux, the behavior depends on whether the
set-group-ID mode bit is set on the parent directory:
if that bit is set, then BSD semantics apply;
otherwise, System V semantics apply.
For some filesystems, the behavior also depends on the
.I bsdgroups
and
.I sysvgroups
mount options described in
.BR mount (8).
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
.\" XFS (since 2.6.14).
.RS
.PP
The
.I mode
argument specifies the file mode bits to be applied when a new file is created.
This argument must be supplied when
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if neither
.B O_CREAT
nor
.B O_TMPFILE
is specified, then
.I mode
is ignored.
The effective mode is modified by the process's
.I umask
in the usual way: in the absence of a default ACL, the mode of the
created file is
.IR "(mode\ &\ ~umask)" .
Note that this mode applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
.PP
The following symbolic constants are provided for
.IR mode :
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write, and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write, and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write, and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
.IP
According to POSIX, the effect when other bits are set in
.I mode
is unspecified.
On Linux, the following bits are also honored in
.IR mode :
.RS
.TP 9
.B S_ISUID
0004000 set-user-ID bit
.TP
.B S_ISGID
0002000 set-group-ID bit (see
.BR inode (7)).
.TP
.B S_ISVTX
0001000 sticky bit (see
.BR inode (7)).
.RE
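.IP
To make the interaction between the mode argument and the umask concrete,
the sketch below creates a file and reports the permission bits it received.
The function name mode_after_create() is hypothetical, and the sketch assumes
that the parent directory has no default ACL.
Note that it changes the process-wide umask temporarily, so it is not
suitable for multithreaded programs as written.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create 'path' with the requested mode under the given umask and return
 * the permission bits the file actually received, or (mode_t) -1 on error.
 * The previous umask is restored before returning.
 */
mode_t mode_after_create(const char *path, mode_t requested, mode_t mask)
{
    struct stat st;
    mode_t result = (mode_t) -1;
    mode_t old = umask(mask);

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    if (fd != -1) {
        if (fstat(fd, &st) == 0)
            result = st.st_mode & 07777;   /* keep only the mode bits */
        close(fd);
    }

    umask(old);
    return result;
}
```

For example, requesting read and write for user, group, and others under a
umask of 027 should leave read and write for the user and read for the group.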
.TP
.BR O_DIRECT " (since Linux 2.4.10)"
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own caching.
File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.IP
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
.TP
.B O_DIRECTORY
If \fIpathname\fP is not a directory, cause the open to fail.
.\" But see the following and its replies:
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in kernel version 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a
FIFO or tape device.
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.
.IP
By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
will fail.
.IP
When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points to.
.IP
In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .
.IP
On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
.TP
.BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).
.IP
This flag can be employed only if one of the following conditions is true:
.RS
.IP * 3
The effective UID of the process
.\" Strictly speaking: the filesystem UID
matches the owner UID of the file.
.IP *
The calling process has the
.BR CAP_FOWNER
capability in its user namespace and
the owner UID of the file has a mapping in the namespace.
.RE
.IP
This flag is intended for use by indexing or backup programs,
where its use can significantly reduce the amount of disk activity.
This flag may not be effective on all filesystems.
One example is NFS, where the server maintains the access time.
.\" The O_NOATIME flag also affects the treatment of st_atime
.\" by mmap() and readdir(2), MTK, Dec 04.
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\(emsee
.BR tty (4)\(emit
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If \fIpathname\fP is a symbolic link, then the open fails, with the error
.BR ELOOP .
Symbolic links in earlier components of the pathname will still be
followed.
(Note that the
.B ELOOP
error that can occur in this case is indistinguishable from the case where
an open fails because there are too many symbolic links found
while resolving components in the prefix part of the pathname.)
.IP
This flag is a FreeBSD extension, which was added to Linux in version 2.1.126,
and has subsequently been standardized in POSIX.1-2008.
.IP
See also
.BR O_PATH
below.
.\" The headers from glibc 2.0.100 and later include a
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
.\" used\fP.
.TP
.BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent operations on the file descriptor which is
returned will cause the calling process to wait.
.IP
Note that this flag has no effect for regular files and block devices;
that is, I/O operations will (briefly) block when device activity
is required, regardless of whether
.B O_NONBLOCK
is set.
Since
.B O_NONBLOCK
semantics might eventually be implemented,
applications should not depend upon blocking behavior
when specifying this flag for regular files and block devices.
.IP
For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
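.IP
As one possible illustration with a FIFO, the sketch below opens both ends in
nonblocking mode and attempts a read while the FIFO is empty.
The function name empty_fifo_read_is_eagain() is hypothetical, and the
expectation that the read fails with EAGAIN follows the FIFO semantics
described in
.BR fifo (7)
and
.BR read (2).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a FIFO at 'path', open both ends without blocking, and attempt a
 * read while the FIFO is empty.
 * Returns 1 if the read failed immediately with EAGAIN, 0 if it behaved
 * differently, -1 if the setup failed.
 */
int empty_fifo_read_is_eagain(const char *path)
{
    char c;
    int result = -1;

    if (mkfifo(path, 0600) == -1)
        return -1;

    /* Without O_NONBLOCK, this open would wait for a writer to appear. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    int wfd = (rfd == -1) ? -1 : open(path, O_WRONLY | O_NONBLOCK);

    if (rfd != -1 && wfd != -1) {
        ssize_t n = read(rfd, &c, 1);
        result = (n == -1 && errno == EAGAIN);
    }

    if (wfd != -1)
        close(wfd);
    if (rfd != -1)
        close(rfd);
    unlink(path);
    return result;
}
```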
.TP
.BR O_PATH " (since Linux 2.6.39)"
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
.\" commit 326be7b484843988afe57566b627fb7a70beac56
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
.\"
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
.\" Subject: Re: [PATCH] open(2): document O_PATH
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and
to perform operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations (e.g.,
.BR read (2),
.BR write (2),
.BR fchmod (2),
.BR fchown (2),
.BR fgetxattr (2),
.BR ioctl (2),
.BR mmap (2))
fail with the error
.BR EBADF .
.IP
The following operations
.I can
be performed on the resulting file descriptor:
.RS
.IP * 3
.BR close (2).
.IP *
.BR fchdir (2),
if the file descriptor refers to a directory
(since Linux 3.5).
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
.IP *
.BR fstat (2)
(since Linux 3.6).
.IP *
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
.BR fstatfs (2)
(since Linux 3.12).
.\" fstatfs(): commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf
.IP *
Duplicating the file descriptor
.RB ( dup (2),
.BR fcntl (2)
.BR F_DUPFD ,
etc.).
.IP *
Getting and setting file descriptor flags
.RB ( fcntl (2)
.BR F_GETFD
and
.BR F_SETFD ).
.IP *
Retrieving open file status flags using the
.BR fcntl (2)
.BR F_GETFL
operation: the returned flags will include the bit
.BR O_PATH .
.IP *
Passing the file descriptor as the
.IR dirfd
argument of
.BR openat ()
and the other "*at()" system calls.
This includes
.BR linkat (2)
with
.BR AT_EMPTY_PATH
(or via procfs using
.BR AT_SYMLINK_FOLLOW )
even if the file is not a directory.
.IP *
Passing the file descriptor to another process via a UNIX domain socket
(see
.BR SCM_RIGHTS
in
.BR unix (7)).
.RE
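.IP
Some of the rules above can be probed with a sketch like the following.
The function name o_path_behaves_as_documented() is hypothetical, and the
fstat check assumes a kernel of version 3.6 or later.

```c
#define _GNU_SOURCE          /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open 'path' with O_PATH and probe what the descriptor permits.
 * Returns 1 if read(2) fails with EBADF while fstat(2) succeeds and
 * F_GETFL reports O_PATH; 0 if any of that does not hold; -1 if the
 * open itself fails.
 */
int o_path_behaves_as_documented(const char *path)
{
    char c;
    struct stat st;

    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd == -1)
        return -1;

    int read_refused = (read(fd, &c, 1) == -1 && errno == EBADF);
    int stat_ok = (fstat(fd, &st) == 0);
    int fl = fcntl(fd, F_GETFL);
    int flag_seen = (fl != -1 && (fl & O_PATH) != 0);

    close(fd);
    return read_refused && stat_ok && flag_seen;
}
```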
.IP
When
.B O_PATH
is specified in
.IR flags ,
flag bits other than
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
are ignored.
.IP
Opening a file or directory with the
.B O_PATH
flag requires no permissions on the object itself
(but does require execute permission on the directories in the path prefix).
Depending on the subsequent operation,
a check for suitable file permissions may be performed (e.g.,
.BR fchdir (2)
requires execute permission on the directory referred to
by its file descriptor argument).
By contrast,
obtaining a reference to a filesystem object by opening it with the
.B O_RDONLY
flag requires that the caller have read permission on the object,
even when the subsequent operation (e.g.,
.BR fchdir (2),
.BR fstat (2))
does not require read permission on the object.
.IP
If
.I pathname
is a symbolic link and the
.BR O_NOFOLLOW
flag is also specified,
then the call returns a file descriptor referring to the symbolic link.
This file descriptor can be used as the
.I dirfd
argument in calls to
.BR fchownat (2),
.BR fstatat (2),
.BR linkat (2),
and
.BR readlinkat (2)
with an empty pathname to have the calls operate on the symbolic link.
.IP
If
.I pathname
refers to an automount point that has not yet been triggered, so no
other filesystem is mounted on it, then the call returns a file
descriptor referring to the automount directory without triggering a mount.
.BR fstatfs (2)
can then be used to determine if it is, in fact, an untriggered
automount point
.RB ( ".f_type == AUTOFS_SUPER_MAGIC" ).
.IP
One use of
.B O_PATH
for regular files is to provide the equivalent of POSIX.1's
.B O_EXEC
functionality.
This permits us to open a file for which we have execute
permission but not read permission, and then execute that file,
with steps something like the following:
.IP
.in +4n
.EX
char buf[PATH_MAX];
fd = open("some_prog", O_PATH);
snprintf(buf, PATH_MAX, "/proc/self/fd/%d", fd);
execl(buf, "some_prog", (char *) NULL);
.EE
.in
.IP
An
.B O_PATH
file descriptor can also be passed as the argument of
.BR fexecve (3).
.TP
.B O_SYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I file
integrity completion
(by contrast with the
synchronized I/O
.I data
integrity completion
provided by
.BR O_DSYNC .)
.IP
By the time
.BR write (2)
(and similar)
return, the output data and associated file metadata
have been transferred to the underlying hardware
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fsync (2)).
.IR "See NOTES below" .
.TP
.BR O_TMPFILE " (since Linux 3.11)"
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
Create an unnamed temporary file.
The
.I pathname
argument specifies a directory;
an unnamed inode will be created in that directory's filesystem.
Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.
.IP
.B O_TMPFILE
must be specified with one of
.B O_RDWR
or
.B O_WRONLY
and, optionally,
.BR O_EXCL .
If
.B O_EXCL
is not specified, then
.BR linkat (2)
can be used to link the temporary file into the filesystem, making it
permanent, using code like the following:
.IP
.in +4n
.EX
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
          S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
       AT_SYMLINK_FOLLOW);
.EE
.in
.IP
In this case,
the
.BR open ()
.I mode
argument determines the file permission mode, as with
.BR O_CREAT .
.IP
Specifying
.B O_EXCL
in conjunction with
.B O_TMPFILE
prevents a temporary file from being linked into the filesystem
in the above manner.
(Note that the meaning of
.B O_EXCL
in this case is different from the meaning of
.B O_EXCL
otherwise.)
.IP
There are two main use cases for
.\" Inspired by http://lwn.net/Articles/559147/
.BR O_TMPFILE :
.RS
.IP * 3
Improved
.BR tmpfile (3)
functionality: race-free creation of temporary files that
(1) are automatically deleted when closed;
(2) can never be reached via any pathname;
(3) are not subject to symlink attacks; and
(4) do not require the caller to devise unique names.
.IP *
Creating a file that is initially invisible, which is then populated
with data and adjusted to have appropriate filesystem attributes
.RB ( fchown (2),
.BR fchmod (2),
.BR fsetxattr (2),
etc.)
before being atomically linked into the filesystem
in a fully formed state (using
.BR linkat (2)
as described above).
.RE
.IP
.B O_TMPFILE
requires support by the underlying filesystem;
only a subset of Linux filesystems provide that support.
In the initial implementation, support was provided in
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
.\" To check for support, grep for "tmpfile" in kernel sources
Support for other filesystems has subsequently been added as follows:
XFS (Linux 3.15);
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
Btrfs (Linux 3.16);
.\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c
F2FS (Linux 3.16);
.\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c
and ubifs (Linux 4.9).
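.IP
The second use case (populate an invisible file, then publish it) might look
roughly like the sketch below.
The function name publish_atomically() is hypothetical; the sketch assumes
that /proc is mounted, that the directory and the destination are on the same
filesystem, and that the filesystem supports O_TMPFILE.

```c
#define _GNU_SOURCE          /* for O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write 'data' to an unnamed file in 'dir', then give it the name 'dest'.
 * Returns 0 on success, -1 on error; errno from a failing open(2) or
 * linkat(2) is preserved for the caller.
 */
int publish_atomically(const char *dir, const char *dest, const char *data)
{
    char proc_path[PATH_MAX];
    size_t len = strlen(data);
    int rc = -1;

    int fd = open(dir, O_TMPFILE | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (write(fd, data, len) == (ssize_t) len) {
        snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, proc_path, AT_FDCWD, dest,
                   AT_SYMLINK_FOLLOW) == 0)
            rc = 0;
    }

    int saved = errno;       /* close(2) must not clobber the error */
    close(fd);
    errno = saved;
    return rc;
}
```

Until the linkat call succeeds, no other process can find the file by name,
so readers never observe a partially written file at the destination.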
.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY )
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise, the effect of
.B O_TRUNC
is unspecified.
.SS creat()
A call to
.BR creat ()
is equivalent to calling
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .
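.PP
The equivalence can be spelled out as a one-line wrapper; the name
creat_via_open() is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Behaves like creat(path, mode), written out with open(). */
int creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}
```

As with creat itself, an existing regular file is truncated to length 0 and
the returned descriptor is write-only.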
.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.
.IP
If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).
.IP
If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).
.IP
If
.I pathname
is absolute, then
.I dirfd
is ignored.
.SH RETURN VALUE
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor, or \-1 if an error occurred
(in which case,
.I errno
is set appropriately).
fea681da 917.SH ERRORS
7b8ba76c
MK
918.BR open (),
919.BR openat (),
920and
921.BR creat ()
922can fail with the following errors:
fea681da
MK
923.TP
924.B EACCES
925The requested access to the file is not allowed, or search permission
926is denied for one of the directories in the path prefix of
927.IR pathname ,
928or the file did not exist yet and write access to the parent directory
929is not allowed.
930(See also
ad7cc990 931.BR path_resolution (7).)
fea681da 932.TP
a1f01685
MH
933.B EDQUOT
934Where
935.B O_CREAT
936is specified, the file does not exist, and the user's quota of disk
9ee4a2b6 937blocks or inodes on the filesystem has been exhausted.
a1f01685 938.TP
fea681da
MK
939.B EEXIST
940.I pathname
941already exists and
942.BR O_CREAT " and " O_EXCL
943were used.
944.TP
945.B EFAULT
0daa9e92 946.I pathname
e1d6264d 947points outside your accessible address space.
fea681da 948.TP
9f5773f7 949.B EFBIG
7c7fb552
MK
950See
951.BR EOVERFLOW .
9f5773f7 952.TP
e51412ea
MK
953.B EINTR
954While blocked waiting to complete an open of a slow device
955(e.g., a FIFO; see
956.BR fifo (7)),
957the call was interrupted by a signal handler; see
958.BR signal (7).
959.TP
ef490193
DG
960.B EINVAL
961The filesystem does not support the
962.BR O_DIRECT
e6f89ed2
MK
963flag.
964See
ef490193
DG
965.BR NOTES
966for more information.
967.TP
8e335391
MK
968.B EINVAL
969Invalid value in
970.\" In particular, __O_TMPFILE instead of O_TMPFILE
971.IR flags .
972.TP
973.B EINVAL
974.B O_TMPFILE
975was specified in
976.IR flags ,
977but neither
978.B O_WRONLY
979nor
980.B O_RDWR
981was specified.
982.TP
fea681da
MK
983.B EISDIR
984.I pathname
985refers to a directory and the access requested involved writing
986(that is,
987.B O_WRONLY
988or
989.B O_RDWR
990is set).
991.TP
8e335391 992.B EISDIR
843068bd
MK
993.I pathname
994refers to an existing directory,
8e335391
MK
995.B O_TMPFILE
996and one of
997.B O_WRONLY
998or
999.B O_RDWR
1000were specified in
1001.IR flags ,
1002but this kernel version does not provide the
1003.B O_TMPFILE
1004functionality.
1005.TP
fea681da
MK
1006.B ELOOP
1007Too many symbolic links were encountered in resolving
289f7907
MK
1008.IR pathname .
1009.TP
1010.B ELOOP
fea681da 1011.I pathname
289f7907
MK
1012was a symbolic link, and
1013.I flags
1014specified
1015.BR O_NOFOLLOW
1016but not
1017.BR O_PATH .
fea681da
MK
1018.TP
1019.B EMFILE
26c32fab 1020The per-process limit on the number of open file descriptors has been reached
12c21590
MK
1021(see the description of
1022.BR RLIMIT_NOFILE
1023in
1024.BR getrlimit (2)).
fea681da
MK
1025.TP
1026.B ENAMETOOLONG
0daa9e92 1027.I pathname
e1d6264d 1028was too long.
fea681da
MK
1029.TP
1030.B ENFILE
e258766b 1031The system-wide limit on the total number of open files has been reached.
fea681da
MK
1032.TP
1033.B ENODEV
1034.I pathname
1035refers to a device special file and no corresponding device exists.
682edefb
MK
1036(This is a Linux kernel bug; in this situation
1037.B ENXIO
1038must be returned.)
fea681da
MK
1039.TP
1040.B ENOENT
682edefb
MK
1041.B O_CREAT
1042is not set and the named file does not exist.
fea681da
MK
1043Or, a directory component in
1044.I pathname
1045does not exist or is a dangling symbolic link.
1046.TP
ba03011f
MK
1047.B ENOENT
1048.I pathname
1049refers to a nonexistent directory,
1050.B O_TMPFILE
1051and one of
1052.B O_WRONLY
1053or
1054.B O_RDWR
1055were specified in
1056.IR flags ,
1057but this kernel version does not provide the
1058.B O_TMPFILE
1059functionality.
1060.TP
fea681da 1061.B ENOMEM
8ef529f9
MK
1062The named file is a FIFO,
1063but memory for the FIFO buffer can't be allocated because
1064the per-user hard limit on memory allocation for pipes has been reached
1065and the caller is not privileged; see
1066.BR pipe (7).
1067.TP
1068.B ENOMEM
fea681da
MK
1069Insufficient kernel memory was available.
1070.TP
1071.B ENOSPC
1072.I pathname
1073was to be created but the device containing
1074.I pathname
1075has no room for the new file.
1076.TP
1077.B ENOTDIR
1078A component used as a directory in
1079.I pathname
a8d55537 1080is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
fea681da
MK
1081.I pathname
1082was not a directory.
1083.TP
1084.B ENXIO
682edefb 1085.BR O_NONBLOCK " | " O_WRONLY
103ea4f6
MK
1086is set, the named file is a FIFO, and
1087no process has the FIFO open for reading.
7b032b23
MK
1088.TP
1089.B ENXIO
1090The file is a device special file and no corresponding device exists.
fea681da 1091.TP
bbe02b45
MK
1092.BR EOPNOTSUPP
1093The filesystem containing
1094.I pathname
1095does not support
1096.BR O_TMPFILE .
1097.TP
7c7fb552
MK
1098.B EOVERFLOW
1099.I pathname
1100refers to a regular file that is too large to be opened.
1101The usual scenario here is that an application compiled
1102on a 32-bit platform without
5e4dc269 1103.I -D_FILE_OFFSET_BITS=64
7c7fb552 1104tried to open a file whose size exceeds
4e1a4d72
MK
1105.I (1<<31)-1
1106bytes;
7c7fb552
MK
1107see also
1108.B O_LARGEFILE
1109above.
c84d3aa3 1110This is the error specified by POSIX.1;
7c7fb552
MK
1111in kernels before 2.6.24, Linux gave the error
1112.B EFBIG
1113for this case.
1114.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
1115.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
1116.\" Reported 2006-10-03
1117.TP
1c1e15ed
MK
1118.B EPERM
1119The
1120.B O_NOATIME
1121flag was specified, but the effective user ID of the caller
9ee4a2b6 1122.\" Strictly speaking, it's the filesystem UID... (MTK)
47c906e5 1123did not match the owner of the file and the caller was not privileged.
1c1e15ed 1124.TP
fbab10e5
MK
1125.B EPERM
1126The operation was prevented by a file seal; see
1127.BR fcntl (2).
1128.TP
fea681da
MK
1129.B EROFS
1130.I pathname
9ee4a2b6 1131refers to a file on a read-only filesystem and write access was
fea681da
MK
1132requested.
1133.TP
1134.B ETXTBSY
1135.I pathname
1136refers to an executable image which is currently being executed and
1137write access was requested.
d3952311
MK
1138.TP
1139.B EWOULDBLOCK
1140The
1141.B O_NONBLOCK
1142flag was specified, and an incompatible lease was held on the file
1143(see
1144.BR fcntl (2)).
7b8ba76c
MK
1145.PP
1146The following additional errors can occur for
1147.BR openat ():
1148.TP
1149.B EBADF
1150.I dirfd
1151is not a valid file descriptor.
1152.TP
1153.B ENOTDIR
1154.I pathname
2feae602 1155is a relative pathname and
7b8ba76c
MK
1156.I dirfd
1157is a file descriptor referring to a file other than a directory.
1158.SH VERSIONS
1159.BR openat ()
1160was added to Linux in kernel 2.6.16;
1161library support was added to glibc in version 2.4.
47297adb 1162.SH CONFORMING TO
7b8ba76c
MK
1163.BR open (),
1164.BR creat ():
72ac7268 1165SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.
5355ff82 1166.PP
7b8ba76c
MK
1167.BR openat ():
1168POSIX.1-2008.
5355ff82 1169.PP
fea681da 1170The
72ac7268 1171.BR O_DIRECT ,
1c1e15ed 1172.BR O_NOATIME ,
72ac7268 1173.BR O_PATH ,
fea681da 1174and
72ac7268
MK
1175.BR O_TMPFILE
1176flags are Linux-specific.
1177One must define
61b7c1e1
MK
1178.B _GNU_SOURCE
1179to obtain their definitions.
5355ff82 1180.PP
9f91e36c 1181The
72ac7268
MK
1182.BR O_CLOEXEC ,
1183.BR O_DIRECTORY ,
1184and
1185.BR O_NOFOLLOW
1186flags are not specified in POSIX.1-2001,
1187but are specified in POSIX.1-2008.
1188Since glibc 2.12, one can obtain their definitions by defining either
1189.B _POSIX_C_SOURCE
1190with a value greater than or equal to 200809L or
1191.BR _XOPEN_SOURCE
1192with a value greater than or equal to 700.
1193In glibc 2.11 and earlier, one obtains the definitions by defining
1194.BR _GNU_SOURCE .
5355ff82 1195.PP
72ac7268
MK
1196As noted in
1197.BR feature_test_macros (7),
84fc2a6e 1198feature test macros such as
72ac7268
MK
1199.BR _POSIX_C_SOURCE ,
1200.BR _XOPEN_SOURCE ,
1201and
fe75ec04 1202.B _GNU_SOURCE
72ac7268 1203must be defined before including
e417acb0 1204.I any
72ac7268 1205header files.
a1d5f77c 1206.SH NOTES
988db661 1207Under Linux, the
a1d5f77c
MK
1208.B O_NONBLOCK
1209flag indicates that one wants to open the file
1210but does not necessarily intend to read from or write to it.
1211This is typically used to open devices in order to get a file descriptor
1212for use with
1213.BR ioctl (2).
dd3568a1 1214.PP
fea681da
MK
1215The (undefined) effect of
1216.B O_RDONLY | O_TRUNC
c13182ef 1217varies among implementations.
bcdd964e 1218On many systems the file is actually truncated.
fea681da
MK
1219.\" Linux 2.0, 2.5: truncate
1220.\" Solaris 5.7, 5.8: truncate
1221.\" Irix 6.5: truncate
1222.\" Tru64 5.1B: truncate
1223.\" HP-UX 11.22: truncate
1224.\" FreeBSD 4.7: truncate
5355ff82 1225.PP
5dc8986d
MK
1226Note that
1227.BR open ()
1228can open device special files, but
1229.BR creat ()
1230cannot create them; use
1231.BR mknod (2)
1232instead.
5355ff82 1233.PP
5dc8986d
MK
1234If the file is newly created, its
1235.IR st_atime ,
1236.IR st_ctime ,
1237.I st_mtime
1238fields
1239(respectively, time of last access, time of last status change, and
1240time of last modification; see
1241.BR stat (2))
1242are set
1243to the current time, and so are the
1244.I st_ctime
1245and
1246.I st_mtime
1247fields of the
1248parent directory.
1249Otherwise, if the file is modified because of the
1250.B O_TRUNC
3a9c5a29
MK
1251flag, its
1252.I st_ctime
1253and
1254.I st_mtime
1255fields are set to the current time.
5355ff82 1256.PP
aaf7a574
MK
1257The files in the
1258.I /proc/[pid]/fd
1259directory show the open file descriptors of the process with the PID
1260.IR pid .
1261The files in the
1262.I /proc/[pid]/fdinfo
1263directory show even more information about these file descriptors.
1264See
1265.BR proc (5)
1266for further details of both of these directories.
5dc8986d
MK
1267.\"
1268.\"
d20d9d33
MK
1269.SS Open file descriptions
1270The term open file description is the one used by POSIX to refer to the
1271entries in the system-wide table of open files.
91085d85 1272In other contexts, this object is
d20d9d33
MK
1273variously also called an "open file object",
1274a "file handle", an "open file table entry",
1275or\(emin kernel-developer parlance\(ema
1276.IR "struct file" .
5355ff82 1277.PP
d20d9d33
MK
1278When a file descriptor is duplicated (using
1279.BR dup (2)
1280or similar),
1281the duplicate refers to the same open file description
1282as the original file descriptor,
1283and the two file descriptors consequently share
1284the file offset and file status flags.
1285Such sharing can also occur between processes:
1286a child process created via
91085d85 1287.BR fork (2)
d20d9d33
MK
1288inherits duplicates of its parent's file descriptors,
1289and those duplicates refer to the same open file descriptions.
5355ff82 1290.PP
d20d9d33 1291Each
bf7bc8b8 1292.BR open ()
d20d9d33
MK
1293of a file creates a new open file description;
1294thus, there may be multiple open file descriptions
1295corresponding to a file inode.
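The difference between a duplicated descriptor and a second open() of the same file can be observed through the file offset. The following is a minimal sketch; the helper name shares_offset is an assumption made for this example. It moves the offset through one descriptor and checks whether the other descriptor sees the change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if descriptors a and b share a file
   offset (and so refer to the same open file description), 0 if they
   do not, and -1 on error. Both must refer to a seekable file.
   Note that the call changes the offset of a. */
int shares_offset(int a, int b)
{
    assert(a >= 0 && b >= 0);

    off_t before = lseek(b, 0, SEEK_CUR);      /* current offset of b */
    if (before == -1)
        return -1;

    if (lseek(a, before + 1, SEEK_SET) == -1)  /* move a past it */
        return -1;

    off_t after = lseek(b, 0, SEEK_CUR);       /* did b follow? */
    if (after == -1)
        return -1;

    return after == before + 1;
}
```

Given fd = open(path, O_RDWR), the pair (fd, dup(fd)) would be expected to share an offset, whereas fd and a second open(path, O_RDWR) would not. On Linux, kcmp(2) gives a direct answer without disturbing the offset.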
5355ff82 1296.PP
9539ebc9
MK
1297On Linux, one can use the
1298.BR kcmp (2)
1299.B KCMP_FILE
1300operation to test whether two file descriptors
1301(in the same process or in two different processes)
1302refer to the same open file description.
d20d9d33
MK
1303.\"
1304.\"
5dc8986d 1305.SS Synchronized I/O
6cf19e62
MK
1306The POSIX.1-2008 "synchronized I/O" option
1307specifies different variants of synchronized I/O,
1308and specifies the
1309.BR open ()
1310flags
015221ef
CH
1311.BR O_SYNC ,
1312.BR O_DSYNC ,
1313and
6cf19e62
MK
1314.BR O_RSYNC
1315for controlling the behavior.
1316Regardless of whether an implementation supports this option,
1317it must at least support the use of
1318.BR O_SYNC
1319for regular files.
5355ff82 1320.PP
89851a00 1321Linux implements
6cf19e62
MK
1322.BR O_SYNC
1323and
1324.BR O_DSYNC ,
1325but not
015221ef 1326.BR O_RSYNC .
6cf19e62
MK
1327(Somewhat incorrectly, glibc defines
1328.BR O_RSYNC
1329to have the same value as
1330.BR O_SYNC .)
5355ff82 1331.PP
6cf19e62
MK
1332.BR O_SYNC
1333provides synchronized I/O
1334.I file
1335integrity completion,
1336meaning write operations will flush data and all associated metadata
1337to the underlying hardware.
1338.BR O_DSYNC
1339provides synchronized I/O
1340.I data
1341integrity completion,
1342meaning write operations will flush data
1343to the underlying hardware,
1344but will only flush metadata updates that are required
1345to allow a subsequent read operation to complete successfully.
1346Data integrity completion can reduce the number of disk operations
1347that are required for applications that don't need the guarantees
1348of file integrity completion.
5355ff82 1349.PP
a83923ca 1350To understand the difference between the two types of completion,
6cf19e62
MK
1351consider two pieces of file metadata:
1352the file last modification timestamp
1353.RI ( st_mtime )
1354and the file length.
1355All write operations will update the last file modification timestamp,
1356but only writes that add data to the end of the
1357file will change the file length.
1358The last modification timestamp is not needed to ensure that
1359a read completes successfully, but the file length is.
1360Thus,
1361.BR O_DSYNC
1362would only guarantee to flush updates to the file length metadata
1363(whereas
1364.BR O_SYNC
1365would also always flush the last modification timestamp metadata).
5355ff82 1366.PP
6cf19e62
MK
1367Before Linux 2.6.33, Linux implemented only the
1368.BR O_SYNC
89851a00 1369flag for
6cf19e62
MK
1370.BR open ().
1371However, when that flag was specified,
1372most filesystems actually provided the equivalent of synchronized I/O
1373.I data
1374integrity completion (i.e.,
1375.BR O_SYNC
1376was actually implemented as the equivalent of
1377.BR O_DSYNC ).
5355ff82 1378.PP
6cf19e62
MK
1379Since Linux 2.6.33, proper
1380.BR O_SYNC
1381support is provided.
1382However, to ensure backward binary compatibility,
1383.BR O_DSYNC
1384was defined with the same value as the historical
015221ef 1385.BR O_SYNC ,
015221ef 1386and
6cf19e62 1387.BR O_SYNC
89851a00 1388was defined as a new (two-bit) flag value that includes the
6cf19e62
MK
1389.BR O_DSYNC
1390flag value.
1391This ensures that applications compiled against
1392new headers get at least
1393.BR O_DSYNC
1394semantics on pre-2.6.33 kernels.
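The flag layout described above can be checked from user space. The sketch below is illustrative only: the function names are hypothetical, and the bit relationship it tests is the Linux one described in this section, which other systems need not share. The second helper shows a typical use of O_DSYNC for a write that should reach the storage device before write(2) returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns nonzero if every bit of O_DSYNC is also set in O_SYNC,
   as is the case with the Linux definitions described above. */
int o_sync_includes_o_dsync(void)
{
    return (O_SYNC & O_DSYNC) == O_DSYNC;
}

/* Hypothetical helper: replace the contents of `path` with `len` bytes
   from `buf`, using synchronized I/O data integrity completion.
   A short write is treated as failure. Returns 0 on success, -1 on error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, buf, len);
    int ok = (n == (ssize_t) len);

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```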
5dc8986d
MK
1395.\"
1396.\"
1397.SS NFS
1398There are many infelicities in the protocol underlying NFS, affecting,
1399among others,
1400.BR O_SYNC " and " O_NDELAY .
5355ff82 1401.PP
9ee4a2b6 1402On NFS filesystems with UID mapping enabled,
a1d5f77c
MK
1403.BR open ()
1404may
75b94dc3 1405return a file descriptor but, for example,
a1d5f77c
MK
1406.BR read (2)
1407requests are denied
1408with \fBEACCES\fP.
1409This is because the client performs
1410.BR open ()
1411by checking the
1412permissions, but UID mapping is performed by the server upon
1413read and write requests.
5dc8986d
MK
1414.\"
1415.\"
1bdc161d
MK
1416.SS FIFOs
1417Opening the read or write end of a FIFO blocks until the other
1418end is also opened (by another process or thread).
1419See
1420.BR fifo (7)
1421for further details.
1422.\"
1423.\"
5dc8986d
MK
1424.SS File access mode
1425Unlike the other values that can be specified in
1426.IR flags ,
1427the
1428.I "access mode"
1429values
1430.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
1431do not specify individual bits.
1432Rather, they define the low order two bits of
1433.IR flags ,
1434and are defined respectively as 0, 1, and 2.
1435In other words, the combination
1436.B "O_RDONLY | O_WRONLY"
1437is a logical error, and certainly does not have the same meaning as
1438.BR O_RDWR .
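Because the access mode is a two-bit field rather than a set of independent bits, it should be extracted with the O_ACCMODE mask and compared by value. The following sketch illustrates this; the function names are hypothetical, and the numeric values asserted are the Linux ones given above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Checks the Linux values stated above; other systems may differ. */
void check_linux_access_mode_values(void)
{
    assert(O_RDONLY == 0 && O_WRONLY == 1 && O_RDWR == 2);
}

/* Hypothetical helper: returns nonzero if `flags` requests write access.
   Testing (flags & O_WRONLY) would be wrong: it misses O_RDWR,
   because the access modes are values, not individual bits. */
int is_writable_mode(int flags)
{
    int mode = flags & O_ACCMODE;   /* isolate the low-order two bits */

    return mode == O_WRONLY || mode == O_RDWR;
}
```

For example, is_writable_mode(O_RDWR | O_APPEND) is nonzero, while the expression (O_RDWR & O_WRONLY) evaluates to 0 with the Linux values.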
5355ff82 1439.PP
5dc8986d
MK
1440Linux reserves the special, nonstandard access mode 3 (binary 11) in
1441.I flags
1442to mean:
d9cb0d7d 1443check for read and write permission on the file and return a file descriptor
5dc8986d
MK
1444that can't be used for reading or writing.
1445This nonstandard access mode is used by some Linux drivers to return a
d9cb0d7d 1446file descriptor that is to be used only for device-specific
5dc8986d
MK
1447.BR ioctl (2)
1448operations.
1449.\" See for example util-linux's disk-utils/setfdprm.c
1450.\" For some background on access mode 3, see
1451.\" http://thread.gmane.org/gmane.linux.kernel/653123
1452.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
1453.\" LKML, 12 Mar 2008
7b8ba76c
MK
1454.\"
1455.\"
80d250b4 1456.SS Rationale for openat() and other "directory file descriptor" APIs
7b8ba76c 1457.BR openat ()
80d250b4
MK
1458and the other system calls and library functions that take
1459a directory file descriptor argument
7b8ba76c 1460(i.e.,
c6a16783 1461.BR execveat (2),
7b8ba76c 1462.BR faccessat (2),
80d250b4 1463.BR fanotify_mark (2),
7b8ba76c
MK
1464.BR fchmodat (2),
1465.BR fchownat (2),
1466.BR fstatat (2),
1467.BR futimesat (2),
1468.BR linkat (2),
1469.BR mkdirat (2),
1470.BR mknodat (2),
80d250b4 1471.BR name_to_handle_at (2),
7b8ba76c
MK
1472.BR readlinkat (2),
1473.BR renameat (2),
3f092cef 1474.BR statx (2),
7b8ba76c
MK
1475.BR symlinkat (2),
1476.BR unlinkat (2),
f37759b1 1477.BR utimensat (2),
80d250b4 1478.BR mkfifoat (3),
7b8ba76c 1479and
80d250b4 1480.BR scandirat (3))
a98e0304 1481address two problems with the older interfaces that preceded them.
92692952 1482Here, the explanation is in terms of the
7b8ba76c 1483.BR openat ()
d26f8a31 1484call, but the rationale is analogous for the other interfaces.
5355ff82 1485.PP
7b8ba76c
MK
1486First,
1487.BR openat ()
1488allows an application to avoid race conditions that could
1489occur when using
cadd38ba 1490.BR open ()
7b8ba76c
MK
1491to open files in directories other than the current working directory.
1492These race conditions result from the fact that some component
1493of the directory prefix given to
cadd38ba 1494.BR open ()
7b8ba76c 1495could be changed in parallel with the call to
cadd38ba 1496.BR open ().
54305f5b 1497Suppose, for example, that we wish to create the file
a710e359 1498.I dir1/dir2/xxx.dep
54305f5b 1499if the file
a710e359 1500.I dir1/dir2/xxx
54305f5b
MK
1501exists.
1502The problem is that between the existence check and the file creation step,
a710e359 1503.I dir1
54305f5b 1504or
a710e359 1505.I dir2
54305f5b
MK
1506(which might be symbolic links)
1507could be modified to point to a different location.
7b8ba76c
MK
1508Such races can be avoided by
1509opening a file descriptor for the target directory,
1510and then specifying that file descriptor as the
1511.I dirfd
54305f5b
MK
1512argument of (say)
1513.BR fstatat (2)
1514and
7b8ba76c 1515.BR openat ().
941d2892
MK
1516The use of the
1517.I dirfd
1518file descriptor also has other benefits:
1519.IP * 3
1520the file descriptor is a stable reference to the directory,
1521even if the directory is renamed; and
1522.IP *
1523the open file descriptor prevents the underlying filesystem from
1524being dismounted,
1525just as when a process has a current working directory on a filesystem.
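The check-then-create scenario above can be sketched with a directory file descriptor as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper name create_dep_if_exists, its parameters, and the mode 0644 are all hypothetical. Because both fstatat(2) and openat() use the same dirfd, a later change to a component of the directory prefix cannot redirect the second step.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: if `name` exists in directory `dir`, create
   `dep_name` in that same directory. Returns a write-only descriptor
   for the new file, or -1 with errno set (ENOENT if `name` is absent). */
int create_dep_if_exists(const char *dir, const char *name,
                         const char *dep_name)
{
    assert(dir != NULL && name != NULL && dep_name != NULL);

    /* Resolve the directory prefix exactly once. */
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;

    struct stat sb;
    int fd = -1;

    /* Existence check and creation both go through dirfd. */
    if (fstatat(dirfd, name, &sb, 0) == 0)
        fd = openat(dirfd, dep_name, O_WRONLY | O_CREAT, 0644);

    int saved = errno;              /* keep errno from the failing call */
    close(dirfd);
    errno = saved;
    return fd;
}
```

With the pathnames from the example above, the call would be create_dep_if_exists("dir1/dir2", "xxx", "xxx.dep").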
1526.PP
7b8ba76c
MK
1527Second,
1528.BR openat ()
1529allows the implementation of a per-thread "current working
1530directory", via file descriptor(s) maintained by the application.
1531(This functionality can also be obtained by tricks based
1532on the use of
1533.IR /proc/self/fd/ dirfd,
1534but less efficiently.)
1535.\"
1536.\"
ddc4d339 1537.SS O_DIRECT
dd3568a1 1538.PP
ddc4d339
MK
1539The
1540.B O_DIRECT
1541flag may impose alignment restrictions on the length and address
7fac88a9 1542of user-space buffers and the file offset of I/Os.
ddc4d339 1543In Linux, alignment
9ee4a2b6 1544restrictions vary by filesystem and kernel version and might be
ddc4d339 1545absent entirely.
9ee4a2b6 1546However, there is currently no filesystem\-independent
ddc4d339 1547interface for an application to discover these restrictions for a given
9ee4a2b6
MK
1548file or filesystem.
1549Some filesystems provide their own interfaces
ddc4d339
MK
1550for doing so, for example the
1551.B XFS_IOC_DIOINFO
1552operation in
1553.BR xfsctl (3).
dd3568a1 1554.PP
85c2bdba
MK
1555Under Linux 2.4, transfer sizes and the alignment of the user buffer
1556and the file offset must all be multiples of the logical block size
9ee4a2b6 1557of the filesystem.
21557928 1558Since Linux 2.6.0, alignment to the logical block size of the
e6042e4a 1559underlying storage (typically 512 bytes) suffices.
21557928 1560The logical block size can be determined using the
e6042e4a
PS
1561.BR ioctl (2)
1562.B BLKSSZGET
21557928 1563operation or from the shell using the command:
5355ff82
MK
1564.PP
1565.EX
21557928 1566 blockdev \-\-getss
5355ff82
MK
1567.EE
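The alignment requirements can be met by allocating the buffer with posix_memalign(3) and keeping the transfer length a multiple of the block size. The sketch below is a hedged illustration: the helper name direct_read is hypothetical, and the align argument is assumed to be the logical block size obtained elsewhere (for example via BLKSSZGET). Since some filesystems reject the flag, a caller should be prepared for open() to fail with EINVAL.

```c
#define _GNU_SOURCE             /* needed for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `len` bytes from offset 0 of `path`
   using O_DIRECT. `align` is assumed to be the logical block size: a
   power of two that is a multiple of sizeof(void *). `len` must be a
   multiple of `align`. On success, returns the number of bytes read
   and stores the buffer in *out (the caller frees it). On failure,
   returns -1 with errno set. */
ssize_t direct_read(const char *path, size_t align, size_t len, void **out)
{
    assert(path != NULL && out != NULL);

    if (align == 0 || len == 0 || len % align != 0) {
        errno = EINVAL;
        return -1;
    }

    void *buf;
    int err = posix_memalign(&buf, align, len);  /* aligned user buffer */
    if (err != 0) {
        errno = err;
        return -1;
    }

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd == -1) {
        int saved = errno;       /* EINVAL if the filesystem lacks support */
        free(buf);
        errno = saved;
        return -1;
    }

    ssize_t n = read(fd, buf, len);  /* offset 0 is trivially aligned */
    int saved = errno;
    close(fd);

    if (n == -1) {
        free(buf);
        errno = saved;
        return -1;
    }
    *out = buf;
    return n;
}
```

Passing an alignment of 4096 satisfies both 512-byte and 4096-byte logical block sizes, at the cost of a larger minimum transfer.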
1568.PP
1847167b
NP
1569.B O_DIRECT
1570I/Os should never be run concurrently with the
04cd7f64 1571.BR fork (2)
1847167b
NP
1572system call,
1573if the memory buffer is a private mapping
1574(i.e., any mapping created with the
02ace852 1575.BR mmap (2)
1847167b 1576.BR MAP_PRIVATE
0ab8aeec 1577flag;
1847167b
NP
1578this includes memory allocated on the heap and statically allocated buffers).
1579Any such I/Os, whether submitted via an asynchronous I/O interface or from
1580another thread in the process,
1581should be completed before
1582.BR fork (2)
1583is called.
1584Failure to do so can result in data corruption and undefined behavior in
1585parent and child processes.
1586This restriction does not apply when the memory buffer for the
1587.B O_DIRECT
1588I/Os was created using
1589.BR shmat (2)
1590or
1591.BR mmap (2)
1592with the
1593.B MAP_SHARED
1594flag.
1595Nor does this restriction apply when the memory buffer has been advised as
1596.B MADV_DONTFORK
0ab8aeec 1597with
02ace852 1598.BR madvise (2),
1847167b
NP
1599ensuring that it will not be available
1600to the child after
1601.BR fork (2).
dd3568a1 1602.PP
ddc4d339
MK
1603The
1604.B O_DIRECT
1605flag was introduced in SGI IRIX, where it has alignment
1606restrictions similar to those of Linux 2.4.
1607IRIX also has a
1608.BR fcntl (2)
1609call to query appropriate alignments and sizes.
1610FreeBSD 4.x introduced
1611a flag of the same name, but without alignment restrictions.
dd3568a1 1612.PP
ddc4d339
MK
1613.B O_DIRECT
1614support was added under Linux in kernel version 2.4.10.
1615Older Linux kernels simply ignore this flag.
9ee4a2b6 1616Some filesystems may not implement the flag and
ddc4d339
MK
1617.BR open ()
1618will fail with
1619.B EINVAL
1620if it is used.
dd3568a1 1621.PP
ddc4d339
MK
1622Applications should avoid mixing
1623.B O_DIRECT
1624and normal I/O to the same file,
1625and especially to overlapping byte regions in the same file.
9ee4a2b6 1626Even when the filesystem correctly handles the coherency issues in
ddc4d339
MK
1627this situation, overall I/O throughput is likely to be slower than
1628using either mode alone.
1629Likewise, applications should avoid mixing
1630.BR mmap (2)
1631of files with direct I/O to the same files.
dd3568a1 1632.PP
a1fa36af 1633The behavior of
ddc4d339 1634.B O_DIRECT
9ee4a2b6 1635with NFS will differ from local filesystems.
ddc4d339
MK
1636Older kernels, or
1637kernels configured in certain ways, may not support this combination.
1638The NFS protocol does not support passing the flag to the server, so
1639.B O_DIRECT
33a0ccb2 1640I/O will bypass the page cache only on the client; the server may
ddc4d339
MK
1641still cache the I/O.
1642The client asks the server to make the I/O
1643synchronous to preserve the synchronous semantics of
1644.BR O_DIRECT .
1645Some servers will perform poorly under these circumstances, especially
1646if the I/O size is small.
1647Some servers may also be configured to
1648lie to clients about the I/O having reached stable storage; this
1649will avoid the performance penalty at some risk to data integrity
1650in the event of server power failure.
1651The Linux NFS client places no alignment restrictions on
1652.B O_DIRECT
1653I/O.
1654.PP
1655In summary,
1656.B O_DIRECT
1657is a potentially powerful tool that should be used with caution.
1658It is recommended that applications treat use of
1659.B O_DIRECT
1660as a performance option which is disabled by default.
1661.PP
1662.RS
fea681da
MK
1663"The thing that has always disturbed me about O_DIRECT is that the whole
1664interface is just stupid, and was probably designed by a deranged monkey
5503c85e 1665on some serious mind-controlling substances."\(emLinus
ddc4d339
MK
1666.RE
1667.SH BUGS
b50582eb
MK
1668Currently, it is not possible to enable signal-driven
1669I/O by specifying
1670.B O_ASYNC
c13182ef 1671when calling
b50582eb
MK
1672.BR open ();
1673use
1674.BR fcntl (2)
1675to enable this flag.
0e1ad98c 1676.\" FIXME . Check bugzilla report on open(O_ASYNC)
92057f4d 1677.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993
5355ff82 1678.PP
0d730fcc
MK
1679One must check for two different error codes,
1680.B EISDIR
1681and
1682.BR ENOENT ,
1683when trying to determine whether the kernel supports
0d55b37f 1684.B O_TMPFILE
0d730fcc 1685functionality.
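A probe for O_TMPFILE support therefore has to treat both error codes as "not supported". The following is a minimal sketch under stated assumptions: the helper name probe_o_tmpfile and its return convention are hypothetical, and it additionally treats EOPNOTSUPP (filesystem lacks support) the same way. Note that ENOENT is ambiguous, since it is also returned when the directory simply does not exist.

```c
#define _GNU_SOURCE             /* needed for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if an unnamed temporary file could be
   created in `dir` with O_TMPFILE, 0 if the attempt failed with one of
   the errors that indicate missing support (EISDIR or ENOENT from an
   older kernel, EOPNOTSUPP from the filesystem), and -1 on any other
   error, with errno left set. */
int probe_o_tmpfile(const char *dir)
{
    assert(dir != NULL);

    /* O_TMPFILE requires O_WRONLY or O_RDWR; otherwise EINVAL. */
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd != -1) {
        close(fd);              /* the unnamed file vanishes on close */
        return 1;
    }

    if (errno == EISDIR || errno == ENOENT || errno == EOPNOTSUPP)
        return 0;
    return -1;
}
```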
5355ff82 1686.PP
320f8a8e
MK
1687When both
1688.B O_CREAT
1689and
1690.B O_DIRECTORY
1691are specified in
1692.IR flags
1693and the file specified by
1694.I pathname
1695does not exist,
1696.BR open ()
1697will create a regular file (i.e.,
1698.B O_DIRECTORY
1699is ignored).
47297adb 1700.SH SEE ALSO
a3bf8022
MK
1701.BR chmod (2),
1702.BR chown (2),
fea681da 1703.BR close (2),
e366dbc4 1704.BR dup (2),
fea681da
MK
1705.BR fcntl (2),
1706.BR link (2),
1f6ceb40 1707.BR lseek (2),
fea681da 1708.BR mknod (2),
e366dbc4 1709.BR mmap (2),
f0c34053 1710.BR mount (2),
fa5d243f 1711.BR open_by_handle_at (2),
fea681da
MK
1712.BR read (2),
1713.BR socket (2),
1714.BR stat (2),
1715.BR umask (2),
1716.BR unlink (2),
1717.BR write (2),
1718.BR fopen (3),
b31056e3 1719.BR acl (5),
f0c34053 1720.BR fifo (7),
3b363b62 1721.BR inode (7),
a9cfde1d
MK
1722.BR path_resolution (7),
1723.BR symlink (7)