.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
.\" and Copyright (C) 2008 Greg Banks
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1994-08-21 by Michael Haardt
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1996-05-13 by Thomas Koenig
.\" Modified 1996-12-20 by Michael Haardt
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
.\" Modified 1999-06-03 by Michael Haardt
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk, reordered flags list alphabetically
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
.\" 2008-01-03, mtk, with input from Trond Myklebust
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
.\" Rewrite description of O_EXCL.
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
.\" on O_DIRECT.
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
.\"
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
.\"
.TH OPEN 2 2014-04-20 "Linux" "Linux Programmer's Manual"
.SH NAME
open, openat, creat \- open and possibly create a file
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.sp
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );

.BI "int creat(const char *" pathname ", mode_t " mode );
.sp
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.sp
.BR openat ():
.PD 0
.ad l
.RS 4
.TP 4
Since glibc 2.10:
_XOPEN_SOURCE\ >=\ 700 || _POSIX_C_SOURCE\ >=\ 200809L
.TP
Before glibc 2.10:
_ATFILE_SOURCE
.RE
.ad
.PD
.SH DESCRIPTION
Given a
.I pathname
for a file,
.BR open ()
returns a file descriptor, a small, nonnegative integer
for use in subsequent system calls
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
The file descriptor returned by a successful call will be
the lowest-numbered file descriptor not currently open for the process.
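.PP
A minimal sketch of this rule follows.
It assumes a single-threaded program and a readable
.IR /dev/null ;
the function name
.I reuses_lowest_fd
is hypothetical.
After a descriptor is closed, the next successful
.BR open ()
is expected to return that same number,
because it is now the lowest unused one.
.PP
.in +4n
.nf
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if open() reuses the descriptor
   number that was just closed, 0 if it does not, -1 on error. */
int
reuses_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a == -1)
        return -1;
    int b = open("/dev/null", O_RDONLY);    /* b > a */
    if (b == -1) {
        close(a);
        return -1;
    }

    close(a);                   /* a is now the lowest unused number */
    int c = open("/dev/null", O_RDONLY);

    close(b);
    if (c != -1)
        close(c);
    return c == a;
}
.fi
.in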
.PP
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled; the
.B O_CLOEXEC
flag, described below, can be used to change this default).
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.PP
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
(This object is variously also called an "open file object",
a "file handle", an "open file table entry",
or\(emin kernel-developer parlance\(ema
.IR "struct file" .
The term "open file description" is used by POSIX.)
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
The new open file description is initially not shared
with any other process,
but sharing may arise via
.BR fork (2).
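.PP
Because the file offset is stored in the open file description,
two descriptors obtained from separate
.BR open ()
calls on the same file move independently,
whereas a descriptor duplicated with
.BR dup (2)
shares the offset of the original.
The following sketch illustrates the difference; the helper
.I offsets_after_write
is hypothetical and assumes the caller supplies a writable scratch pathname.
.PP
.in +4n
.nf
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: writes 4 bytes through one descriptor and
   reports the file offset seen through (a) a second, independent
   open() of the same file and (b) a dup(2) of the first descriptor.
   Returns 0 on success, -1 on error. */
int
offsets_after_write(const char *path, off_t *separate, off_t *duplicate)
{
    int ret = -1;
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    int fd2 = open(path, O_RDWR);   /* new open file description */
    int fd3 = dup(fd1);             /* same description as fd1 */

    if (fd1 != -1 && fd2 != -1 && fd3 != -1 &&
        write(fd1, "abcd", 4) == 4) {
        *separate = lseek(fd2, 0, SEEK_CUR);    /* still 0 */
        *duplicate = lseek(fd3, 0, SEEK_CUR);   /* 4: offset is shared */
        ret = 0;
    }

    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    return ret;
}
.fi
.in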
.PP
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.

In addition, zero or more file creation flags and file status flags
can be
.RI bitwise- or 'd
in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
.BR O_TRUNC ,
and
.BR O_TTY_INIT .
The
.I file status flags
are all of the remaining flags listed below.
.\" SUSv4 divides the flags into:
.\" * Access mode
.\" * File creation
.\" * File status
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
.\" though it's not clear what the difference between "other" and
.\" "File creation" flags is. I raised an Aardvark to see if this
.\" can be clarified in SUSv4; 10 Oct 2008.
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
.\" TC1 (balloted in 2013) resolved this, so that those three constants
.\" are also categorized as file status flags.
.\"
The distinction between these two groups of flags is that
the file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.

The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
.\" For more background, see
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
.\" http://nfs.sourceforge.net/
This is because NFS does not support
appending to a file, so the client kernel has to simulate it, which
can't be done without a race condition.
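.IP
The following sketch, which assumes a local filesystem and a
caller-supplied scratch pathname, shows that an
.BR lseek (2)
to the start of the file does not stop an
.B O_APPEND
write from being placed at the end.
The helper name
.I append_after_rewind
is hypothetical.
.IP
.in +4n
.nf
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a 5-byte file, reopens it with
   O_APPEND, seeks back to offset 0, writes 5 more bytes, and
   returns the final file size (or -1 on error). */
off_t
append_after_rewind(const char *path)
{
    struct stat sb;
    off_t size = -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, "hello", 5);
    close(fd);
    if (n != 5)
        return -1;

    fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1)
        return -1;
    lseek(fd, 0, SEEK_SET);             /* rewind... */
    if (write(fd, "world", 5) == 5 &&   /* ...but this still appends */
        fstat(fd, &sb) == 0)
        size = sb.st_size;              /* 10, not 5 */
    close(fd);
    return size;
}
.fi
.in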
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
.TP
.BR O_CLOEXEC " (since Linux 2.6.23)"
.\" NOTE! several other man pages refer to this text
Enable the close-on-exec flag for the new file descriptor.
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.

Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.BR O_CLOEXEC
flag to deal with this problem.)
.\" This flag fixes only one form of the race condition;
.\" The race can also occur with, for example, descriptors
.\" returned by accept(), pipe(), etc.
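.IP
A minimal sketch, using a hypothetical helper named
.IR opened_with_cloexec :
because the flag is set atomically by
.BR open (),
an
.BR fcntl (2)
.B F_GETFD
call on the new descriptor already reports
.BR FD_CLOEXEC ,
with no separate
.B F_SETFD
step in between.
.IP
.in +4n
.nf
#define _GNU_SOURCE         /* ensures O_CLOEXEC is defined; see CONFORMING TO */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path with O_CLOEXEC and returns 1 if
   the FD_CLOEXEC flag is already set on the new descriptor, 0 if
   not, or -1 on error. */
int
opened_with_cloexec(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;
    int fdflags = fcntl(fd, F_GETFD);
    close(fd);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}
.fi
.in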
.TP
.B O_CREAT
If the file does not exist, it will be created.
The owner (user ID) of the file is set to the effective user ID
of the process.
The group ownership (group ID) is set either to
the effective group ID of the process or to the group ID of the
parent directory (depending on filesystem type and mount options,
and the mode of the parent directory; see the mount options
.I bsdgroups
and
.I sysvgroups
described in
.BR mount (8)).
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
.\" XFS (since 2.6.14).
.RS
.PP
.I mode
specifies the permissions to use in case a new file is created.
This argument must be supplied when
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if neither
.B O_CREAT
nor
.B O_TMPFILE
is specified, then
.I mode
is ignored.
The effective permissions are modified by
the process's
.I umask
in the usual way: The permissions of the created file are
.IR "(mode\ &\ ~umask)" .
Note that this mode applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
.PP
The following symbolic constants are provided for
.IR mode :
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
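.IP
For example, with a
.I mode
of 0666 and a umask of 022, the new file gets the permissions
.IR "0666\ &\ ~022" ,
that is, 0644.
The sketch below checks this with
.BR fstat (2).
The helper
.I created_mode
is hypothetical; it temporarily changes the umask of the whole process,
and it assumes the target directory has no default ACL,
since a default ACL takes the place of the umask.
.IP
.in +4n
.nf
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates path (which must not already exist)
   with the given mode while the umask is temporarily set to mask,
   and returns the resulting permission bits, or (mode_t) -1. */
mode_t
created_mode(const char *path, mode_t mode, mode_t mask)
{
    struct stat sb;
    mode_t old = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    umask(old);                         /* restore the previous umask */
    if (fd == -1)
        return (mode_t) -1;

    int r = fstat(fd, &sb);
    close(fd);
    return r == -1 ? (mode_t) -1 : (sb.st_mode & 07777);
}
.fi
.in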
.TP
.BR O_DIRECT " (since Linux 2.4.10)"
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own caching.
File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.sp
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
.TP
.B O_DIRECTORY
If \fIpathname\fP is not a directory, cause the open to fail.
.\" But see the following and its replies:
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in kernel version 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a
FIFO or tape device.
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.

By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
will fail.

When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points.

In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .

On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
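.IP
The lockfile technique just described can be sketched as follows.
The helper name
.I acquire_lockfile
and the
.IR lockpath . host . pid
naming scheme for the unique file are assumptions of this example;
stale locks left behind by crashed processes are not handled.
.IP
.in +4n
.nf
#define _GNU_SOURCE         /* for gethostname() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: tries to take the lock lockpath without
   relying on O_EXCL support over NFS.  Returns 1 if the lock was
   acquired, 0 if it was not, and -1 on error.  The holder later
   releases the lock with unlink(lockpath). */
int
acquire_lockfile(const char *lockpath)
{
    char host[256], uniq[512];
    struct stat sb;

    if (gethostname(host, sizeof(host)) == -1)
        return -1;
    host[sizeof(host) - 1] = 0;     /* may be unterminated if truncated */

    /* Unique file on the same filesystem: lockpath.host.pid */
    int len = snprintf(uniq, sizeof(uniq), "%s.%s.%ld",
                       lockpath, host, (long) getpid());
    if (len < 0 || (size_t) len >= sizeof(uniq))
        return -1;

    int fd = open(uniq, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = (link(uniq, lockpath) == 0);
    if (!locked && stat(uniq, &sb) == 0 && sb.st_nlink == 2)
        locked = 1;     /* link() took effect although it reported failure */

    unlink(uniq);       /* the name lockpath keeps the file alive */
    return locked;
}
.fi
.in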
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
.TP
.BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file's last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).
This flag is intended for use by indexing or backup programs,
where its use can significantly reduce the amount of disk activity.
This flag may not be effective on all filesystems.
One example is NFS, where the server maintains the access time.
.\" The O_NOATIME flag also affects the treatment of st_atime
.\" by mmap() and readdir(2), MTK, Dec 04.
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\(emsee
.BR tty (4)\(emit
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If \fIpathname\fP is a symbolic link, then the open fails.
This is a FreeBSD extension, which was added to Linux in version 2.1.126.
Symbolic links in earlier components of the pathname will still be
followed.
See also
.BR O_PATH
below.
.\" The headers from glibc 2.0.100 and later include a
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
.\" used\fP.
.TP
.BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent operations on the file descriptor which is
returned will cause the calling process to wait.
For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
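.IP
For instance, on a FIFO that no process has open for reading,
a nonblocking write-only open fails at once with
.B ENXIO
(see ERRORS),
while a nonblocking read-only open succeeds without waiting for a writer
(see
.BR fifo (7)).
A sketch, using the hypothetical helper
.I probe_fifo
and a caller-chosen pathname for the FIFO:
.IP
.in +4n
.nf
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a FIFO at path (removing any old
   file first) and tries nonblocking opens on it.  Stores the errno
   of a write-only open made while no reader exists (expected ENXIO)
   and returns 1 if a read-only nonblocking open succeeds at once. */
int
probe_fifo(const char *path, int *write_errno)
{
    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* No reader yet: the open fails instead of blocking. */
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    *write_errno = (wfd == -1) ? errno : 0;
    if (wfd != -1)
        close(wfd);

    /* Opening for reading does not wait for a writer. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    int ok = (rfd != -1);
    if (ok)
        close(rfd);
    unlink(path);
    return ok;
}
.fi
.in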
.TP
.BR O_PATH " (since Linux 2.6.39)"
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
.\" commit 326be7b484843988afe57566b627fb7a70beac56
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
.\"
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
.\" Subject: Re: [PATCH] open(2): document O_PATH
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and
to perform operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations (e.g.,
.BR read (2),
.BR write (2),
.BR fchmod (2),
.BR fchown (2),
.BR fgetxattr (2),
.BR mmap (2))
fail with the error
.BR EBADF .

The following operations
.I can
be performed on the resulting file descriptor:
.RS
.IP * 3
.BR close (2);
.BR fchdir (2)
(since Linux 3.5);
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
.BR fstat (2)
(since Linux 3.6).
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
.IP *
Duplicating the file descriptor
.RB ( dup (2),
.BR fcntl (2)
.BR F_DUPFD ,
etc.).
.IP *
Getting and setting file descriptor flags
.RB ( fcntl (2)
.BR F_GETFD
and
.BR F_SETFD ).
.IP *
Retrieving open file status flags using the
.BR fcntl (2)
.BR F_GETFL
operation: the returned flags will include the bit
.BR O_PATH .
.IP *
Passing the file descriptor as the
.IR dirfd
argument of
.BR openat (2)
and the other "*at()" system calls.
.IP *
Passing the file descriptor to another process via a UNIX domain socket
(see
.BR SCM_RIGHTS
in
.BR unix (7)).
.RE
.IP
When
.B O_PATH
is specified in
.IR flags ,
flag bits other than
.BR O_DIRECTORY
and
.BR O_NOFOLLOW
are ignored.

If
.I pathname
is a symbolic link and the
.BR O_NOFOLLOW
flag is also specified,
then the call returns a file descriptor referring to the symbolic link.
This file descriptor can be used as the
.I dirfd
argument in calls to
.BR fchownat (2),
.BR fstatat (2),
.BR linkat (2),
and
.BR readlinkat (2)
with an empty pathname to have the calls operate on the symbolic link.
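.IP
The following sketch reads the target of a symbolic link through an
.B O_PATH
descriptor that refers to the link itself,
and also checks that
.BR read (2)
on such a descriptor fails with
.BR EBADF ,
as described above.
It assumes Linux 2.6.39 or later; the helper name
.I link_target_via_opath
is hypothetical.
.IP
.in +4n
.nf
#define _GNU_SOURCE         /* O_PATH is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copies the target of the symbolic link
   linkpath into buf (NUL-terminated) using only an O_PATH file
   descriptor that refers to the link itself.  Returns the target
   length, or -1 on error. */
ssize_t
link_target_via_opath(const char *linkpath, char *buf, size_t bufsize)
{
    char c;

    if (bufsize == 0)
        return -1;
    int fd = open(linkpath, O_PATH | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    /* Ordinary I/O on an O_PATH descriptor fails with EBADF. */
    if (read(fd, &c, 1) != -1 || errno != EBADF) {
        close(fd);
        return -1;
    }

    /* An empty pathname makes readlinkat() act on fd itself. */
    ssize_t n = readlinkat(fd, "", buf, bufsize - 1);
    close(fd);
    if (n == -1)
        return -1;
    buf[n] = 0;                     /* readlinkat() does not terminate */
    return n;
}
.fi
.in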
.TP
.B O_SYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I file
integrity completion
(by contrast with the
synchronized I/O
.I data
integrity completion
provided by
.BR O_DSYNC .)

By the time
.BR write (2)
(and similar)
return, the output data and associated file metadata
have been transferred to the underlying hardware
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fsync (2)).
.IR "See NOTES below" .
.TP
.BR O_TMPFILE " (since Linux 3.11)"
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
Create an unnamed temporary file.
The
.I pathname
argument specifies a directory;
an unnamed inode will be created in that directory's filesystem.
Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.

.B O_TMPFILE
must be specified with one of
.B O_RDWR
or
.B O_WRONLY
and, optionally,
.BR O_EXCL .
If
.B O_EXCL
is not specified, then
.BR linkat (2)
can be used to link the temporary file into the filesystem, making it
permanent, using code like the following:

.in +4n
.nf
char path[PATH_MAX];
int fd;
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
.fi
.in

In this case,
the
.BR open ()
.I mode
argument determines the file permission mode, as with
.BR O_CREAT .

Specifying
.B O_EXCL
in conjunction with
.B O_TMPFILE
prevents a temporary file from being linked into the filesystem
in the above manner.
(Note that the meaning of
.B O_EXCL
in this case is different from the meaning of
.B O_EXCL
otherwise.)

There are two main use cases for
.\" Inspired by http://lwn.net/Articles/559147/
.BR O_TMPFILE :
.RS
.IP * 3
Improved
.BR tmpfile (3)
functionality: race-free creation of temporary files that
(1) are automatically deleted when closed;
(2) can never be reached via any pathname;
(3) are not subject to symlink attacks; and
(4) do not require the caller to devise unique names.
.IP *
Creating a file that is initially invisible, which is then populated
with data and adjusted to have appropriate filesystem attributes
.RB ( chown (2),
.BR chmod (2),
.BR fsetxattr (2),
etc.)
before being atomically linked into the filesystem
in a fully formed state (using
.BR linkat (2)
as described above).
.RE
.IP
.B O_TMPFILE
requires support by the underlying filesystem;
only a subset of Linux filesystems provides that support.
In the initial implementation, support was provided in
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
XFS support was added
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
in Linux 3.15.
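.IP
A more complete sketch of the second use case follows:
it writes the data, adjusts the permissions with
.BR fchmod (2),
and only then links the file into place using the
.I /proc/self/fd
technique shown earlier.
It assumes that
.I /proc
is mounted, that the filesystem supports
.BR O_TMPFILE ,
and that
.I target
does not yet exist; the helper name
.I create_file_atomically
is hypothetical.
.IP
.in +4n
.nf
#define _GNU_SOURCE         /* O_TMPFILE is Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes data to an unnamed file in dir and
   then gives it the name target, so that the file never appears
   under that name in a partially written state.
   Returns 0 on success, -1 on error. */
int
create_file_atomically(const char *dir, const char *target,
                       const char *data)
{
    char path[64];
    size_t len = strlen(data);
    int fd = open(dir, O_TMPFILE | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;                  /* e.g., EOPNOTSUPP */

    if (write(fd, data, len) != (ssize_t) len ||
        fchmod(fd, 0644) == -1) {   /* adjust attributes before naming */
        close(fd);
        return -1;
    }

    /* Name the file via its /proc/self/fd symbolic link. */
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    int r = linkat(AT_FDCWD, path, AT_FDCWD, target, AT_SYMLINK_FOLLOW);
    close(fd);
    return r;
}
.fi
.in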
.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY ),
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise the effect of
.B O_TRUNC
is unspecified.
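.IP
A minimal sketch that reopens an existing regular file with
.B O_WRONLY
and
.B O_TRUNC
and confirms with
.BR fstat (2)
that its length is now 0.
The helper
.I size_after_trunc
is hypothetical.
.IP
.in +4n
.nf
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes 8 bytes to path, reopens it with
   O_WRONLY | O_TRUNC, and returns the size seen afterward
   (or -1 on error). */
off_t
size_after_trunc(const char *path)
{
    struct stat sb;
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, "12345678", 8);
    close(fd);
    if (n != 8)
        return -1;

    /* Regular file, writable access mode: truncated to length 0. */
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    int r = fstat(fd, &sb);
    close(fd);
    return r == -1 ? -1 : sb.st_size;
}
.fi
.in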
.SS creat()
.BR creat ()
is equivalent to
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .
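.PP
In the following sketch, the descriptor returned by
.BR creat ()
is checked with
.BR fcntl (2)
.BR F_GETFL ;
its access mode is
.BR O_WRONLY ,
just as for the equivalent
.BR open ()
call.
The helper
.I creat_access_mode
is hypothetical.
.PP
.in +4n
.nf
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the access mode (O_RDONLY, O_WRONLY,
   or O_RDWR) of a descriptor obtained from creat(), or -1 on error.
   The call below could equally be written as
   open(path, O_CREAT | O_WRONLY | O_TRUNC, mode). */
int
creat_access_mode(const char *path, mode_t mode)
{
    int fd = creat(path, mode);
    if (fd == -1)
        return -1;
    int fl = fcntl(fd, F_GETFL);
    close(fd);
    return fl == -1 ? -1 : (fl & O_ACCMODE);
}
.fi
.in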
.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.

If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).

If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).

If
.I pathname
is absolute, then
.I dirfd
is ignored.
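.PP
A minimal sketch of the common case:
opening a file relative to a directory descriptor
rather than the current working directory.
The helper
.I open_in_dir
is hypothetical, and
.B _XOPEN_SOURCE
is defined as listed in SYNOPSIS.
.PP
.in +4n
.nf
#define _XOPEN_SOURCE 700   /* for openat(); see SYNOPSIS */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens name relative to the directory dir,
   independent of the process's current working directory.
   Returns a file descriptor, or -1 on error. */
int
open_in_dir(const char *dir, const char *name, int flags, mode_t mode)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    /* A relative name is resolved against dfd; an absolute name
       would ignore dfd, and AT_FDCWD would mean the cwd. */
    int fd = openat(dfd, name, flags, mode);
    close(dfd);                 /* fd remains valid after this */
    return fd;
}
.fi
.in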
.SH RETURN VALUE
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor, or \-1 if an error occurred
(in which case,
.I errno
is set appropriately).
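.PP
Callers should compare the result against \-1 and only then inspect
.IR errno .
The sketch below uses
.B EEXIST
to fall back to opening an existing file, and retries on
.BR EINTR ;
both errors are described in ERRORS.
The helper
.I open_or_create
is hypothetical.
.PP
.in +4n
.nf
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens path for writing, creating it if
   needed, and reports through *created whether this call created
   the file.  Returns a file descriptor, or -1 with errno set. */
int
open_or_create(const char *path, int *created)
{
    for (;;) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd != -1) {
            *created = 1;
            return fd;
        }
        if (errno == EINTR)
            continue;               /* interrupted; try again */
        if (errno != EEXIST)
            return -1;              /* e.g., EACCES or ENOENT */

        /* The file exists; open it without O_CREAT. */
        fd = open(path, O_WRONLY);
        if (fd != -1) {
            *created = 0;
            return fd;
        }
        if (errno != ENOENT && errno != EINTR)
            return -1;
        /* Removed in the meantime (ENOENT) or interrupted: retry. */
    }
}
.fi
.in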
.SH ERRORS
.BR open (),
.BR openat (),
and
.BR creat ()
can fail with the following errors:
.TP
.B EACCES
The requested access to the file is not allowed, or search permission
is denied for one of the directories in the path prefix of
.IR pathname ,
or the file did not exist yet and write access to the parent directory
is not allowed.
(See also
.BR path_resolution (7).)
.TP
.B EDQUOT
Where
.B O_CREAT
is specified, the file does not exist, and the user's quota of disk
blocks or inodes on the filesystem has been exhausted.
.TP
.B EEXIST
.I pathname
already exists and
.BR O_CREAT " and " O_EXCL
were used.
.TP
.B EFAULT
.I pathname
points outside your accessible address space.
.TP
.B EFBIG
See
.BR EOVERFLOW .
.TP
.B EINTR
While blocked waiting to complete an open of a slow device
(e.g., a FIFO; see
.BR fifo (7)),
the call was interrupted by a signal handler; see
.BR signal (7).
.TP
.B EINVAL
The filesystem does not support the
.BR O_DIRECT
flag.
See
.BR NOTES
for more information.
.TP
.B EINVAL
Invalid value in
.\" In particular, __O_TMPFILE instead of O_TMPFILE
.IR flags .
.TP
.B EINVAL
.B O_TMPFILE
was specified in
.IR flags ,
but neither
.B O_WRONLY
nor
.B O_RDWR
was specified.
.TP
.B EISDIR
.I pathname
refers to a directory and the access requested involved writing
(that is,
.B O_WRONLY
or
.B O_RDWR
is set).
.TP
.B EISDIR
.I pathname
refers to an existing directory,
.B O_TMPFILE
and one of
.B O_WRONLY
or
.B O_RDWR
were specified in
.IR flags ,
but this kernel version does not provide the
.B O_TMPFILE
functionality.
.TP
.B ELOOP
Too many symbolic links were encountered in resolving
.IR pathname .
.TP
.B ELOOP
.I pathname
was a symbolic link, and
.I flags
specified
.BR O_NOFOLLOW
but not
.BR O_PATH .
.TP
.B EMFILE
The process already has the maximum number of files open.
.TP
.B ENAMETOOLONG
.I pathname
was too long.
.TP
.B ENFILE
The system limit on the total number of open files has been reached.
.TP
.B ENODEV
.I pathname
refers to a device special file and no corresponding device exists.
(This is a Linux kernel bug; in this situation
.B ENXIO
must be returned.)
.TP
.B ENOENT
.B O_CREAT
is not set and the named file does not exist.
Or, a directory component in
.I pathname
does not exist or is a dangling symbolic link.
.TP
.B ENOENT
.I pathname
refers to a nonexistent directory,
.B O_TMPFILE
and one of
.B O_WRONLY
or
.B O_RDWR
were specified in
.IR flags ,
but this kernel version does not provide the
.B O_TMPFILE
functionality.
.TP
.B ENOMEM
Insufficient kernel memory was available.
.TP
.B ENOSPC
.I pathname
was to be created but the device containing
.I pathname
has no room for the new file.
.TP
.B ENOTDIR
A component used as a directory in
.I pathname
is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
.I pathname
was not a directory.
.TP
.B ENXIO
.BR O_NONBLOCK " | " O_WRONLY
is set, the named file is a FIFO and
no process has the file open for reading.
Or, the file is a device special file and no corresponding device exists.
.TP
.BR EOPNOTSUPP
The filesystem containing
.I pathname
does not support
.BR O_TMPFILE .
.TP
.B EOVERFLOW
.I pathname
refers to a regular file that is too large to be opened.
The usual scenario here is that an application compiled
on a 32-bit platform without
.I -D_FILE_OFFSET_BITS=64
tried to open a file whose size exceeds
.I (1<<31)-1
bytes;
see also
.B O_LARGEFILE
above.
This is the error specified by POSIX.1-2001;
in kernels before 2.6.24, Linux gave the error
.B EFBIG
for this case.
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
.\" Reported 2006-10-03
.TP
.B EPERM
The
.B O_NOATIME
flag was specified, but the effective user ID of the caller
.\" Strictly speaking, it's the filesystem UID... (MTK)
did not match the owner of the file and the caller was not privileged
.RB ( CAP_FOWNER ).
.TP
.B EROFS
.I pathname
refers to a file on a read-only filesystem and write access was
requested.
.TP
.B ETXTBSY
.I pathname
refers to an executable image which is currently being executed and
write access was requested.
.TP
.B EWOULDBLOCK
The
.B O_NONBLOCK
flag was specified, and an incompatible lease was held on the file
(see
.BR fcntl (2)).
7b8ba76c
MK
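.PP
The following C code is a minimal sketch of handling two of the errors above.
It assumes glibc on Linux, and the helper names
.I open_noatime
and
.I fifo_writer_errno
are hypothetical.
The first helper retries without
.B O_NOATIME
when
.BR open ()
fails with
.BR EPERM ;
the second reproduces
.B ENXIO
by opening a FIFO that has no reader with
.BR "O_WRONLY | O_NONBLOCK" .

```c
#define _GNU_SOURCE            /* O_NOATIME, mkdtemp() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open PATH read-only, asking for O_NOATIME when permitted.
   EPERM means the caller neither owns the file nor has
   CAP_FOWNER, so retry without the flag rather than fail. */
int open_noatime(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME | O_CLOEXEC);

    if (fd == -1 && errno == EPERM)
        fd = open(path, O_RDONLY | O_CLOEXEC);
    return fd;
}

/* Create a FIFO in a fresh temporary directory and open it with
   O_WRONLY | O_NONBLOCK while no process has it open for reading.
   Returns the resulting errno (ENXIO is expected), 0 if the open
   unexpectedly succeeded, or -1 if setup failed. */
int fifo_writer_errno(void)
{
    char dir[] = "/tmp/fifo-demoXXXXXX";
    char path[sizeof dir + 8];
    int fd, err;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) == -1) {
        rmdir(dir);
        return -1;
    }

    fd = open(path, O_WRONLY | O_NONBLOCK);
    err = (fd == -1) ? errno : 0;      /* save errno before cleanup */
    if (fd != -1)
        close(fd);
    unlink(path);
    rmdir(dir);
    return err;
}
```

Retrying on
.B EPERM
is reasonable only because
.B O_NOATIME
is an optimization: the data read is the same either way.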
.PP
The following additional errors can occur for
.BR openat ():
.TP
.B EBADF
.I dirfd
is not a valid file descriptor.
.TP
.B ENOTDIR
.I pathname
is relative and
.I dirfd
is a file descriptor referring to a file other than a directory.
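.PP
As a sketch of the two
.BR openat ()
errors above (assuming a POSIX.1-2008 system; the helper names are hypothetical),
the following C code passes a relative
.I pathname
together with a
.I dirfd
that refers to a regular file, and then with a
.I dirfd
that is not a valid descriptor.

```c
#define _POSIX_C_SOURCE 200809L    /* openat(), mkstemp() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the errno from openat() when DIRFD refers to a regular
   file and PATHNAME is relative (ENOTDIR is expected), 0 if the
   call unexpectedly succeeded, or -1 if setup failed. */
int openat_file_dirfd_errno(void)
{
    char tmpl[] = "/tmp/openat-errXXXXXX";
    int filefd, fd, err;

    filefd = mkstemp(tmpl);            /* a regular file, not a directory */
    if (filefd == -1)
        return -1;
    fd = openat(filefd, "child", O_RDONLY);
    err = (fd == -1) ? errno : 0;      /* save errno before cleanup */
    if (fd != -1)
        close(fd);
    close(filefd);
    unlink(tmpl);
    return err;
}

/* Return the errno from openat() with an invalid DIRFD (-1, which
   is not AT_FDCWD) and a relative PATHNAME (EBADF is expected),
   or 0 on unexpected success. */
int openat_bad_dirfd_errno(void)
{
    int fd = openat(-1, "child", O_RDONLY);

    if (fd != -1) {
        close(fd);
        return 0;
    }
    return errno;
}
```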
.SH VERSIONS
.BR openat ()
was added to Linux in kernel 2.6.16;
library support was added to glibc in version 2.4.
.SH CONFORMING TO
.BR open (),
.BR creat ():
SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.

.BR openat ():
POSIX.1-2008.

The
.BR O_DIRECT ,
.BR O_NOATIME ,
.BR O_PATH ,
and
.BR O_TMPFILE
flags are Linux-specific.
One must define
.B _GNU_SOURCE
to obtain their definitions.

The
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
flags are not specified in POSIX.1-2001,
but are specified in POSIX.1-2008.
Since glibc 2.12, one can obtain their definitions by defining either
.B _POSIX_C_SOURCE
with a value greater than or equal to 200809L or
.BR _XOPEN_SOURCE
with a value greater than or equal to 700.
In glibc 2.11 and earlier, one obtains the definitions by defining
.BR _GNU_SOURCE .

As noted in
.BR feature_test_macros (7),
feature test macros such as
.BR _POSIX_C_SOURCE ,
.BR _XOPEN_SOURCE ,
and
.B _GNU_SOURCE
must be defined before including
.I any
header files.
.SH NOTES
Under Linux, the
.B O_NONBLOCK
flag indicates that one wants to open
but does not necessarily have the intention to read or write.
This is typically used to open devices in order to get a file descriptor
for use with
.BR ioctl (2).
.LP
The (undefined) effect of
.B O_RDONLY | O_TRUNC
varies among implementations.
On many systems the file is actually truncated.
.\" Linux 2.0, 2.5: truncate
.\" Solaris 5.7, 5.8: truncate
.\" Irix 6.5: truncate
.\" Tru64 5.1B: truncate
.\" HP-UX 11.22: truncate
.\" FreeBSD 4.7: truncate

Note that
.BR open ()
can open device special files, but
.BR creat ()
cannot create them; use
.BR mknod (2)
instead.

If the file is newly created, its
.IR st_atime ,
.IR st_ctime ,
and
.I st_mtime
fields
(respectively, time of last access, time of last status change, and
time of last modification; see
.BR stat (2))
are set
to the current time, and so are the
.I st_ctime
and
.I st_mtime
fields of the
parent directory.
Otherwise, if the file is modified because of the
.B O_TRUNC
flag, its
.I st_ctime
and
.I st_mtime
fields are set to the current time.
.\"
.\"
.SS Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option
specifies different variants of synchronized I/O,
and specifies the
.BR open ()
flags
.BR O_SYNC ,
.BR O_DSYNC ,
and
.BR O_RSYNC
for controlling the behavior.
Regardless of whether an implementation supports this option,
it must at least support the use of
.BR O_SYNC
for regular files.

Linux implements
.BR O_SYNC
and
.BR O_DSYNC ,
but not
.BR O_RSYNC .
(Somewhat incorrectly, glibc defines
.BR O_RSYNC
to have the same value as
.BR O_SYNC .)

.BR O_SYNC
provides synchronized I/O
.I file
integrity completion,
meaning write operations will flush data and all associated metadata
to the underlying hardware.
.BR O_DSYNC
provides synchronized I/O
.I data
integrity completion,
meaning write operations will flush data
to the underlying hardware,
but will only flush metadata updates that are required
to allow a subsequent read operation to complete successfully.
Data integrity completion can reduce the number of disk operations
that are required for applications that don't need the guarantees
of file integrity completion.

To understand the difference between the two types of completion,
consider two pieces of file metadata:
the file last modification timestamp
.RI ( st_mtime )
and the file length.
All write operations will update the last file modification timestamp,
but only writes that add data to the end of the
file will change the file length.
The last modification timestamp is not needed to ensure that
a read completes successfully, but the file length is.
Thus,
.BR O_DSYNC
would only guarantee to flush updates to the file length metadata
(whereas
.BR O_SYNC
would also always flush the last modification timestamp metadata).

Before Linux 2.6.33, Linux implemented only the
.BR O_SYNC
flag for
.BR open ().
However, when that flag was specified,
most filesystems actually provided the equivalent of synchronized I/O
.I data
integrity completion (i.e.,
.BR O_SYNC
was actually implemented as the equivalent of
.BR O_DSYNC ).

Since Linux 2.6.33, proper
.BR O_SYNC
support is provided.
However, to ensure backward binary compatibility,
.BR O_DSYNC
was defined with the same value as the historical
.BR O_SYNC ,
and
.BR O_SYNC
was defined as a new (two-bit) flag value that includes the
.BR O_DSYNC
flag value.
This ensures that applications compiled against
new headers get at least
.BR O_DSYNC
semantics on pre-2.6.33 kernels.
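.PP
The flag-value relationship described above can be checked directly.
The following sketch (assuming glibc headers on Linux; the helper names are
hypothetical) tests that
.B O_SYNC
contains the
.B O_DSYNC
bit, that glibc aliases
.B O_RSYNC
to
.BR O_SYNC ,
and appends a record through a descriptor opened with
.BR O_DSYNC ,
so that each
.BR write (2)
returns only once the data, and the file-length update needed to read it back,
have been flushed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Nonzero if every bit of O_DSYNC is also set in O_SYNC, as in
   headers from Linux 2.6.33 onward. */
int o_sync_includes_o_dsync(void)
{
    return (O_SYNC & O_DSYNC) == O_DSYNC;
}

/* Nonzero if glibc defines O_RSYNC with the same value as O_SYNC. */
int o_rsync_is_o_sync(void)
{
    return O_RSYNC == O_SYNC;
}

/* Append MSG to PATH with synchronized I/O data integrity
   completion.  Returns 0 on success, -1 on error. */
int append_dsync(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC | O_CLOEXEC,
                  0644);
    ssize_t n;

    if (fd == -1)
        return -1;
    n = write(fd, msg, len);   /* returns after data reaches the device */
    close(fd);
    return (n == (ssize_t) len) ? 0 : -1;
}
```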
.\"
.\"
.SS NFS
There are many infelicities in the protocol underlying NFS, affecting
amongst others
.BR O_SYNC " and " O_NDELAY .

On NFS filesystems with UID mapping enabled,
.BR open ()
may
return a file descriptor but, for example,
.BR read (2)
requests are denied
with \fBEACCES\fP.
This is because the client performs
.BR open ()
by checking the
permissions, but UID mapping is performed by the server upon
read and write requests.
.\"
.\"
.SS File access mode
Unlike the other values that can be specified in
.IR flags ,
the
.I "access mode"
values
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
do not specify individual bits.
Rather, they define the low-order two bits of
.IR flags ,
and are defined respectively as 0, 1, and 2.
In other words, the combination
.B "O_RDONLY | O_WRONLY"
is a logical error, and certainly does not have the same meaning as
.BR O_RDWR .

Linux reserves the special, nonstandard access mode 3 (binary 11) in
.I flags
to mean:
check for read and write permission on the file and return a descriptor
that can't be used for reading or writing.
This nonstandard access mode is used by some Linux drivers to return a
descriptor that is to be used only for device-specific
.BR ioctl (2)
operations.
.\" See for example util-linux's disk-utils/setfdprm.c
.\" For some background on access mode 3, see
.\" http://thread.gmane.org/gmane.linux.kernel/653123
.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
.\" LKML, 12 Mar 2008
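.PP
Because the access modes are values rather than bits,
code that inspects the flags of an open descriptor should mask with
.B O_ACCMODE
and compare the result, rather than test individual bits.
A minimal sketch using
.BR fcntl (2)
.B F_GETFL
follows; the helper names are hypothetical.

```c
#include <fcntl.h>

/* Return the access mode (O_RDONLY, O_WRONLY, or O_RDWR) of the
   open file description referred to by FD, or -1 on error.
   A test such as (flags & O_RDONLY) would always be false,
   because O_RDONLY is 0; masking with O_ACCMODE avoids that. */
int fd_access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return flags & O_ACCMODE;
}

/* Nonzero if FD was opened for writing (O_WRONLY or O_RDWR). */
int fd_is_writable(int fd)
{
    int mode = fd_access_mode(fd);

    return mode == O_WRONLY || mode == O_RDWR;
}
```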
.\"
.\"
.SS Rationale for openat() and other "directory file descriptor" APIs
.BR openat ()
and the other system calls and library functions that take
a directory file descriptor argument
(i.e.,
.BR faccessat (2),
.BR fanotify_mark (2),
.BR fchmodat (2),
.BR fchownat (2),
.BR fstatat (2),
.BR futimesat (2),
.BR linkat (2),
.BR mkdirat (2),
.BR mknodat (2),
.BR name_to_handle_at (2),
.BR readlinkat (2),
.BR renameat (2),
.BR symlinkat (2),
.BR unlinkat (2),
.BR utimensat (2),
.BR mkfifoat (3),
and
.BR scandirat (3))
are supported
for two reasons.
Here, the explanation is in terms of the
.BR openat ()
call, but the rationale is analogous for the other interfaces.

First,
.BR openat ()
allows an application to avoid race conditions that could
occur when using
.BR open ()
to open files in directories other than the current working directory.
These race conditions result from the fact that some component
of the directory prefix given to
.BR open ()
could be changed in parallel with the call to
.BR open ().
Such races can be avoided by
opening a file descriptor for the target directory,
and then specifying that file descriptor as the
.I dirfd
argument of
.BR openat ().

Second,
.BR openat ()
allows the implementation of a per-thread "current working
directory", via file descriptor(s) maintained by the application.
(This functionality can also be obtained by tricks based
on the use of
.IR /proc/self/fd/ dirfd,
but less efficiently.)
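.PP
The following sketch illustrates the first point.
It assumes a POSIX.1-2008 system with a writable
.IR /tmp ,
and the helper names are hypothetical.
It opens a directory, renames that directory to simulate a concurrent change
to the directory prefix, and then uses
.BR openat ()
to create a file through the saved descriptor.
The file ends up in the renamed directory, because
.I dirfd
refers to the directory itself rather than to its former pathname.

```c
#define _POSIX_C_SOURCE 200809L    /* openat(), mkdtemp(), O_DIRECTORY */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create NAME (a single pathname component) inside the directory
   referred to by DIRFD, failing if it already exists. */
int create_in_dir(int dirfd, const char *name)
{
    return openat(dirfd, name, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
}

/* Return 1 if a file created via DIRFD after the directory was
   renamed appears under the new name, 0 if not, -1 on setup error. */
int demo_dirfd_survives_rename(void)
{
    char dir[] = "/tmp/openat-demoXXXXXX";
    char moved[sizeof dir + 8];
    char check[sizeof moved + 8];
    int dirfd, fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1) {
        rmdir(dir);
        return -1;
    }

    /* Simulate a concurrent rename of the directory's pathname. */
    snprintf(moved, sizeof moved, "%s.moved", dir);
    if (rename(dir, moved) == -1) {
        close(dirfd);
        rmdir(dir);
        return -1;
    }

    fd = create_in_dir(dirfd, "data");
    if (fd != -1) {
        close(fd);
        snprintf(check, sizeof check, "%s/data", moved);
        ok = (access(check, F_OK) == 0);
        unlink(check);
    }
    close(dirfd);
    rmdir(moved);
    return ok;
}
```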
.\"
.\"
.SS O_DIRECT
.LP
The
.B O_DIRECT
flag may impose alignment restrictions on the length and address
of user-space buffers and the file offset of I/Os.
In Linux, alignment
restrictions vary by filesystem and kernel version and might be
absent entirely.
However, there is currently no filesystem\-independent
interface for an application to discover these restrictions for a given
file or filesystem.
Some filesystems provide their own interfaces
for doing so, for example the
.B XFS_IOC_DIOINFO
operation in
.BR xfsctl (3).
.LP
Under Linux 2.4, transfer sizes and the alignment of the user buffer
and the file offset must all be multiples of the logical block size
of the filesystem.
Under Linux 2.6 and newer, alignment to the logical block size of the
underlying storage (typically 512 bytes) suffices.
The logical block size can be determined using the
.BR ioctl (2)
.B BLKSSZGET
operation or the
.BR blockdev (8)
command with the
.B \-\-getss
option.
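.PP
A minimal sketch of an aligned
.B O_DIRECT
write follows.
It assumes that 4096 bytes is a multiple of the logical block size
of the underlying device; the constant
.B DIO_ALIGN
and the helper names are hypothetical.
The buffer is allocated with
.BR posix_memalign (3),
and a helper shows how
.B BLKSSZGET
could be used to query the real value for a block device.
Because some filesystems reject
.B O_DIRECT
with
.BR EINVAL ,
the sketch reports that case separately instead of treating it as fatal.

```c
#define _GNU_SOURCE            /* O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>          /* BLKSSZGET */
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: 4096 is a multiple of the logical block size of the
   device backing the file (commonly 512 or 4096 bytes). */
#define DIO_ALIGN 4096

/* Logical block size of block device DEV via BLKSSZGET, or -1. */
int logical_block_size(const char *dev)
{
    int fd = open(dev, O_RDONLY | O_CLOEXEC);
    int size = -1;

    if (fd == -1)
        return -1;
    if (ioctl(fd, BLKSSZGET, &size) == -1)
        size = -1;
    close(fd);
    return size;
}

/* Write one DIO_ALIGN-byte block of BYTE at offset 0 of PATH with
   O_DIRECT; buffer address, length, and file offset are all
   multiples of DIO_ALIGN.  Returns 0 on success, 1 if O_DIRECT
   was rejected with EINVAL, -1 on other errors. */
int direct_write_block(const char *path, int byte)
{
    void *buf;
    int fd, rc = -1;

    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    memset(buf, byte, DIO_ALIGN);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_CLOEXEC, 0644);
    if (fd == -1) {
        rc = (errno == EINVAL) ? 1 : -1;
    } else {
        ssize_t n = pwrite(fd, buf, DIO_ALIGN, 0);

        if (n == DIO_ALIGN)
            rc = 0;
        else if (n == -1 && errno == EINVAL)
            rc = 1;
        close(fd);
    }
    free(buf);
    return rc;
}
```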
.LP
.B O_DIRECT
I/Os should never be run concurrently with the
.BR fork (2)
system call,
if the memory buffer is a private mapping
(i.e., any mapping created with the
.BR mmap (2)
.BR MAP_PRIVATE
flag;
this includes memory allocated on the heap and statically allocated buffers).
Any such I/Os, whether submitted via an asynchronous I/O interface or from
another thread in the process,
should be completed before
.BR fork (2)
is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.
This restriction does not apply when the memory buffer for the
.B O_DIRECT
I/Os was created using
.BR shmat (2)
or
.BR mmap (2)
with the
.B MAP_SHARED
flag.
Nor does this restriction apply when the memory buffer has been advised as
.B MADV_DONTFORK
with
.BR madvise (2),
ensuring that it will not be available
to the child after
.BR fork (2).
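.PP
As a sketch of the last point (assuming Linux; the helper names are hypothetical),
a buffer intended for
.B O_DIRECT
I/O can be obtained from an anonymous private
.BR mmap (2)
mapping, which is page-aligned, and then advised with
.BR MADV_DONTFORK .
The child of a subsequent
.BR fork (2)
must not touch the buffer, since it is not mapped there.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS, MADV_DONTFORK */
#include <stddef.h>
#include <sys/mman.h>

/* Allocate LEN bytes of page-aligned private memory that will not
   be inherited by children created with fork(2).
   Returns NULL on failure. */
void *alloc_dontfork_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) == -1) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Release a buffer obtained from alloc_dontfork_buffer(). */
void free_dontfork_buffer(void *buf, size_t len)
{
    munmap(buf, len);
}
```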
.LP
The
.B O_DIRECT
flag was introduced in SGI IRIX, where it has alignment
restrictions similar to those of Linux 2.4.
IRIX also has an
.BR fcntl (2)
call to query appropriate alignments and sizes.
FreeBSD 4.x introduced
a flag of the same name, but without alignment restrictions.
.LP
.B O_DIRECT
support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
Some filesystems may not implement the flag, and
.BR open ()
will fail with
.B EINVAL
if it is used.
.LP
Applications should avoid mixing
.B O_DIRECT
and normal I/O to the same file,
and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Likewise, applications should avoid mixing
.BR mmap (2)
of files with direct I/O to the same files.
.LP
The behavior of
.B O_DIRECT
with NFS will differ from local filesystems.
Older kernels, or
kernels configured in certain ways, may not support this combination.
The NFS protocol does not support passing the flag to the server, so
.B O_DIRECT
I/O will bypass the page cache only on the client; the server may
still cache the I/O.
The client asks the server to make the I/O
synchronous to preserve the synchronous semantics of
.BR O_DIRECT .
Some servers will perform poorly under these circumstances, especially
if the I/O size is small.
Some servers may also be configured to
lie to clients about the I/O having reached stable storage; this
will avoid the performance penalty at some risk to data integrity
in the event of server power failure.
The Linux NFS client places no alignment restrictions on
.B O_DIRECT
I/O.
.PP
In summary,
.B O_DIRECT
is a potentially powerful tool that should be used with caution.
It is recommended that applications treat use of
.B O_DIRECT
as a performance option which is disabled by default.
.PP
.RS
"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."\(emLinus
.RE
.SH BUGS
Currently, it is not possible to enable signal-driven
I/O by specifying
.B O_ASYNC
when calling
.BR open ();
use
.BR fcntl (2)
to enable this flag.
.\" FIXME . Check bugzilla report on open(O_ASYNC)
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993
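.PP
A minimal sketch of that workaround follows; the helper name is hypothetical.
The owner of the descriptor is set with
.BR F_SETOWN ,
and
.B O_ASYNC
is then added to the file status flags with
.BR F_SETFL .
Because the default action of
.B SIGIO
is to terminate the process,
a real program would install a
.B SIGIO
handler before calling it.

```c
#define _GNU_SOURCE            /* F_SETOWN, O_ASYNC */
#include <fcntl.h>
#include <unistd.h>

/* Enable signal-driven I/O on FD, directing SIGIO to the calling
   process.  Returns 0 on success, -1 on error. */
int enable_sigio(int fd)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}
```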

One must check for two different error codes,
.B EISDIR
and
.BR ENOENT ,
when trying to determine whether the kernel supports
.B O_TMPFILE
functionality.
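.PP
The following sketch probes an existing directory and distinguishes the cases.
It assumes glibc headers that define
.BR O_TMPFILE ,
and the enumeration and function names are hypothetical.
.B EISDIR
or
.B ENOENT
indicates a kernel without
.B O_TMPFILE
support, while
.B EOPNOTSUPP
indicates that the kernel supports it but the filesystem does not.
Since
.B ENOENT
is also returned when the directory does not exist,
the probe is meaningful only for a directory known to exist.

```c
#define _GNU_SOURCE            /* O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum tmpfile_support {
    TMPFILE_ERROR = -1,        /* some other failure */
    TMPFILE_OK = 0,            /* O_TMPFILE worked */
    TMPFILE_NO_KERNEL = 1,     /* EISDIR or ENOENT: kernel lacks support */
    TMPFILE_NO_FS = 2          /* EOPNOTSUPP: filesystem lacks support */
};

/* Probe O_TMPFILE support in DIR, which must be an existing
   directory (otherwise ENOENT is ambiguous). */
enum tmpfile_support probe_o_tmpfile(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);

    if (fd != -1) {
        close(fd);             /* the unnamed file is discarded */
        return TMPFILE_OK;
    }
    switch (errno) {
    case EISDIR:
    case ENOENT:
        return TMPFILE_NO_KERNEL;
    case EOPNOTSUPP:
        return TMPFILE_NO_FS;
    default:
        return TMPFILE_ERROR;
    }
}
```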
.SH SEE ALSO
.BR chmod (2),
.BR chown (2),
.BR close (2),
.BR dup (2),
.BR fcntl (2),
.BR link (2),
.BR lseek (2),
.BR mknod (2),
.BR mmap (2),
.BR mount (2),
.BR open_by_handle_at (2),
.BR read (2),
.BR socket (2),
.BR stat (2),
.BR umask (2),
.BR unlink (2),
.BR write (2),
.BR fopen (3),
.BR fifo (7),
.BR path_resolution (7),
.BR symlink (7)