.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
.\" and Copyright (C) 2008 Greg Banks
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1994-08-21 by Michael Haardt
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1996-05-13 by Thomas Koenig
.\" Modified 1996-12-20 by Michael Haardt
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
.\" Modified 1999-06-03 by Michael Haardt
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk, reordered flags list alphabetically
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
.\" 2008-01-03, mtk, with input from Trond Myklebust
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
.\" Rewrite description of O_EXCL.
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
.\" on O_DIRECT.
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
.\"
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
.\"
.TH OPEN 2 2014-12-31 "Linux" "Linux Programmer's Manual"
.SH NAME
open, openat, creat \- open and possibly create a file
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.sp
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );

.BI "int creat(const char *" pathname ", mode_t " mode );
.sp
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.sp
.BR openat ():
.PD 0
.ad l
.RS 4
.TP 4
Since glibc 2.10:
_XOPEN_SOURCE\ >=\ 700 || _POSIX_C_SOURCE\ >=\ 200809L
.TP
Before glibc 2.10:
_ATFILE_SOURCE
.RE
.ad
.PD
.SH DESCRIPTION
Given a
.I pathname
for a file,
.BR open ()
returns a file descriptor, a small, nonnegative integer
for use in subsequent system calls
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
The file descriptor returned by a successful call will be
the lowest-numbered file descriptor not currently open for the process.
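.PP
The following sketch illustrates the lowest-numbered rule.
It opens
.I /dev/null
twice, closes the first descriptor, and opens the file again.
It assumes a single-threaded caller, so that no other thread opens or
closes descriptors in the meantime.
The function name
.I lowest_fd_is_reused
is hypothetical.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if open() reuses the number of a
   descriptor that was just closed, 0 if it does not, -1 on error. */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);   /* allocated above a */
    if (a == -1 || b == -1) {
        if (a != -1)
            close(a);
        if (b != -1)
            close(b);
        return -1;
    }

    close(a);                              /* a is now the lowest free slot */
    int c = open("/dev/null", O_RDONLY);

    int result = (c == -1) ? -1 : (c == a);
    close(b);
    if (c != -1)
        close(c);
    return result;
}
```
.fi
.in
Under these assumptions the function is expected to return 1.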
.PP
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled); the
.B O_CLOEXEC
flag, described below, can be used to change this default.
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.PP
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
For further details on open file descriptions, see NOTES.
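.PP
A scratch file shows the difference between file descriptors and open file
descriptions.
Two separate
.BR open ()
calls create two descriptions with independent offsets.
By contrast,
.BR dup (2)
yields a second descriptor for the same description, so that descriptor
shares the original's offset.
The sketch assumes a writable
.I /tmp
and uses a hypothetical function name.
It removes the file early, because existing descriptors are unaffected
by the removal.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if two separate open() calls on one
   file have independent offsets while a dup() shares the offset of its
   original, 0 if not, -1 on error. */
int offsets_demo(void)
{
    char path[] = "/tmp/open_desc_XXXXXX";
    int fd1 = mkstemp(path);             /* first open file description */
    if (fd1 == -1)
        return -1;
    int fd2 = open(path, O_RDONLY);      /* second, independent description */
    int fd3 = dup(fd1);                  /* refers to fd1's description */
    unlink(path);                        /* all three stay usable */

    int result = -1;
    if (fd2 != -1 && fd3 != -1 && write(fd1, "hello", 5) == 5) {
        off_t off2 = lseek(fd2, 0, SEEK_CUR);   /* expected: 0 */
        off_t off3 = lseek(fd3, 0, SEEK_CUR);   /* expected: 5 */
        result = (off2 == 0 && off3 == 5);
    }
    close(fd1);
    if (fd2 != -1)
        close(fd2);
    if (fd3 != -1)
        close(fd3);
    return result;
}
```
.fi
.in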
.PP
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.

In addition, zero or more file creation flags and file status flags
can be
.RI bitwise- or 'd
in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
.BR O_TRUNC ,
and
.BR O_TTY_INIT .
The
.I file status flags
are all of the remaining flags listed below.
.\" SUSv4 divides the flags into:
.\" * Access mode
.\" * File creation
.\" * File status
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
.\" though it's not clear what the difference between "other" and
.\" "File creation" flags is. I raised an Aardvark to see if this
.\" can be clarified in SUSv4; 10 Oct 2008.
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
.\" TC1 (balloted in 2013) resolved this, so that those three constants
.\" are also categorized as file creation flags.
.\"
The distinction between these two groups of flags is that
the file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.
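.PP
The following minimal sketch retrieves these flags.
It assumes a writable
.I /tmp
and uses a hypothetical helper name.
The value returned by
.BR fcntl (2)
.B F_GETFL
carries the access mode, which is extracted with the POSIX mask
.BR O_ACCMODE ,
together with the file status flags.
The checks here therefore look only for
.B O_WRONLY
and
.BR O_APPEND .
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: opens a scratch file with
   O_WRONLY | O_APPEND | O_TRUNC and returns the flags reported by
   fcntl(F_GETFL), or -1 on error. */
int getfl_demo(void)
{
    char path[] = "/tmp/open_getfl_XXXXXX";
    int tmp = mkstemp(path);             /* only used to get a unique name */
    if (tmp == -1)
        return -1;
    close(tmp);

    int flags = -1;
    int fd = open(path, O_WRONLY | O_APPEND | O_TRUNC);
    if (fd != -1) {
        flags = fcntl(fd, F_GETFL);
        close(fd);
    }
    unlink(path);
    return flags;
}
```
.fi
.in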

The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
.\" For more background, see
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
.\" http://nfs.sourceforge.net/
This is because NFS does not support
appending to a file, so the client kernel has to simulate it, which
can't be done without a race condition.
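
A short sketch shows the effect on the file offset.
It assumes a writable
.I /tmp
and uses a hypothetical function name.
Even after an explicit
.BR lseek (2)
to offset 0, the second
.BR write (2)
still lands at the end of the file.
The final size is therefore expected to be 5 bytes rather than 3.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: writes "abc", seeks back to offset 0, writes
   "de", and returns the resulting file size (or -1 on error). */
off_t append_demo(void)
{
    char path[] = "/tmp/open_append_XXXXXX";
    int tmp = mkstemp(path);             /* only used to get a unique name */
    if (tmp == -1)
        return -1;
    close(tmp);

    off_t size = -1;
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd != -1) {
        if (write(fd, "abc", 3) == 3 &&
            lseek(fd, 0, SEEK_SET) == 0 &&   /* the offset really is 0 now */
            write(fd, "de", 2) == 2)         /* but this write goes to EOF */
            size = lseek(fd, 0, SEEK_END);
        close(fd);
    }
    unlink(path);
    return size;
}
```
.fi
.in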
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
.TP
.BR O_CLOEXEC " (since Linux 2.6.23)"
.\" NOTE! several other man pages refer to this text
Enable the close-on-exec flag for the new file descriptor.
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.

Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.BR O_CLOEXEC
flag to deal with this problem.)
.\" This flag fixes only one form of the race condition;
.\" The race can also occur with, for example, descriptors
.\" returned by accept(), pipe(), etc.
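
The following sketch opens a file with and without
.B O_CLOEXEC
and then checks the
.B FD_CLOEXEC
bit with
.BR fcntl (2)
.BR F_GETFD .
It assumes that
.I /dev/null
exists, and the function name is hypothetical.
With the flag, the bit is already set when
.BR open ()
returns.
No separate
.B F_SETFD
call is involved, so there is no race window.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null with O_RDONLY plus extra_flags
   and returns 1 if FD_CLOEXEC is set on the new descriptor, 0 if it
   is not, -1 on error. */
int cloexec_demo(int extra_flags)
{
    int fd = open("/dev/null", O_RDONLY | extra_flags);
    if (fd == -1)
        return -1;

    int fdflags = fcntl(fd, F_GETFD);    /* file descriptor flags */
    close(fd);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}
```
.fi
.in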
.TP
.B O_CREAT
If the file does not exist, it will be created.
The owner (user ID) of the file is set to the effective user ID
of the process.
The group ownership (group ID) is set either to
the effective group ID of the process or to the group ID of the
parent directory (depending on filesystem type and mount options,
and the mode of the parent directory; see the mount options
.I bsdgroups
and
.I sysvgroups
described in
.BR mount (8)).
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
.\" XFS (since 2.6.14).
.RS
.PP
.I mode
specifies the permissions to use in case a new file is created.
This argument must be supplied when
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if neither
.B O_CREAT
nor
.B O_TMPFILE
is specified, then
.I mode
is ignored.
The effective permissions are modified by
the process's
.I umask
in the usual way: The permissions of the created file are
.IR "(mode\ &\ ~umask)" .
Note that this mode applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
.PP
The following symbolic constants are provided for
.IR mode :
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write, and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write, and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write, and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
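.IP
The following sketch shows how
.I mode
and the umask combine.
It assumes a writable
.I /tmp
with no default ACL (see
.BR acl (5)),
since a default ACL could change the result.
The helper name is hypothetical.
The sketch always opens the new file
.BR O_WRONLY ,
even when
.I mode
grants no write permission.
This illustrates the point above that
.I mode
governs only future accesses.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a new file with the given mode while the
   umask is temporarily set to mask, and returns the permission bits the
   file actually received, or -1 on error. */
int created_mode(mode_t mode, mode_t mask)
{
    char dir[] = "/tmp/open_mode_XXXXXX";
    char path[64];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/file", dir);

    mode_t old = umask(mask);
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, mode);
    umask(old);                          /* restore the caller's umask */

    int result = -1;
    struct stat sb;
    if (fd != -1) {
        if (fstat(fd, &sb) == 0)
            result = (int) (sb.st_mode & 07777);   /* drop file-type bits */
        close(fd);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```
.fi
.in
For example, created_mode(0666, 022) is expected to return 0644.
Because S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH equals 0644, passing that
combination with a zero umask should give the same result.
created_mode(S_IRUSR, 0) is expected to return 0400, even though the
descriptor used to create the file was writable.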
.TP
.BR O_DIRECT " (since Linux 2.4.10)"
Try to minimize cache effects of the I/O to and from this file.
In general this will degrade performance, but it is useful in
special situations, such as when applications do their own caching.
File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.sp
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
.TP
.B O_DIRECTORY
If \fIpathname\fP is not a directory, cause the open to fail.
.\" But see the following and its replies:
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in kernel version 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a
FIFO or tape device.
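
The following minimal sketch reports the error from an
.B O_DIRECTORY
open.
The helper name is hypothetical.
For
.I /
it is expected to return 0.
For
.IR /dev/null ,
which exists but is not a directory, it is expected to return
.BR ENOTDIR .
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if path can be opened with
   O_DIRECTORY, otherwise the errno value from the failed open(). */
int open_dir_errno(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return errno;                    /* e.g., ENOTDIR or ENOENT */
    close(fd);
    return 0;
}
```
.fi
.in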
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.

By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
will fail with the error
.BR EEXIST .

When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points.
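
The following minimal sketch demonstrates both behaviors.
It assumes a writable
.I /tmp
and uses hypothetical function names.
A second
.B O_CREAT | O_EXCL
open of the same path fails with
.BR EEXIST .
Such an open of a dangling symbolic link fails in the same way, and the
link's target is not created.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if this call created path, otherwise
   the errno value from open() (EEXIST if path already existed). */
int create_exclusive(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}

/* Returns 1 if a second exclusive create fails with EEXIST and a
   dangling symbolic link is not followed, 0 if not, -1 on setup error. */
int excl_demo(void)
{
    char dir[] = "/tmp/open_excl_XXXXXX";
    char file[64], target[64], link_path[64];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(file, sizeof file, "%s/lock", dir);
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(link_path, sizeof link_path, "%s/link", dir);

    int ok = create_exclusive(file) == 0 &&       /* first caller wins */
             create_exclusive(file) == EEXIST;    /* second one fails */

    /* The link points at a file that does not exist; O_EXCL must not
       follow it and create the target. */
    if (symlink(target, link_path) == 0) {
        ok = ok && create_exclusive(link_path) == EEXIST &&
             access(target, F_OK) == -1;
        unlink(link_path);
    }
    unlink(file);
    unlink(target);                      /* expected not to exist */
    rmdir(dir);
    return ok;
}
```
.fi
.in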

In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .

On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
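
The lockfile technique just described might be sketched as follows.
The function name is an assumption.
So is the naming scheme for the unique file: the lockfile name plus
hostname and PID, which keeps the unique file in the same directory and
therefore on the same filesystem.
Releasing the lock would simply be an
.BR unlink (2)
of the lockfile.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: tries to take lockfile with the link(2) technique.
   The unique name is lockfile plus hostname and PID, so it lies in the
   same directory (and thus the same filesystem). Returns 1 if the lock
   was acquired, 0 if it is held by someone else, -1 on error. */
int acquire_link_lock(const char *lockfile)
{
    char host[256] = "";
    char unique[4096];
    struct stat sb;

    gethostname(host, sizeof host - 1);  /* host[255] stays NUL */
    snprintf(unique, sizeof unique, "%s.%s.%ld", lockfile, host,
             (long) getpid());

    int fd = open(unique, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = 0;
    if (link(unique, lockfile) == 0)
        locked = 1;                      /* link() reported success */
    else if (stat(unique, &sb) == 0 && sb.st_nlink == 2)
        locked = 1;                      /* link() failed, yet the link exists */

    unlink(unique);                      /* the lockfile name keeps the inode */
    return locked;
}
```
.fi
.in
Called twice in the same process with the same lockfile, the first call
is expected to return 1 and the second to return 0.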
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
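
The following minimal sketch shows the preferred approach.
It assumes a writable
.I /tmp
on a filesystem that supports sparse files, and the function name is
hypothetical.
With
.B _FILE_OFFSET_BITS
defined as 64 before any header is included,
.I off_t
is 64 bits wide.
A file larger than
.I (1<<31)-1
bytes can then be created and examined without
.B O_LARGEFILE
appearing in the code.
.in +4n
.nf
```c
/* The macro must be defined before any header is included. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates a sparse scratch file whose size is
   2^31 + 1 bytes, just past the largest value (2^31 - 1) that a 32-bit
   off_t can hold, and returns the size reported by fstat(), or -1 on
   error. */
off_t large_file_demo(void)
{
    char path[] = "/tmp/open_large_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                        /* space is freed on close() */

    off_t where = (off_t) 1 << 31;       /* 2 GiB */
    off_t size = -1;
    struct stat sb;
    if (lseek(fd, where, SEEK_SET) == where &&
        write(fd, "x", 1) == 1 &&
        fstat(fd, &sb) == 0)
        size = sb.st_size;
    close(fd);
    return size;
}
```
.fi
.in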
.TP
.BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).
This flag is intended for use by indexing or backup programs,
where its use can significantly reduce the amount of disk activity.
This flag may not be effective on all filesystems.
One example is NFS, where the server maintains the access time.
.\" The O_NOATIME flag also affects the treatment of st_atime
.\" by mmap() and readdir(2), MTK, Dec 04.
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\(emsee
.BR tty (4)\(emit
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If \fIpathname\fP is a symbolic link, then the open fails with the error
.BR ELOOP .
This is a FreeBSD extension, which was added to Linux in version 2.1.126.
Symbolic links in earlier components of the pathname will still be
followed.
See also
.BR O_PATH
below.
.\" The headers from glibc 2.0.100 and later include a
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
.\" used\fP.
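
The following minimal sketch opens a symbolic link to
.I /dev/null
with
.BR O_NOFOLLOW .
It assumes a writable
.I /tmp
and uses a hypothetical function name.
On Linux the open is expected to fail with
.BR ELOOP .
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: creates a symbolic link to /dev/null in a scratch
   directory and opens it with O_NOFOLLOW. Returns the resulting errno,
   0 if the open unexpectedly succeeded, or -1 on setup failure. */
int nofollow_demo(void)
{
    char dir[] = "/tmp/open_nofollow_XXXXXX";
    char link_path[64];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link_path, sizeof link_path, "%s/link", dir);
    if (symlink("/dev/null", link_path) == -1) {
        rmdir(dir);
        return -1;
    }

    int err = 0;
    int fd = open(link_path, O_RDONLY | O_NOFOLLOW);
    if (fd == -1)
        err = errno;                     /* save before further calls */
    else
        close(fd);

    unlink(link_path);
    rmdir(dir);
    return err;
}
```
.fi
.in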
.TP
.BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent operations on the file descriptor which is
returned will cause the calling process to wait.
For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
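
For FIFOs, the nonblocking behavior described in
.BR fifo (7)
can be sketched as follows.
The sketch assumes a writable
.I /tmp
and uses a hypothetical function name.
A nonblocking open for reading succeeds even when there is no writer.
A nonblocking open for writing fails with
.B ENXIO
until a reader exists.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a nonblocking write-only open of a
   FIFO fails with ENXIO while no reader exists, a nonblocking read-only
   open succeeds, and a write-only open then succeeds; 0 if not; -1 on
   setup error. */
int fifo_nonblock_demo(void)
{
    char dir[] = "/tmp/open_fifo_XXXXXX";
    char fifo[64];
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(fifo, sizeof fifo, "%s/fifo", dir);
    if (mkfifo(fifo, S_IRUSR | S_IWUSR) == -1) {
        rmdir(dir);
        return -1;
    }

    int wfd = open(fifo, O_WRONLY | O_NONBLOCK);   /* no reader yet */
    int no_reader_err = (wfd == -1) ? errno : 0;

    int rfd = open(fifo, O_RDONLY | O_NONBLOCK);   /* does not wait */
    int wfd2 = open(fifo, O_WRONLY | O_NONBLOCK);  /* reader exists now */

    int ok = no_reader_err == ENXIO && rfd != -1 && wfd2 != -1;

    if (wfd != -1)
        close(wfd);
    if (rfd != -1)
        close(rfd);
    if (wfd2 != -1)
        close(wfd2);
    unlink(fifo);
    rmdir(dir);
    return ok;
}
```
.fi
.in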
.TP
.BR O_PATH " (since Linux 2.6.39)"
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
.\" commit 326be7b484843988afe57566b627fb7a70beac56
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
.\"
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
.\" Subject: Re: [PATCH] open(2): document O_PATH
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and
to perform operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations (e.g.,
.BR read (2),
.BR write (2),
.BR fchmod (2),
.BR fchown (2),
.BR fgetxattr (2),
.BR mmap (2))
fail with the error
.BR EBADF .

The following operations
.I can
be performed on the resulting file descriptor:
.RS
.IP * 3
.BR close (2);
.BR fchdir (2)
(since Linux 3.5);
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
.BR fstat (2)
(since Linux 3.6).
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
.IP *
Duplicating the file descriptor
.RB ( dup (2),
.BR fcntl (2)
.BR F_DUPFD ,
etc.).
.IP *
Getting and setting file descriptor flags
.RB ( fcntl (2)
.BR F_GETFD
and
.BR F_SETFD ).
.IP *
Retrieving open file status flags using the
.BR fcntl (2)
.BR F_GETFL
operation: the returned flags will include the bit
.BR O_PATH .
.IP *
Passing the file descriptor as the
.IR dirfd
argument of
.BR openat (2)
and the other "*at()" system calls.
This includes
.BR linkat (2)
with
.BR AT_EMPTY_PATH
(or via procfs using
.BR AT_SYMLINK_FOLLOW )
even if the file is not a directory.
.IP *
Passing the file descriptor to another process via a UNIX domain socket
(see
.BR SCM_RIGHTS
in
.BR unix (7)).
.RE
.IP
When
.B O_PATH
is specified in
.IR flags ,
flag bits other than
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.BR O_NOFOLLOW
are ignored.

If
.I pathname
is a symbolic link and the
.BR O_NOFOLLOW
flag is also specified,
then the call returns a file descriptor referring to the symbolic link.
This file descriptor can be used as the
.I dirfd
argument in calls to
.BR fchownat (2),
.BR fstatat (2),
.BR linkat (2),
and
.BR readlinkat (2)
with an empty pathname to have the calls operate on the symbolic link.
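
The following minimal sketch exercises some of these rules.
It assumes Linux 3.6 or later, for
.BR fstat (2)
support, and uses a hypothetical function name.
It opens
.I /
with
.B O_PATH
and expects
.BR read (2)
to fail with
.BR EBADF .
It then uses the descriptor as the
.I dirfd
argument of
.BR openat ().
.B _GNU_SOURCE
must be defined to obtain
.BR O_PATH .
.in +4n
.nf
```c
#define _GNU_SOURCE        /* O_PATH is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens "/" with O_PATH and checks that
   - read() fails with EBADF (the directory is not really opened);
   - fstat() works (Linux 3.6 and later);
   - the descriptor serves as dirfd for openat().
   Returns 1 if all hold, 0 otherwise, -1 if the O_PATH open fails. */
int opath_demo(void)
{
    int dfd = open("/", O_PATH | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    char c;
    int read_ebadf = read(dfd, &c, 1) == -1 && errno == EBADF;

    struct stat sb;
    int fstat_ok = fstat(dfd, &sb) == 0 && S_ISDIR(sb.st_mode);

    int fd = openat(dfd, "dev/null", O_RDONLY);   /* i.e., /dev/null */
    int openat_ok = fd != -1;
    if (fd != -1)
        close(fd);

    close(dfd);
    return read_ebadf && fstat_ok && openat_ok;
}
```
.fi
.in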
.TP
.B O_SYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I file
integrity completion
(by contrast with the
synchronized I/O
.I data
integrity completion
provided by
.BR O_DSYNC .)

By the time
.BR write (2)
(and similar)
return, the output data and associated file metadata
have been transferred to the underlying hardware
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fsync (2)).
.IR "See NOTES below" .
.TP
.BR O_TMPFILE " (since Linux 3.11)"
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
Create an unnamed temporary file.
The
.I pathname
argument specifies a directory;
an unnamed inode will be created in that directory's filesystem.
Anything written to the resulting file will be lost when
the last file descriptor is closed, unless the file is given a name.

.B O_TMPFILE
must be specified with one of
.B O_RDWR
or
.B O_WRONLY
and, optionally,
.BR O_EXCL .
If
.B O_EXCL
is not specified, then
.BR linkat (2)
can be used to link the temporary file into the filesystem, making it
permanent, using code like the following:

.in +4n
.nf
char path[PATH_MAX];
int fd;

fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
.fi
.in

In this case,
the
.BR open ()
.I mode
argument determines the file permission mode, as with
.BR O_CREAT .

Specifying
.B O_EXCL
in conjunction with
.B O_TMPFILE
prevents a temporary file from being linked into the filesystem
in the above manner.
(Note that the meaning of
.B O_EXCL
in this case is different from the meaning of
.B O_EXCL
otherwise.)

There are two main use cases for
.\" Inspired by http://lwn.net/Articles/559147/
.BR O_TMPFILE :
.RS
.IP * 3
Improved
.BR tmpfile (3)
functionality: race-free creation of temporary files that
(1) are automatically deleted when closed;
(2) can never be reached via any pathname;
(3) are not subject to symlink attacks; and
(4) do not require the caller to devise unique names.
.IP *
Creating a file that is initially invisible, which is then populated
with data and adjusted to have appropriate filesystem attributes
.RB ( chown (2),
.BR chmod (2),
.BR fsetxattr (2),
etc.)
before being atomically linked into the filesystem
in a fully formed state (using
.BR linkat (2)
as described above).
.RE
.IP
.B O_TMPFILE
requires support by the underlying filesystem;
only a subset of Linux filesystems provide that support.
In the initial implementation, support was provided in
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
XFS support was added
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
in Linux 3.15.
.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY ),
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise, the effect of
.B O_TRUNC
is unspecified.
.SS creat()
.BR creat ()
is equivalent to
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .
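.PP
A short sketch shows the equivalence.
It assumes a writable
.I /tmp
and uses a hypothetical function name.
Calling
.BR creat ()
on a file that already contains data leaves the file empty and returns a
descriptor whose access mode is
.BR O_WRONLY .
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes some data to a scratch file, then reopens
   it with creat(). Returns 1 if creat() truncated the file to length 0
   and returned a write-only descriptor, 0 if not, -1 on error. */
int creat_demo(void)
{
    char path[] = "/tmp/open_creat_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    int wrote = write(fd, "old contents", 12) == 12;
    close(fd);

    /* Same as open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600). */
    fd = creat(path, S_IRUSR | S_IWUSR);
    int ok = -1;
    if (fd != -1) {
        struct stat sb;
        int flags = fcntl(fd, F_GETFL);
        ok = wrote && fstat(fd, &sb) == 0 && sb.st_size == 0 &&
             flags != -1 && (flags & O_ACCMODE) == O_WRONLY;
        close(fd);
    }
    unlink(path);
    return ok;
}
```
.fi
.in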
.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.

If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).

If
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).

If
.I pathname
is absolute, then
.I dirfd
is ignored.
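.PP
The three cases above might be sketched as follows.
The sketch assumes that
.I /dev
is readable and that the process has a current working directory.
The helper names are hypothetical.
Passing \-1 as
.I dirfd
together with an absolute pathname shows that
.I dirfd
is then ignored.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if openat() succeeds, 0 otherwise. */
static int try_openat(int dirfd, const char *path, int flags)
{
    int fd = openat(dirfd, path, flags);
    if (fd == -1)
        return 0;
    close(fd);
    return 1;
}

/* Returns 1 if all three openat() cases succeed, 0 if not, -1 if the
   directory descriptor for /dev cannot be obtained. */
int openat_demo(void)
{
    int dev = open("/dev", O_RDONLY | O_DIRECTORY);
    if (dev == -1)
        return -1;

    int ok = try_openat(dev, "null", O_RDONLY)                 /* via dirfd */
          && try_openat(AT_FDCWD, ".", O_RDONLY | O_DIRECTORY) /* via cwd */
          && try_openat(-1, "/dev/null", O_RDONLY);   /* dirfd ignored */
    close(dev);
    return ok;
}
```
.fi
.in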
.SH RETURN VALUE
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor, or \-1 if an error occurred
(in which case,
.I errno
is set appropriately).
.SH ERRORS
.BR open (),
.BR openat (),
and
.BR creat ()
can fail with the following errors:
.TP
.B EACCES
The requested access to the file is not allowed, or search permission
is denied for one of the directories in the path prefix of
.IR pathname ,
or the file did not exist yet and write access to the parent directory
is not allowed.
(See also
.BR path_resolution (7).)
.TP
.B EDQUOT
Where
.B O_CREAT
is specified, the file does not exist, and the user's quota of disk
blocks or inodes on the filesystem has been exhausted.
.TP
.B EEXIST
.I pathname
already exists and
.BR O_CREAT " and " O_EXCL
were used.
.TP
.B EFAULT
.I pathname
points outside your accessible address space.
.TP
.B EFBIG
See
.BR EOVERFLOW .
.TP
.B EINTR
While blocked waiting to complete an open of a slow device
(e.g., a FIFO; see
.BR fifo (7)),
the call was interrupted by a signal handler; see
.BR signal (7).
.TP
.B EINVAL
The filesystem does not support the
.BR O_DIRECT
flag.
See
.BR NOTES
for more information.
.TP
.B EINVAL
Invalid value in
.\" In particular, __O_TMPFILE instead of O_TMPFILE
.IR flags .
.TP
.B EINVAL
.B O_TMPFILE
was specified in
.IR flags ,
but neither
.B O_WRONLY
nor
.B O_RDWR
was specified.
.TP
.B EISDIR
.I pathname
refers to a directory and the access requested involved writing
(that is,
.B O_WRONLY
or
.B O_RDWR
is set).
.TP
.B EISDIR
.I pathname
refers to an existing directory,
.B O_TMPFILE
and one of
.B O_WRONLY
or
.B O_RDWR
were specified in
.IR flags ,
but this kernel version does not provide the
.B O_TMPFILE
functionality.
.TP
.B ELOOP
Too many symbolic links were encountered in resolving
.IR pathname .
.TP
.B ELOOP
.I pathname
was a symbolic link, and
.I flags
specified
.BR O_NOFOLLOW
but not
.BR O_PATH .
.TP
.B EMFILE
The process already has the maximum number of files open
(see the description of
.BR RLIMIT_NOFILE
in
.BR getrlimit (2)).
.TP
.B ENAMETOOLONG
.I pathname
was too long.
.TP
.B ENFILE
The system limit on the total number of open files has been reached.
.TP
.B ENODEV
.I pathname
refers to a device special file and no corresponding device exists.
(This is a Linux kernel bug; in this situation
.B ENXIO
must be returned.)
.TP
.B ENOENT
.B O_CREAT
is not set and the named file does not exist.
Or, a directory component in
.I pathname
does not exist or is a dangling symbolic link.
.TP
.B ENOENT
.I pathname
refers to a nonexistent directory,
.B O_TMPFILE
and one of
.B O_WRONLY
or
.B O_RDWR
were specified in
.IR flags ,
but this kernel version does not provide the
.B O_TMPFILE
functionality.
.TP
.B ENOMEM
Insufficient kernel memory was available.
.TP
.B ENOSPC
.I pathname
was to be created but the device containing
.I pathname
has no room for the new file.
.TP
.B ENOTDIR
A component used as a directory in
.I pathname
is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
.I pathname
was not a directory.
.TP
.B ENXIO
.BR O_NONBLOCK " | " O_WRONLY
is set, the named file is a FIFO, and
no process has the FIFO open for reading.
Or, the file is a device special file and no corresponding device exists.
.TP
.BR EOPNOTSUPP
The filesystem containing
.I pathname
does not support
.BR O_TMPFILE .
.TP
.B EOVERFLOW
.I pathname
refers to a regular file that is too large to be opened.
The usual scenario here is that an application compiled
on a 32-bit platform without
.I -D_FILE_OFFSET_BITS=64
tried to open a file whose size exceeds
.I (1<<31)-1
bytes;
see also
.B O_LARGEFILE
above.
This is the error specified by POSIX.1-2001;
in kernels before 2.6.24, Linux gave the error
.B EFBIG
for this case.
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
.\" Reported 2006-10-03
.TP
.B EPERM
The
.B O_NOATIME
flag was specified, but the effective user ID of the caller
.\" Strictly speaking, it's the filesystem UID... (MTK)
did not match the owner of the file and the caller was not privileged
.RB ( CAP_FOWNER ).
.TP
.B EROFS
.I pathname
refers to a file on a read-only filesystem and write access was
requested.
.TP
.B ETXTBSY
.I pathname
refers to an executable image which is currently being executed and
write access was requested.
.TP
.B EWOULDBLOCK
The
.B O_NONBLOCK
flag was specified, and an incompatible lease was held on the file
(see
.BR fcntl (2)).
.PP
The following additional errors can occur for
.BR openat ():
.TP
.B EBADF
.I dirfd
is not a valid file descriptor.
.TP
.B ENOTDIR
.I pathname
is a relative pathname and
.I dirfd
is a file descriptor referring to a file other than a directory.
.SH VERSIONS
.BR openat ()
was added to Linux in kernel 2.6.16;
library support was added to glibc in version 2.4.
.SH CONFORMING TO
.BR open (),
.BR creat ()
SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.

.BR openat ():
POSIX.1-2008.

The
.BR O_DIRECT ,
.BR O_NOATIME ,
.BR O_PATH ,
and
.B O_TMPFILE
flags are Linux-specific.
One must define
.B _GNU_SOURCE
to obtain their definitions.

The
.BR O_CLOEXEC ,
.BR O_DIRECTORY ,
and
.B O_NOFOLLOW
flags are not specified in POSIX.1-2001,
but are specified in POSIX.1-2008.
Since glibc 2.12, one can obtain their definitions by defining either
.B _POSIX_C_SOURCE
with a value greater than or equal to 200809L or
.B _XOPEN_SOURCE
with a value greater than or equal to 700.
In glibc 2.11 and earlier, one obtains the definitions by defining
.BR _GNU_SOURCE .

As noted in
.BR feature_test_macros (7),
feature test macros such as
.BR _POSIX_C_SOURCE ,
.BR _XOPEN_SOURCE ,
and
.B _GNU_SOURCE
must be defined before including
.I any
header files.
.SH NOTES
Under Linux, the
.B O_NONBLOCK
flag indicates that one wants to open
but does not necessarily have the intention to read or write.
This is typically used to open devices in order to get a file descriptor
for use with
.BR ioctl (2).

.LP
The (undefined) effect of
.B O_RDONLY | O_TRUNC
varies among implementations.
On many systems the file is actually truncated.
.\" Linux 2.0, 2.5: truncate
.\" Solaris 5.7, 5.8: truncate
.\" Irix 6.5: truncate
.\" Tru64 5.1B: truncate
.\" HP-UX 11.22: truncate
.\" FreeBSD 4.7: truncate

Note that
.BR open ()
can open device special files, but
.BR creat ()
cannot create them; use
.BR mknod (2)
instead.

If the file is newly created, its
.IR st_atime ,
.IR st_ctime ,
and
.I st_mtime
fields
(respectively, time of last access, time of last status change, and
time of last modification; see
.BR stat (2))
are set
to the current time, and so are the
.I st_ctime
and
.I st_mtime
fields of the
parent directory.
Otherwise, if the file is modified because of the
.B O_TRUNC
flag, its
.I st_ctime
and
.I st_mtime
fields are set to the current time.
.\"
.\"
.SS Open file descriptions
The term open file description is the one used by POSIX to refer to the
entries in the system-wide table of open files.
In other contexts, this object is
variously called an "open file object",
a "file handle", an "open file table entry",
or, in kernel-developer parlance, a
.IR "struct file" .

When a file descriptor is duplicated (using
.BR dup (2)
or similar),
the duplicate refers to the same open file description
as the original file descriptor,
and the two file descriptors consequently share
the file offset and file status flags.
Such sharing can also occur between processes:
a child process created via
.BR fork (2)
inherits duplicates of its parent's file descriptors,
and those duplicates refer to the same open file descriptions.

Each
.BR open ()
of a file creates a new open file description;
thus, there may be multiple open file descriptions
corresponding to a file inode.
.\"
.\"
.SS Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option
specifies different variants of synchronized I/O,
and specifies the
.BR open ()
flags
.BR O_SYNC ,
.BR O_DSYNC ,
and
.B O_RSYNC
for controlling the behavior.
Regardless of whether an implementation supports this option,
it must at least support the use of
.B O_SYNC
for regular files.

Linux implements
.B O_SYNC
and
.BR O_DSYNC ,
but not
.BR O_RSYNC .
(Somewhat incorrectly, glibc defines
.B O_RSYNC
to have the same value as
.BR O_SYNC .)

.B O_SYNC
provides synchronized I/O
.I file
integrity completion,
meaning write operations will flush data and all associated metadata
to the underlying hardware.
.B O_DSYNC
provides synchronized I/O
.I data
integrity completion,
meaning write operations will flush data
to the underlying hardware,
but will only flush metadata updates that are required
to allow a subsequent read operation to complete successfully.
Data integrity completion can reduce the number of disk operations
that are required for applications that don't need the guarantees
of file integrity completion.

To understand the difference between the two types of completion,
consider two pieces of file metadata:
the file last modification timestamp
.RI ( st_mtime )
and the file length.
All write operations will update the last file modification timestamp,
but only writes that add data to the end of the
file will change the file length.
The last modification timestamp is not needed to ensure that
a read completes successfully, but the file length is.
Thus,
.B O_DSYNC
would only guarantee to flush updates to the file length metadata
(whereas
.B O_SYNC
would also always flush the last modification timestamp metadata).

Before Linux 2.6.33, Linux implemented only the
.B O_SYNC
flag for
.BR open ().
However, when that flag was specified,
most filesystems actually provided the equivalent of synchronized I/O
.I data
integrity completion (i.e.,
.B O_SYNC
was actually implemented as the equivalent of
.BR O_DSYNC ).

Since Linux 2.6.33, proper
.B O_SYNC
support is provided.
However, to ensure backward binary compatibility,
.B O_DSYNC
was defined with the same value as the historical
.BR O_SYNC ,
and
.B O_SYNC
was defined as a new (two-bit) flag value that includes the
.B O_DSYNC
flag value.
This ensures that applications compiled against
new headers get at least
.B O_DSYNC
semantics on pre-2.6.33 kernels.
.\"
.\"
.SS NFS
There are many infelicities in the protocol underlying NFS, affecting
amongst others
.BR O_SYNC " and " O_NDELAY .

On NFS filesystems with UID mapping enabled,
.BR open ()
may
return a file descriptor but, for example,
.BR read (2)
requests are denied
with \fBEACCES\fP.
This is because the client performs
.BR open ()
by checking the
permissions, but UID mapping is performed by the server upon
read and write requests.
.\"
.\"
.SS File access mode
Unlike the other values that can be specified in
.IR flags ,
the
.I "access mode"
values
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
do not specify individual bits.
Rather, they define the low-order two bits of
.IR flags ,
and are defined respectively as 0, 1, and 2.
In other words, the combination
.B "O_RDONLY | O_WRONLY"
is a logical error, and certainly does not have the same meaning as
.BR O_RDWR .

Linux reserves the special, nonstandard access mode 3 (binary 11) in
.I flags
to mean:
check for read and write permission on the file and return a descriptor
that can't be used for reading or writing.
This nonstandard access mode is used by some Linux drivers to return a
descriptor that is to be used only for device-specific
.BR ioctl (2)
operations.
.\" See for example util-linux's disk-utils/setfdprm.c
.\" For some background on access mode 3, see
.\" http://thread.gmane.org/gmane.linux.kernel/653123
.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
.\" LKML, 12 Mar 2008
.\"
.\"
.SS Rationale for openat() and other "directory file descriptor" APIs
.BR openat ()
and the other system calls and library functions that take
a directory file descriptor argument
(i.e.,
.BR faccessat (2),
.BR fanotify_mark (2),
.BR fchmodat (2),
.BR fchownat (2),
.BR fstatat (2),
.BR futimesat (2),
.BR linkat (2),
.BR mkdirat (2),
.BR mknodat (2),
.BR name_to_handle_at (2),
.BR readlinkat (2),
.BR renameat (2),
.BR symlinkat (2),
.BR unlinkat (2),
.BR utimensat (2),
.BR mkfifoat (3),
and
.BR scandirat (3))
are provided
for two reasons.
Here, the explanation is in terms of the
.BR openat ()
call, but the rationale is analogous for the other interfaces.

First,
.BR openat ()
allows an application to avoid race conditions that could
occur when using
.BR open ()
to open files in directories other than the current working directory.
These race conditions result from the fact that some component
of the directory prefix given to
.BR open ()
could be changed in parallel with the call to
.BR open ().
Suppose, for example, that we wish to create the file
.I path/to/xxx.dep
if the file
.I path/to/xxx
exists.
The problem is that between the existence check and the file creation step,
.I path
or
.I to
(which might be symbolic links)
could be modified to point to a different location.
Such races can be avoided by
opening a file descriptor for the target directory,
and then specifying that file descriptor as the
.I dirfd
argument of (say)
.BR fstatat (2)
and
.BR openat ().

Second,
.BR openat ()
allows the implementation of a per-thread "current working
directory", via file descriptor(s) maintained by the application.
(This functionality can also be obtained by tricks based
on the use of
.IR /proc/self/fd/ dirfd,
but less efficiently.)
.\"
.\"
.SS O_DIRECT
.LP
The
.B O_DIRECT
flag may impose alignment restrictions on the length and address
of user-space buffers and the file offset of I/Os.
Under Linux, alignment
restrictions vary by filesystem and kernel version and might be
absent entirely.
However, there is currently no filesystem-independent
interface for an application to discover these restrictions for a given
file or filesystem.
Some filesystems provide their own interfaces
for doing so, for example the
.B XFS_IOC_DIOINFO
operation in
.BR xfsctl (3).
.LP
Under Linux 2.4, transfer sizes, the alignment of the user buffer,
and the file offset must all be multiples of the logical block size
of the filesystem.
Since Linux 2.6.0, alignment to the logical block size of the
underlying storage (typically 512 bytes) suffices.
The logical block size can be determined using the
.BR ioctl (2)
.B BLKSSZGET
operation or from the shell using the command:

    blockdev \-\-getss
.LP
.B O_DIRECT
I/Os should never be run concurrently with the
.BR fork (2)
system call,
if the memory buffer is a private mapping
(i.e., any mapping created with the
.BR mmap (2)
.B MAP_PRIVATE
flag;
this includes memory allocated on the heap and statically allocated buffers).
Any such I/Os, whether submitted via an asynchronous I/O interface or from
another thread in the process,
should be completed before
.BR fork (2)
is called.
Failure to do so can result in data corruption and undefined behavior in
parent and child processes.
This restriction does not apply when the memory buffer for the
.B O_DIRECT
I/Os was created using
.BR shmat (2)
or
.BR mmap (2)
with the
.B MAP_SHARED
flag.
Nor does this restriction apply when the memory buffer has been advised as
.B MADV_DONTFORK
with
.BR madvise (2),
ensuring that it will not be available
to the child after
.BR fork (2).
.LP
The
.B O_DIRECT
flag was introduced in SGI IRIX, where it has alignment
restrictions similar to those of Linux 2.4.
IRIX also has an
.BR fcntl (2)
call to query appropriate alignments and sizes.
FreeBSD 4.x introduced
a flag of the same name, but without alignment restrictions.
.LP
.B O_DIRECT
support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag.
Some filesystems may not implement the flag, in which case
.BR open ()
fails with
.B EINVAL
if it is used.
.LP
Applications should avoid mixing
.B O_DIRECT
and normal I/O to the same file,
and especially to overlapping byte regions in the same file.
Even when the filesystem correctly handles the coherency issues in
this situation, overall I/O throughput is likely to be slower than
using either mode alone.
Likewise, applications should avoid mixing
.BR mmap (2)
of files with direct I/O to the same files.
.LP
The behavior of
.B O_DIRECT
with NFS differs from its behavior with local filesystems.
Older kernels, or
kernels configured in certain ways, may not support this combination.
The NFS protocol does not support passing the flag to the server, so
.B O_DIRECT
I/O will bypass the page cache only on the client; the server may
still cache the I/O.
The client asks the server to make the I/O
synchronous to preserve the synchronous semantics of
.BR O_DIRECT .
Some servers will perform poorly under these circumstances, especially
if the I/O size is small.
Some servers may also be configured to
lie to clients about the I/O having reached stable storage; this
will avoid the performance penalty at some risk to data integrity
in the event of server power failure.
The Linux NFS client places no alignment restrictions on
.B O_DIRECT
I/O.
.PP
In summary,
.B O_DIRECT
is a potentially powerful tool that should be used with caution.
It is recommended that applications treat use of
.B O_DIRECT
as a performance option which is disabled by default.
.PP
.RS
"The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances."\(emLinus
.RE
.SH BUGS
Currently, it is not possible to enable signal-driven
I/O by specifying
.B O_ASYNC
when calling
.BR open ();
use
.BR fcntl (2)
to enable this flag.
.\" FIXME . Check bugzilla report on open(O_ASYNC)
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993

One must check for two different error codes,
.B EISDIR
and
.BR ENOENT ,
when trying to determine whether the kernel supports
.B O_TMPFILE
functionality.
.SH SEE ALSO
.BR chmod (2),
.BR chown (2),
.BR close (2),
.BR dup (2),
.BR fcntl (2),
.BR link (2),
.BR lseek (2),
.BR mknod (2),
.BR mmap (2),
.BR mount (2),
.BR open_by_handle_at (2),
.BR read (2),
.BR socket (2),
.BR stat (2),
.BR umask (2),
.BR unlink (2),
.BR write (2),
.BR fopen (3),
.BR fifo (7),
.BR path_resolution (7),
.BR symlink (7)