1 .\" This manpage is Copyright (C) 1992 Drew Eckhardt;
2 .\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
3 .\" and Copyright (C) 2008 Greg Banks
4 .\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
6 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
8 .\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
9 .\" Modified 1994-08-21 by Michael Haardt
10 .\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
11 .\" Modified 1996-05-13 by Thomas Koenig
12 .\" Modified 1996-12-20 by Michael Haardt
13 .\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
14 .\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
15 .\" Modified 1999-06-03 by Michael Haardt
16 .\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
17 .\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
18 .\" 2004-12-08, mtk, reordered flags list alphabetically
19 .\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
20 .\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
21 .\" 2008-01-03, mtk, with input from Trond Myklebust
22 .\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
23 .\" Rewrite description of O_EXCL.
24 .\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
26 .\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
28 .\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
29 .\" O_TTYINIT. Eventually these may need to be documented. --mtk
31 .TH open 2 (date) "Linux man-pages (unreleased)"
33 open, openat, creat \- open and possibly create a file
36 .RI ( libc ", " \-lc )
41 .BI "int open(const char *" pathname ", int " flags );
42 .BI "int open(const char *" pathname ", int " flags ", mode_t " mode );
44 .BI "int creat(const char *" pathname ", mode_t " mode );
46 .BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags ,
.BI "           mode_t " mode );
50 /* Documented separately, in \fBopenat2\fP(2): */
51 .BI "int openat2(int " dirfd ", const char *" pathname ,
52 .BI " const struct open_how *" how ", size_t " size ");"
56 Feature Test Macro Requirements for glibc (see
57 .BR feature_test_macros (7)):
63 _POSIX_C_SOURCE >= 200809L
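The
.BR open ()
system call opens the file specified by
.IR pathname .
If the specified file does not exist,
it may optionally (if
.B O_CREAT
is specified in
.IR flags )
be created by
.BR open ().
.P
The return value of
.BR open ()
is a file descriptor, a small, nonnegative integer that is an index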
83 to an entry in the process's table of open file descriptors.
84 The file descriptor is used
85 in subsequent system calls
86 .RB ( read "(2), " write "(2), " lseek "(2), " fcntl (2),
87 etc.) to refer to the open file.
88 The file descriptor returned by a successful call will be
89 the lowest-numbered file descriptor not currently open for the process.
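.P
The lowest-numbered rule can be observed with a minimal sketch such as the following.
The helper name reuses_lowest_fd is hypothetical, and the sketch assumes a
single-threaded caller, so that no other thread opens or closes a descriptor
between the calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if open() hands back the descriptor number that was just
   closed, 0 if it does not, and -1 on error.
   Assumption: no other thread changes the descriptor table meanwhile. */
int
reuses_lowest_fd(void)
{
    int fd1, fd2, fd3, same;

    fd1 = open("/dev/null", O_RDONLY);
    if (fd1 == -1)
        return -1;

    fd2 = open("/dev/null", O_RDONLY);   /* occupies the next free slot */
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    close(fd1);                          /* fd1 becomes the lowest free number */

    fd3 = open("/dev/null", O_RDONLY);
    if (fd3 == -1) {
        close(fd2);
        return -1;
    }

    same = (fd3 == fd1);
    close(fd2);
    close(fd3);
    return same;
}
```

In a single-threaded program the function is expected to return 1.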
By default, the new file descriptor is set to remain open across an
.BR execve (2)
(i.e., the
.B FD_CLOEXEC
file descriptor flag described in
.BR fcntl (2)
is initially disabled); the
.B O_CLOEXEC
flag, described below, can be used to change this default.
The file offset is set to the beginning of the file (see
.BR lseek (2)).
.P
A call to
.BR open ()
creates a new
.IR "open file description" ,
an entry in the system-wide table of open files.
The open file description records the file offset and the file status flags
(see below).
A file descriptor is a reference to an open file description;
this reference is unaffected if
.I pathname
is subsequently removed or modified to refer to a different file.
114 For further details on open file descriptions, see NOTES.
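.P
The argument
.I flags
must include one of the following
.IR "access modes" :
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
These request opening the file read-only, write-only, or read/write,
respectively.
.P
In addition, zero or more file creation flags and file status flags
can be bitwise ORed in
.IR flags .
The
.I file creation flags
are
.BR O_CLOEXEC ,
.BR O_CREAT ,
.BR O_DIRECTORY ,
.BR O_EXCL ,
.BR O_NOCTTY ,
.BR O_NOFOLLOW ,
.BR O_TMPFILE ,
and
.BR O_TRUNC .
The
.I file status flags
are all of the remaining flags listed below.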
144 .\" SUSv4 divides the flags into:
148 .\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
149 .\" though it's not clear what the difference between "other" and
150 .\" "File creation" flags is. I raised an Aardvark to see if this
151 .\" can be clarified in SUSv4; 10 Oct 2008.
152 .\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
153 .\" TC1 (balloted in 2013), resolved this, so that those three constants
.\" are also categorized as file status flags.
156 The distinction between these two groups of flags is that
157 the file creation flags affect the semantics of the open operation itself,
158 while the file status flags affect the semantics of subsequent I/O operations.
The file status flags can be retrieved and (in some cases)
modified; see
.BR fcntl (2)
for details.
164 The full list of file creation flags and file status flags is as follows:
.TP
.B O_APPEND
The file is opened in append mode.
Before each
.BR write (2),
the file offset is positioned at the end of the file,
as if with
.BR lseek (2).
The modification of the file offset and the write operation
are performed as a single atomic step.
.IP
.B O_APPEND
may lead to corrupted files on NFS filesystems if more than one process
appends data to a file at once.
179 .\" For more background, see
180 .\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
181 .\" http://nfs.sourceforge.net/
182 This is because NFS does not support
183 appending to a file, so the client kernel has to simulate it, which
184 can't be done without a race condition.
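.IP
The append behavior on a local filesystem can be illustrated with a short sketch.
The function name append_after_rewind and the path it is given are hypothetical;
the sketch rewinds the file offset and shows that the next write still lands at
the end of the file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes "abc", moves the offset back to 0, then writes "de".
   With O_APPEND the second write is placed at the end of the file,
   so the returned final offset is 5 rather than 2.
   Returns -1 on error. */
off_t
append_after_rewind(const char *path)
{
    off_t end = -1;
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (fd == -1)
        return -1;

    if (write(fd, "abc", 3) == 3
        && lseek(fd, 0, SEEK_SET) == 0      /* rewind has no effect on writes */
        && write(fd, "de", 2) == 2)
        end = lseek(fd, 0, SEEK_CUR);       /* offset after the appended data */

    close(fd);
    return end;
}
```

The file then contains "abcde"; without O_APPEND the second write would have
overwritten the first two bytes.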
.TP
.B O_ASYNC
Enable signal-driven I/O:
generate a signal
.RB ( SIGIO
by default, but this can be changed via
.BR fcntl (2))
when input or output becomes possible on this file descriptor.
This feature is available only for terminals, pseudoterminals,
sockets, and (since Linux 2.6) pipes and FIFOs.
See
.BR fcntl (2)
for further details.
See also BUGS, below.
200 .BR O_CLOEXEC " (since Linux 2.6.23)"
201 .\" NOTE! several other man pages refer to this text
202 Enable the close-on-exec flag for the new file descriptor.
203 .\" FIXME . for later review when Issue 8 is one day released...
204 .\" POSIX proposes to fix many APIs that provide hidden FDs
205 .\" http://austingroupbugs.net/tag_view_page.php?tag_id=8
206 .\" http://austingroupbugs.net/view.php?id=368
Specifying this flag permits a program to avoid additional
.BR fcntl (2)
.B F_SETFD
operations to set the
.B FD_CLOEXEC
flag.
.IP
Note that the use of this flag is essential in some multithreaded programs,
because using a separate
.BR fcntl (2)
.B F_SETFD
operation to set the
.B FD_CLOEXEC
flag does not suffice to avoid race conditions
where one thread opens a file descriptor and
attempts to set its close-on-exec flag using
.BR fcntl (2)
at the same time as another thread does a
.BR fork (2)
plus
.BR execve (2).
Depending on the order of execution,
the race may lead to the file descriptor returned by
.BR open ()
being unintentionally leaked to the program executed by the child process
created by
.BR fork (2).
(This kind of race is in principle possible for any system call
that creates a file descriptor whose close-on-exec flag should be set,
and various other Linux system calls provide an equivalent of the
.B O_CLOEXEC
flag to deal with this problem.)
239 .\" This flag fixes only one form of the race condition;
240 .\" The race can also occur with, for example, file descriptors
241 .\" returned by accept(), pipe(), etc.
.TP
.B O_CREAT
If
.I pathname
does not exist, create it as a regular file.
.IP
The owner (user ID) of the new file is set to the effective user ID
of the process.
.IP
The group ownership (group ID) of the new file is set either to
the effective group ID of the process (System V semantics)
or to the group ID of the parent directory (BSD semantics).
On Linux, the behavior depends on whether the
set-group-ID mode bit is set on the parent directory:
if that bit is set, then BSD semantics apply;
otherwise, System V semantics apply.
For some filesystems, the behavior also depends on the
.I bsdgroups
and
.I sysvgroups
mount options described in
.BR mount (8).
264 .\" As at Linux 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
265 .\" XFS (since Linux 2.6.14).
.IP
The
.I mode
argument specifies the file mode bits to be applied when a new file is created.
If neither
.B O_CREAT
nor
.B O_TMPFILE
is specified in
.IR flags ,
then
.I mode
is ignored (and can thus be specified as 0, or simply omitted).
The
.I mode
argument
.B must
be supplied if
.B O_CREAT
or
.B O_TMPFILE
is specified in
.IR flags ;
if it is not supplied,
some arbitrary bytes from the stack will be applied as the file mode.
.IP
The effective mode is modified by the process's
.I umask
in the usual way: in the absence of a default ACL, the mode of the
created file is
.IR "(mode\ &\ \[ti]umask)" .
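.IP
The interaction between the requested mode and the umask can be checked with a
sketch like the following.
The function name created_mode and its parameters are hypothetical, and the
sketch assumes that the parent directory has no default ACL.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates 'path' with the requested mode while 'mask' is the process
   umask, and returns the permission bits the new file actually got.
   Returns -1 on error.  The file is removed again before returning. */
int
created_mode(const char *path, mode_t mode, mode_t mask)
{
    struct stat sb;
    mode_t old;
    int fd, result = -1;

    unlink(path);                   /* make sure O_CREAT really creates */

    old = umask(mask);              /* umask() returns the previous mask */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    umask(old);                     /* restore the caller's umask */
    if (fd == -1)
        return -1;

    if (fstat(fd, &sb) == 0)
        result = sb.st_mode & 07777;

    close(fd);
    unlink(path);
    return result;
}
```

For example, requesting mode 0666 under a umask of 022 is expected to yield 0644.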
.IP
Note that
.I mode
applies only to future accesses of the
newly created file; the
.BR open ()
call that creates a read-only file may well return a read/write
file descriptor.
.IP
The following symbolic constants are provided for
.IR mode :
.RS
.TP 9
.B S_IRWXU
00700 user (file owner) has read, write, and execute permission
.TP
.B S_IRUSR
00400 user has read permission
.TP
.B S_IWUSR
00200 user has write permission
.TP
.B S_IXUSR
00100 user has execute permission
.TP
.B S_IRWXG
00070 group has read, write, and execute permission
.TP
.B S_IRGRP
00040 group has read permission
.TP
.B S_IWGRP
00020 group has write permission
.TP
.B S_IXGRP
00010 group has execute permission
.TP
.B S_IRWXO
00007 others have read, write, and execute permission
.TP
.B S_IROTH
00004 others have read permission
.TP
.B S_IWOTH
00002 others have write permission
.TP
.B S_IXOTH
00001 others have execute permission
.RE
.IP
According to POSIX, the effect when other bits are set in
.I mode
is unspecified.
On Linux, the following bits are also honored in
.IR mode :
.RS
.TP 9
.B S_ISUID
0004000 set-user-ID bit
.TP
.B S_ISGID
0002000 set-group-ID bit (see
.BR inode (7)).
.TP
.B S_ISVTX
0001000 sticky bit (see
.BR inode (7)).
.RE
366 .BR O_DIRECT " (since Linux 2.4.10)"
367 Try to minimize cache effects of the I/O to and from this file.
368 In general this will degrade performance, but it is useful in
369 special situations, such as when applications do their own caching.
370 File I/O is done directly to/from user-space buffers.
The
.B O_DIRECT
flag on its own makes an effort to transfer data synchronously,
but does not give the guarantees of the
.B O_SYNC
flag that data and necessary metadata are transferred.
To guarantee synchronous I/O,
.B O_SYNC
must be used in addition to
.BR O_DIRECT .
See NOTES below for further discussion.
.IP
A semantically similar (but deprecated) interface for block devices
is described in
.BR raw (8).
388 If \fIpathname\fP is not a directory, cause the open to fail.
389 .\" But see the following and its replies:
390 .\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
391 .\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
392 .\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
This flag was added in Linux 2.1.126, to
avoid denial-of-service problems if
.BR opendir (3)
is called on a FIFO or tape device.
.TP
.B O_DSYNC
Write operations on the file will complete according to the requirements of
synchronized I/O
.I data
integrity completion.
.IP
By the time
.BR write (2)
(and similar)
return, the output data
has been transferred to the underlying hardware,
along with any file metadata that would be required to retrieve that data
(i.e., as though each
.BR write (2)
was followed by a call to
.BR fdatasync (2)).
.IR "See NOTES below" .
.TP
.B O_EXCL
Ensure that this call creates the file:
if this flag is specified in conjunction with
.BR O_CREAT ,
and
.I pathname
already exists, then
.BR open ()
fails with the error
.BR EEXIST .
.IP
When these two flags are specified, symbolic links are not followed:
.\" POSIX.1-2001 explicitly requires this behavior.
if
.I pathname
is a symbolic link, then
.BR open ()
fails regardless of where the symbolic link points.
.IP
In general, the behavior of
.B O_EXCL
is undefined if it is used without
.BR O_CREAT .
There is one exception: on Linux 2.6 and later,
.B O_EXCL
can be used without
.B O_CREAT
if
.I pathname
refers to a block device.
If the block device is in use by the system (e.g., mounted),
.BR open ()
fails with the error
.BR EBUSY .
.IP
On NFS,
.B O_EXCL
is supported only when using NFSv3 or later on kernel 2.6 or later.
In NFS environments where
.B O_EXCL
support is not provided, programs that rely on it
for performing locking tasks will contain a race condition.
Portable programs that want to perform atomic file locking using a lockfile,
and need to avoid reliance on NFS support for
.BR O_EXCL ,
can create a unique file on
the same filesystem (e.g., incorporating hostname and PID), and use
.BR link (2)
to make a link to the lockfile.
If
.BR link (2)
returns 0, the lock is successful.
Otherwise, use
.BR stat (2)
on the unique file to check if its link count has increased to 2,
in which case the lock is also successful.
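.IP
On a local filesystem, exclusive creation can be sketched as follows.
The helper name try_create_exclusive is hypothetical; it distinguishes the case
where the file already exists from other failures.
This sketch does not implement the NFS-safe link(2) technique described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Tries to create 'path' exclusively.
   Returns 1 if this call created the file, 0 if the file already
   existed (EEXIST), and -1 on any other error. */
int
try_create_exclusive(const char *path)
{
    int fd;

    /* O_CREAT | O_EXCL makes "check for existence" and "create"
       a single step, so two callers cannot both succeed. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;

    close(fd);
    return 1;
}
```

Calling the function twice on the same nonexistent path is expected to return 1
the first time and 0 the second time.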
.TP
.B O_LARGEFILE
(LFS)
Allow files whose sizes cannot be represented in an
.I off_t
(but can be represented in an
.IR off64_t )
to be opened.
The
.B _LARGEFILE64_SOURCE
macro must be defined
(before including
.I any
header files)
in order to obtain this definition.
Setting the
.B _FILE_OFFSET_BITS
feature test macro to 64 (rather than using
.BR O_LARGEFILE )
is the preferred
method of accessing large files on 32-bit systems (see
.BR feature_test_macros (7)).
496 .BR O_NOATIME " (since Linux 2.6.8)"
Do not update the file last access time
.RI ( st_atime
in the inode)
when the file is
.BR read (2).
503 This flag can be employed only if one of the following conditions is true:
506 The effective UID of the process
507 .\" Strictly speaking: the filesystem UID
508 matches the owner UID of the file.
The calling process has the
.B CAP_FOWNER
capability in its user namespace and
513 the owner UID of the file has a mapping in the namespace.
516 This flag is intended for use by indexing or backup programs,
517 where its use can significantly reduce the amount of disk activity.
518 This flag may not be effective on all filesystems.
519 One example is NFS, where the server maintains the access time.
520 .\" The O_NOATIME flag also affects the treatment of st_atime
521 .\" by mmap() and readdir(2), MTK, Dec 04.
.TP
.B O_NOCTTY
If
.I pathname
refers to a terminal device\[em]see
.BR tty (4)\[em]it
will not become the process's controlling terminal even if the
process does not have one.
.TP
.B O_NOFOLLOW
If the trailing component (i.e., basename) of
.I pathname
is a symbolic link, then the open fails, with the error
.BR ELOOP .
Symbolic links in earlier components of the pathname will still be
followed.
(Note that the
.B ELOOP
error that can occur in this case is indistinguishable from the case where
an open fails because there are too many symbolic links found
while resolving components in the prefix part of the pathname.)
544 This flag is a FreeBSD extension, which was added in Linux 2.1.126,
545 and has subsequently been standardized in POSIX.1-2008.
550 .\" The headers from glibc 2.0.100 and later include a
551 .\" definition of this flag; \fIkernels before Linux 2.1.126 will ignore it if
554 .BR O_NONBLOCK " or " O_NDELAY
When possible, the file is opened in nonblocking mode.
Neither the
.BR open ()
nor any subsequent I/O operations on the file descriptor which is
returned will cause the calling process to wait.
.IP
Note that the setting of this flag has no effect on the operation of
.BR poll (2),
.BR select (2),
.BR epoll (7),
and similar,
since those interfaces merely inform the caller about whether
a file descriptor is "ready",
meaning that an I/O operation performed on
the file descriptor with the
.B O_NONBLOCK
flag
.I clear
would not block.
.IP
Note that this flag has no effect for regular files and block devices;
that is, I/O operations will (briefly) block when device activity
is required, regardless of whether
.B O_NONBLOCK
is set.
Since
.B O_NONBLOCK
semantics might eventually be implemented,
applications should not depend upon blocking behavior
when specifying this flag for regular files and block devices.
.IP
For the handling of FIFOs (named pipes), see also
.BR fifo (7).
For a discussion of the effect of
.B O_NONBLOCK
in conjunction with mandatory file locks and with file leases, see
.BR fcntl (2).
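.IP
One visible effect of nonblocking mode on FIFOs is that opening the write end
fails immediately when no reader is present, instead of blocking.
The sketch below illustrates this; the function name fifo_writer_errno and the
FIFO path are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a FIFO at 'path' and opens it write-only in nonblocking mode
   while no process has it open for reading.  Returns the errno value
   from the failed open(), 0 if the open unexpectedly succeeded, or -1
   if the FIFO could not be created.  The FIFO is removed afterwards. */
int
fifo_writer_errno(const char *path)
{
    int fd, err;

    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* Without O_NONBLOCK this call would block until a reader appears. */
    fd = open(path, O_WRONLY | O_NONBLOCK);
    err = (fd == -1) ? errno : 0;   /* save errno before further calls */
    if (fd != -1)
        close(fd);

    unlink(path);
    return err;
}
```

The returned value is expected to be ENXIO, matching the description of that
error under ERRORS.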
593 .BR O_PATH " (since Linux 2.6.39)"
594 .\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
595 .\" commit 326be7b484843988afe57566b627fb7a70beac56
596 .\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
598 .\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
599 .\" Subject: Re: [PATCH] open(2): document O_PATH
600 .\" Newsgroups: gmane.linux.man, gmane.linux.kernel
602 Obtain a file descriptor that can be used for two purposes:
603 to indicate a location in the filesystem tree and
604 to perform operations that act purely at the file descriptor level.
605 The file itself is not opened, and other file operations (e.g.,
616 The following operations
618 be performed on the resulting file descriptor:
624 if the file descriptor refers to a directory
626 .\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
631 .\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
634 .\" fstatfs(): commit 9d05746e7b16d8565dddbe3200faa1e669d23bbf
636 Duplicating the file descriptor
642 Getting and setting file descriptor flags
648 Retrieving open file status flags using the
651 operation: the returned flags will include the bit
654 Passing the file descriptor as the
658 and the other "*at()" system calls.
664 .BR AT_SYMLINK_FOLLOW )
665 even if the file is not a directory.
667 Passing the file descriptor to another process via a UNIX domain socket
685 Opening a file or directory with the
687 flag requires no permissions on the object itself
688 (but does require execute permission on the directories in the path prefix).
689 Depending on the subsequent operation,
690 a check for suitable file permissions may be performed (e.g.,
692 requires execute permission on the directory referred to
693 by its file descriptor argument).
695 obtaining a reference to a filesystem object by opening it with the
697 flag requires that the caller have read permission on the object,
698 even when the subsequent operation (e.g.,
701 does not require read permission on the object.
705 is a symbolic link and the
707 flag is also specified,
708 then the call returns a file descriptor referring to the symbolic link.
709 This file descriptor can be used as the
717 with an empty pathname to have the calls operate on the symbolic link.
721 refers to an automount point that has not yet been triggered, so no
722 other filesystem is mounted on it, then the call returns a file
723 descriptor referring to the automount directory without triggering a mount.
725 can then be used to determine if it is, in fact, an untriggered
727 .RB ( ".f_type == AUTOFS_SUPER_MAGIC" ).
731 for regular files is to provide the equivalent of POSIX.1's
734 This permits us to open a file for which we have execute
735 permission but not read permission, and then execute that file,
736 with steps something like the following:
.EX
char buf[PATH_MAX];
fd = open("some_prog", O_PATH);
snprintf(buf, PATH_MAX, "/proc/self/fd/%d", fd);
execl(buf, "some_prog", (char *) NULL);
.EE
749 file descriptor can also be passed as the argument of
753 Write operations on the file will complete according to the requirements of
757 (by contrast with the
767 returns, the output data and associated file metadata
768 have been transferred to the underlying hardware
769 (i.e., as though each
771 was followed by a call to
773 .IR "See NOTES below" .
775 .BR O_TMPFILE " (since Linux 3.11)"
776 .\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
777 .\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
778 .\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
779 Create an unnamed temporary regular file.
782 argument specifies a directory;
783 an unnamed inode will be created in that directory's filesystem.
784 Anything written to the resulting file will be lost when
785 the last file descriptor is closed, unless the file is given a name.
788 must be specified with one of
796 is not specified, then
798 can be used to link the temporary file into the filesystem, making it
799 permanent, using code like the following:
.EX
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);
\&
/* File I/O on \[aq]fd\[aq]... */
\&
linkat(fd, "", AT_FDCWD, "/path/for/file", AT_EMPTY_PATH);
\&
/* If the caller doesn\[aq]t have the CAP_DAC_READ_SEARCH
   capability (needed to use AT_EMPTY_PATH with linkat(2)),
   and there is a proc(5) filesystem mounted, then the
   linkat(2) call above can be replaced with:
snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
*/
.EE
827 argument determines the file permission mode, as with
834 prevents a temporary file from being linked into the filesystem
836 (Note that the meaning of
838 in this case is different from the meaning of
842 There are two main use cases for
843 .\" Inspired by http://lwn.net/Articles/559147/
849 functionality: race-free creation of temporary files that
850 (1) are automatically deleted when closed;
851 (2) can never be reached via any pathname;
852 (3) are not subject to symlink attacks; and
853 (4) do not require the caller to devise unique names.
855 Creating a file that is initially invisible, which is then populated
856 with data and adjusted to have appropriate filesystem attributes
861 before being atomically linked into the filesystem
862 in a fully formed state (using
868 requires support by the underlying filesystem;
869 only a subset of Linux filesystems provide that support.
870 In the initial implementation, support was provided in
871 the ext2, ext3, ext4, UDF, Minix, and tmpfs filesystems.
872 .\" To check for support, grep for "tmpfile" in kernel sources
873 Support for other filesystems has subsequently been added as follows:
875 .\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
876 .\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
878 .\" commit ef3b9af50bfa6a1f02cd7b3f5124b712b1ba3e3c
880 .\" commit 50732df02eefb39ab414ef655979c2c9b64ad21c
881 and ubifs (Linux 4.9)
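.TP
.B O_TRUNC
If the file already exists and is a regular file and the access mode allows
writing (i.e., is
.B O_RDWR
or
.BR O_WRONLY )
it will be truncated to length 0.
If the file is a FIFO or terminal device file, the
.B O_TRUNC
flag is ignored.
Otherwise, the effect of
.B O_TRUNC
is unspecified.
.SS creat()
A call to
.BR creat ()
is equivalent to calling
.BR open ()
with
.I flags
equal to
.BR O_CREAT|O_WRONLY|O_TRUNC .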
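.P
The flag combination O_CREAT|O_WRONLY|O_TRUNC can be exercised with a small sketch.
Both function names below are hypothetical: creat_via_open is a wrapper that
opens a file the way creat() would, and size_after_recreate shows that
reopening an existing file this way truncates it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens 'path' with the same flags that creat() implies. */
int
creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}

/* Writes five bytes to 'path', reopens it with creat_via_open(), and
   returns the file size seen through the new descriptor.
   Returns -1 on error.  The file is removed before returning. */
off_t
size_after_recreate(const char *path)
{
    struct stat sb;
    off_t size = -1;
    int fd;

    fd = creat_via_open(path, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    close(fd);

    fd = creat_via_open(path, 0600);    /* O_TRUNC discards the old data */
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == 0)
        size = sb.st_size;

    close(fd);
    unlink(path);
    return size;
}
```

The size reported after the second open is expected to be 0.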
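.SS openat()
The
.BR openat ()
system call operates in exactly the same way as
.BR open (),
except for the differences described here.
.P
The
.I dirfd
argument is used in conjunction with the
.I pathname
argument as follows:
.IP \[bu] 3
If the pathname given in
.I pathname
is absolute, then
.I dirfd
is ignored.
.IP \[bu]
If the pathname given in
.I pathname
is relative and
.I dirfd
is the special value
.BR AT_FDCWD ,
then
.I pathname
is interpreted relative to the current working
directory of the calling process (like
.BR open ()).
.IP \[bu]
If the pathname given in
.I pathname
is relative, then it is interpreted relative to the directory
referred to by the file descriptor
.I dirfd
(rather than relative to the current working directory of
the calling process, as is done by
.BR open ()
for a relative pathname).
In this case,
.I dirfd
must be a directory that was opened for reading
.RB ( O_RDONLY )
or using the
.B O_PATH
flag.
.P
If the pathname given in
.I pathname
is relative, and
.I dirfd
is not a valid file descriptor, an error
.RB ( EBADF )
results.
(Specifying an invalid file descriptor number in
.I dirfd
can be used as a means to ensure that
.I pathname
is absolute.)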
.SS openat2(2)
The
.BR openat2 (2)
system call is an extension of
.BR openat (),
and provides a superset of the features of
.BR openat ().
It is documented separately, in
.BR openat2 (2).
.SH RETURN VALUE
On success,
.BR open (),
.BR openat (),
and
.BR creat ()
return the new file descriptor (a nonnegative integer).
On error, \-1 is returned and
.I errno
is set to indicate the error.
990 can fail with the following errors:
993 The requested access to the file is not allowed, or search permission
994 is denied for one of the directories in the path prefix of
996 or the file did not exist yet and write access to the parent directory
999 .BR path_resolution (7).)
1002 .\" commit 30aba6656f61ed44cba445a3c0d38b296fa9e8f5
1008 .I protected_regular
1009 sysctl is enabled, the file already exists and is a FIFO or regular file, the
1010 owner of the file is neither the current user nor the owner of the
1011 containing directory, and the containing directory is both world- or
1012 group-writable and sticky.
1013 For details, see the descriptions of
1014 .I /proc/sys/fs/protected_fifos
1016 .I /proc/sys/fs/protected_regular
1027 nor a valid file descriptor.
1035 refers to a block device that is in use by the system (e.g., it is mounted).
1040 is specified, the file does not exist, and the user's quota of disk
1041 blocks or inodes on the filesystem has been exhausted.
1046 .BR O_CREAT " and " O_EXCL
1051 points outside your accessible address space.
1058 While blocked waiting to complete an open of a slow device
1061 the call was interrupted by a signal handler; see
1065 The filesystem does not support the
1070 for more information.
1074 .\" In particular, __O_TMPFILE instead of O_TMPFILE
1091 and the final component ("basename") of the new file's
1094 (e.g., it contains characters not permitted by the underlying filesystem).
1097 The final component ("basename") of
1100 (e.g., it contains characters not permitted by the underlying filesystem).
1104 refers to a directory and the access requested involved writing
1113 refers to an existing directory,
1121 but this kernel version does not provide the
1126 Too many symbolic links were encountered in resolving
1131 was a symbolic link, and
1139 The per-process limit on the number of open file descriptors has been reached
1140 (see the description of
1150 The system-wide limit on the total number of open files has been reached.
1154 refers to a device special file and no corresponding device exists.
1155 (This is a Linux kernel bug; in this situation
1161 is not set and the named file does not exist.
1164 A directory component in
1166 does not exist or is a dangling symbolic link.
1170 refers to a nonexistent directory,
1178 but this kernel version does not provide the
1183 The named file is a FIFO,
1184 but memory for the FIFO buffer can't be allocated because
1185 the per-user hard limit on memory allocation for pipes has been reached
1186 and the caller is not privileged; see
1190 Insufficient kernel memory was available.
1194 was to be created but the device containing
1196 has no room for the new file.
1199 A component used as a directory in
1201 is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
1203 was not a directory.
1208 is a relative pathname and
1210 is a file descriptor referring to a file other than a directory.
1213 .BR O_NONBLOCK " | " O_WRONLY
1214 is set, the named file is a FIFO, and
1215 no process has the FIFO open for reading.
1218 The file is a device special file and no corresponding device exists.
1221 The file is a UNIX domain socket.
1224 The filesystem containing
1231 refers to a regular file that is too large to be opened.
1232 The usual scenario here is that an application compiled
1233 on a 32-bit platform without
1234 .I \-D_FILE_OFFSET_BITS=64
1235 tried to open a file whose size exceeds
1241 This is the error specified by POSIX.1;
1242 before Linux 2.6.24, Linux gave the error
1245 .\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
1246 .\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
1247 .\" Reported 2006-10-03
1252 flag was specified, but the effective user ID of the caller
1253 .\" Strictly speaking, it's the filesystem UID... (MTK)
1254 did not match the owner of the file and the caller was not privileged.
1257 The operation was prevented by a file seal; see
1262 refers to a file on a read-only filesystem and write access was
1267 refers to an executable image which is currently being executed and
1268 write access was requested.
1272 refers to a file that is currently in use as a swap file, and the
1278 refers to a file that is currently being read by the kernel (e.g., for
1279 module/firmware loading), and write access was requested.
1284 flag was specified, and an incompatible lease was held on the file
1288 The (undefined) effect of
1289 .B O_RDONLY | O_TRUNC
1290 varies among implementations.
1291 On many systems the file is actually truncated.
1292 .\" Linux 2.0, 2.5: truncate
1293 .\" Solaris 5.7, 5.8: truncate
1294 .\" Irix 6.5: truncate
1295 .\" Tru64 5.1B: truncate
1296 .\" HP-UX 11.22: truncate
1297 .\" FreeBSD 4.7: truncate
1298 .SS Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option
specifies different variants of synchronized I/O,
and specifies the
.BR open ()
flags
.BR O_SYNC ,
.BR O_DSYNC ,
and
.B O_RSYNC
for controlling the behavior.
Regardless of whether an implementation supports this option,
it must at least support the use of
.B O_SYNC
for regular files.
.P
Linux implements
.B O_SYNC
and
.BR O_DSYNC ,
but not
.BR O_RSYNC .
Somewhat incorrectly, glibc defines
.B O_RSYNC
to have the same value as
.BR O_SYNC .
.RB ( O_RSYNC
is defined in the Linux header file
.I <asm/fcntl.h>
on HP PA-RISC, but it is not used.)
.P
.B O_SYNC
provides synchronized I/O
.I file
integrity completion,
meaning write operations will flush data and all associated metadata
to the underlying hardware.
.B O_DSYNC
provides synchronized I/O
.I data
integrity completion,
meaning write operations will flush data
to the underlying hardware,
1341 but will only flush metadata updates that are required
1342 to allow a subsequent read operation to complete successfully.
1343 Data integrity completion can reduce the number of disk operations
1344 that are required for applications that don't need the guarantees
1345 of file integrity completion.
1347 To understand the difference between the two types of completion,
1348 consider two pieces of file metadata:
the file last modification timestamp
.RI ( st_mtime )
and the file length.
All write operations will update the last modification timestamp,
but only writes that add data to the end of the
file will change the file length.
The last modification timestamp is not needed to ensure that
a read completes successfully, but the file length is.
Thus,
.B O_DSYNC
would only guarantee to flush updates to the file length metadata
(whereas
.B O_SYNC
would also always flush the last modification timestamp metadata).
.P
Before Linux 2.6.33, Linux implemented only the
.B O_SYNC
flag for
.BR open ().
However, when that flag was specified,
most filesystems actually provided the equivalent of synchronized I/O
.I data
integrity completion (i.e.,
.B O_SYNC
was actually implemented as the equivalent of
.BR O_DSYNC ).
.P
Since Linux 2.6.33, proper
.B O_SYNC
support is provided.
However, to ensure backward binary compatibility,
.B O_DSYNC
was defined with the same value as the historical
.BR O_SYNC ,
and
.B O_SYNC
was defined as a new (two-bit) flag value that includes the
.B O_DSYNC
flag value.
This ensures that applications compiled against
new headers get at least
.B O_DSYNC
semantics before Linux 2.6.33.
1393 .SS C library/kernel differences
1395 the glibc wrapper function for
1399 system call, rather than the kernel's
1402 For certain architectures, this is also true before glibc 2.26.
1422 flags are Linux-specific.
1425 to obtain their definitions.
1432 flags are not specified in POSIX.1-2001,
1433 but are specified in POSIX.1-2008.
1434 Since glibc 2.12, one can obtain their definitions by defining either
1436 with a value greater than or equal to 200809L or
1438 with a value greater than or equal to 700.
1439 In glibc 2.11 and earlier, one obtains the definitions by defining
1446 SVr4, 4.3BSD, POSIX.1-2001.
1455 flag is sometimes used in cases where one wants to open
1456 but does not necessarily have the intention to read or write.
1458 this may be used to open a device in order to get a file descriptor
1464 can open device special files, but
1466 cannot create them; use
1470 If the file is newly created, its
1475 (respectively, time of last access, time of last status change, and
1476 time of last modification; see
1479 to the current time, and so are the
1485 Otherwise, if the file is modified because of the
1491 fields are set to the current time.
1495 directory show the open file descriptors of the process with the PID
1498 .IR /proc/ pid /fdinfo
1499 directory show even more information about these file descriptors.
1502 for further details of both of these directories.
1504 The Linux header file
1510 synonym is defined instead.
1513 .SS Open file descriptions
1514 The term open file description is the one used by POSIX to refer to the
1515 entries in the system-wide table of open files.
1516 In other contexts, this object is
1517 variously also called an "open file object",
1518 a "file handle", an "open file table entry",
or\[em]in kernel-developer parlance\[em]a
.IR "struct file" .
.P
When a file descriptor is duplicated (using
.BR dup (2)
or similar),
the duplicate refers to the same open file description
as the original file descriptor,
and the two file descriptors consequently share
the file offset and file status flags.
Such sharing can also occur between processes:
a child process created via
.BR fork (2)
inherits duplicates of its parent's file descriptors,
and those duplicates refer to the same open file descriptions.
.P
Each
.BR open ()
of a file creates a new open file description;
thus, there may be multiple open file descriptions
corresponding to a file inode.
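.P
The difference between a duplicated descriptor and a second open of the same
file can be made visible with a sketch like this one.
The function name offset_sharing_demo and the scratch path are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a dup()ed descriptor shares the file offset with the
   original while an independently opened descriptor does not,
   0 if that is not what was observed, and -1 on error.
   The scratch file at 'path' is removed before returning. */
int
offset_sharing_demo(const char *path)
{
    int fd, dupfd = -1, fd2 = -1, result = -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    dupfd = dup(fd);                /* same open file description as fd */
    fd2 = open(path, O_RDONLY);     /* a new open file description */

    if (dupfd != -1 && fd2 != -1 && write(fd, "abcd", 4) == 4) {
        off_t shared = lseek(dupfd, 0, SEEK_CUR);    /* follows fd */
        off_t separate = lseek(fd2, 0, SEEK_CUR);    /* unaffected */

        result = (shared == 4 && separate == 0);
    }

    if (dupfd != -1)
        close(dupfd);
    if (fd2 != -1)
        close(fd2);
    close(fd);
    unlink(path);
    return result;
}
```

The write through the original descriptor advances the offset seen through the
duplicate, while the separately opened descriptor keeps its own offset of 0.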
.P
On Linux, one can use the
.BR kcmp (2)
.B KCMP_FILE
1544 operation to test whether two file descriptors
1545 (in the same process or in two different processes)
1546 refer to the same open file description.
1549 There are many infelicities in the protocol underlying NFS, affecting
1551 .BR O_SYNC " and " O_NDELAY .
1553 On NFS filesystems with UID mapping enabled,
1556 return a file descriptor but, for example,
1561 This is because the client performs
1564 permissions, but UID mapping is performed by the server upon
1565 read and write requests.
1569 Opening the read or write end of a FIFO blocks until the other
1570 end is also opened (by another process or thread).
1573 for further details.
1576 .SS File access mode
Unlike the other values that can be specified in
.IR flags ,
the
.I access mode
values
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
do not specify individual bits.
Rather, they define the low order two bits of
.IR flags ,
and are defined respectively as 0, 1, and 2.
In other words, the combination
.B "O_RDONLY | O_WRONLY"
is a logical error, and certainly does not have the same meaning as
.BR O_RDWR .
.P
Linux reserves the special, nonstandard access mode 3 (binary 11) in
.I flags
to mean:
check for read and write permission on the file and return a file descriptor
that can't be used for reading or writing.
This nonstandard access mode is used by some Linux drivers to return a
file descriptor that is to be used only for device-specific
.BR ioctl (2)
operations.
1601 .\" See for example util-linux's disk-utils/setfdprm.c
1602 .\" For some background on access mode 3, see
1603 .\" http://thread.gmane.org/gmane.linux.kernel/653123
1604 .\" "[RFC] correct flags to f_mode conversion in __dentry_open"
1605 .\" LKML, 12 Mar 2008
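.P
Because the access mode is a two-bit field rather than a set of
independent bits, code that inspects it should mask and compare
instead of testing single bits.
The sketch below uses the POSIX
.B O_ACCMODE
mask together with the
.BR fcntl (2)
.B F_GETFL
operation;
the helper names
.I access_mode
and
.I is_writable
are hypothetical.

```c
#include <fcntl.h>

/* Returns the access mode (O_RDONLY, O_WRONLY, or O_RDWR) with which
   fd was opened, or -1 on error. */
int
access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    /* Mask out the access mode field; do not test individual bits. */
    return flags & O_ACCMODE;
}

/* Returns nonzero if fd was opened for writing (O_WRONLY or O_RDWR). */
int
is_writable(int fd)
{
    int mode = access_mode(fd);
    return mode == O_WRONLY || mode == O_RDWR;
}
```

.P
With the values given above (0, 1, and 2), the expression
.B "O_RDONLY | O_WRONLY"
evaluates to 1, which is simply
.BR O_WRONLY .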
1608 .SS Rationale for openat() and other "directory file descriptor" APIs
1610 and the other system calls and library functions that take
1611 a directory file descriptor argument
1615 .BR fanotify_mark (2),
1624 .BR mount_setattr (2),
1626 .BR name_to_handle_at (2),
1639 address two problems with the older interfaces that preceded them.
1640 Here, the explanation is in terms of the
1642 call, but the rationale is analogous for the other interfaces.
1646 allows an application to avoid race conditions that could
1649 to open files in directories other than the current working directory.
1650 These race conditions result from the fact that some component
1651 of the directory prefix given to
1653 could be changed in parallel with the call to
1655 Suppose, for example, that we wish to create the file
1656 .I dir1/dir2/xxx.dep
1660 The problem is that between the existence check and the file-creation step,
1664 (which might be symbolic links)
1665 could be modified to point to a different location.
1666 Such races can be avoided by
1667 opening a file descriptor for the target directory,
1668 and then specifying that file descriptor as the
1676 file descriptor also has other benefits:
1678 the file descriptor is a stable reference to the directory,
1679 even if the directory is renamed; and
1681 the open file descriptor prevents the underlying filesystem from
1683 just as when a process has a current working directory on a filesystem.
1687 allows the implementation of a per-thread "current working
1688 directory", via file descriptor(s) maintained by the application.
1689 (This functionality can also be obtained by tricks based
1691 .IR /proc/self/fd/ dirfd,
1692 but less efficiently.)
1696 argument for these APIs can be obtained by using
1700 to open a directory (with either the
1705 Alternatively, such a file descriptor can be obtained by applying
1707 to a directory stream created using
1710 When these APIs are given a
1714 or the specified pathname is absolute,
1715 then they handle their pathname argument in the same way as
1716 the corresponding conventional APIs.
1717 However, in this case, several of the APIs have a
1719 argument that provides access to functionality that is not available with
1720 the corresponding conventional APIs.
1726 flag may impose alignment restrictions on the length and address
1727 of user-space buffers and the file offset of I/Os.
1729 restrictions vary by filesystem and kernel version and might be
1731 The handling of misaligned
1734 they can either fail with
1736 or fall back to buffered I/O.
1740 support and alignment restrictions for a file can be queried using
1747 varies by filesystem;
1751 Some filesystems provide their own interfaces for querying
1753 alignment restrictions,
1759 should be used instead when it is available.
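.P
A hedged sketch of such a query with
.BR statx (2)
follows.
It assumes a C library that provides the
.BR statx ()
wrapper and headers that define
.BR STATX_DIOALIGN ;
where they do not, the function merely reports that no information
is available.
The name
.I dio_alignment
is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/stat.h>

/* Queries direct I/O alignment for "path".  Returns 1 and fills in the
   alignments if the filesystem reported them, 0 if it did not (or the
   headers predate STATX_DIOALIGN), and -1 if statx() failed. */
int
dio_alignment(const char *path, unsigned *mem_align,
              unsigned *offset_align)
{
#ifdef STATX_DIOALIGN
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) == -1)
        return -1;
    /* The bit is set in stx_mask only if the fields were filled in. */
    if (!(stx.stx_mask & STATX_DIOALIGN))
        return 0;
    *mem_align = stx.stx_dio_mem_align;       /* 0: no direct I/O */
    *offset_align = stx.stx_dio_offset_align;
    return 1;
#else
    (void) path;
    (void) mem_align;
    (void) offset_align;
    return 0;
#endif
}
```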
1761 If none of the above is available,
1762 then direct I/O support and alignment restrictions
1763 can only be assumed from known characteristics of the filesystem,
1764 the individual file,
1765 the underlying storage device(s),
1766 and the kernel version.
1768 most filesystems based on block devices require that
1769 the file offset and the length and memory address of all I/O segments
1770 be multiples of the filesystem block size
1771 (typically 4096 bytes).
1773 this was relaxed to the logical block size of the block device
1774 (typically 512 bytes).
1775 A block device's logical block size can be determined using the
1778 operation or from the shell using the command:
1787 I/Os should never be run concurrently with the
1790 if the memory buffer is a private mapping
1791 (i.e., any mapping created with the
1795 this includes memory allocated on the heap and statically allocated buffers).
1796 Any such I/Os, whether submitted via an asynchronous I/O interface or from
1797 another thread in the process,
1798 should be completed before
1801 Failure to do so can result in data corruption and undefined behavior in
1802 parent and child processes.
1803 This restriction does not apply when the memory buffer for the
1805 I/Os was created using
1812 Nor does this restriction apply when the memory buffer has been advised as
1816 ensuring that it will not be available
1822 flag was introduced in SGI IRIX, where it has alignment
1823 restrictions similar to those of Linux 2.4.
1826 call to query appropriate alignments and sizes.
1827 FreeBSD 4.x introduced
1828 a flag of the same name, but without alignment restrictions.
1831 support was added in Linux 2.4.10.
1832 Older Linux kernels simply ignore this flag.
1833 Some filesystems may not implement the flag, in which case
1835 fails with the error
1839 Applications should avoid mixing
1841 and normal I/O to the same file,
1842 and especially to overlapping byte regions in the same file.
1843 Even when the filesystem correctly handles the coherency issues in
1844 this situation, overall I/O throughput is likely to be slower than
1845 using either mode alone.
1846 Likewise, applications should avoid mixing
1848 of files with direct I/O to the same files.
1852 with NFS differs from its behavior on local filesystems.
1854 kernels configured in certain ways, may not support this combination.
1855 The NFS protocol does not support passing the flag to the server, so
1857 I/O will bypass the page cache only on the client; the server may
1858 still cache the I/O.
1859 The client asks the server to make the I/O
1860 synchronous to preserve the synchronous semantics of
1862 Some servers will perform poorly under these circumstances, especially
1863 if the I/O size is small.
1864 Some servers may also be configured to
1865 lie to clients about the I/O having reached stable storage; this
1866 will avoid the performance penalty at some risk to data integrity
1867 in the event of server power failure.
1868 The Linux NFS client places no alignment restrictions on
1874 is a potentially powerful tool that should be used with caution.
1875 It is recommended that applications treat use of
1877 as a performance option which is disabled by default.
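.P
One way to follow this recommendation is sketched below:
direct I/O is requested only when the caller asks for it,
and the open falls back to buffered I/O if the filesystem rejects the flag.
The function names
.I open_maybe_direct
and
.I read_aligned
are hypothetical, and the alignment passed to
.I read_aligned
is the caller's assumption about what the filesystem requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Opens "path" for reading.  O_DIRECT is requested only if want_direct
   is nonzero; if the filesystem rejects it with EINVAL, the open is
   retried without the flag.  *is_direct reports what was obtained. */
int
open_maybe_direct(const char *path, int want_direct, int *is_direct)
{
    *is_direct = 0;
    if (want_direct) {
        int fd = open(path, O_RDONLY | O_DIRECT);
        if (fd != -1) {
            *is_direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;
    }
    return open(path, O_RDONLY);
}

/* Reads up to "len" bytes at offset 0 into a newly allocated buffer
   aligned to "align" bytes.  On success, the caller releases *bufp
   with free().  Returns the byte count, or -1 on error. */
ssize_t
read_aligned(int fd, size_t align, size_t len, void **bufp)
{
    *bufp = NULL;
    if (posix_memalign(bufp, align, len) != 0) {
        *bufp = NULL;
        return -1;
    }

    ssize_t n = pread(fd, *bufp, len, 0);
    if (n == -1) {
        free(*bufp);
        *bufp = NULL;
    }
    return n;
}
```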
1879 Currently, it is not possible to enable signal-driven
1886 to enable this flag.
1887 .\" FIXME . Check bugzilla report on open(O_ASYNC)
1888 .\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993
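.P
The
.BR fcntl (2)
approach might look like the following sketch.
The helper name
.I enable_async
is hypothetical;
the
.B F_SETOWN
step, which selects the process that receives the signal,
is included on the assumption that the caller wants the signal itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_ASYNC */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Enables signal-driven I/O on an already open descriptor.
   Returns 0 on success, -1 on error. */
int
enable_async(int fd)
{
    /* Direct the signal (SIGIO by default) to this process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* Set O_ASYNC after open(); passing it to open() has no effect. */
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```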
1890 One must check for two different error codes,
1894 when trying to determine whether the kernel supports
1904 and the file specified by
1908 will create a regular file (i.e.,
1922 .BR open_by_handle_at (2),
1934 .BR path_resolution (7),