28f540f4 1@node Low-Level I/O, File System Interface, I/O on Streams, Top
7a68c94a 2@c %MENU% Low-level, less portable I/O
3@chapter Low-Level Input/Output
4
5This chapter describes functions for performing low-level input/output
6operations on file descriptors. These functions include the primitives
7for the higher-level I/O functions described in @ref{I/O on Streams}, as
8well as functions for performing low-level control operations for which
9there are no equivalents on streams.
10
11Stream-level I/O is more flexible and usually more convenient;
12therefore, programmers generally use the descriptor-level functions only
13when necessary. These are some of the usual reasons:
14
15@itemize @bullet
16@item
17For reading binary files in large chunks.
18
19@item
20For reading an entire file into core before parsing it.
21
22@item
23To perform operations other than data transfer, which can only be done
24with a descriptor. (You can use @code{fileno} to get the descriptor
25corresponding to a stream.)
26
27@item
28To pass descriptors to a child process. (The child can create its own
29stream to use a descriptor that it inherits, but cannot inherit a stream
30directly.)
31@end itemize
32
33@menu
34* Opening and Closing Files:: How to open and close file
2c6fe0bd 35 descriptors.
dfd2257a 36* Truncating Files:: Change the size of a file.
37* I/O Primitives:: Reading and writing data.
38* File Position Primitive:: Setting a descriptor's file
2c6fe0bd 39 position.
40* Descriptors and Streams:: Converting descriptor to stream
41 or vice-versa.
42* Stream/Descriptor Precautions:: Precautions needed if you use both
43 descriptors and streams.
* Scatter-Gather:: Fast I/O to discontinuous buffers.
45* Memory-mapped I/O:: Using files like memory.
46* Waiting for I/O:: How to check for input or output
47 on multiple file descriptors.
dfd2257a 48* Synchronizing I/O:: Making sure all I/O actions completed.
b07d03e0 49* Asynchronous I/O:: Perform I/O in parallel.
50* Control Operations:: Various other operations on file
51 descriptors.
52* Duplicating Descriptors:: Fcntl commands for duplicating
53 file descriptors.
54* Descriptor Flags:: Fcntl commands for manipulating
55 flags associated with file
2c6fe0bd 56 descriptors.
57* File Status Flags:: Fcntl commands for manipulating
58 flags associated with open files.
59* File Locks:: Fcntl commands for implementing
60 file locking.
61* Interrupt Input:: Getting an asynchronous signal when
62 input arrives.
07435eb4 63* IOCTLs:: Generic I/O Control operations.
64@end menu
65
66
67@node Opening and Closing Files
68@section Opening and Closing Files
69
70@cindex opening a file descriptor
71@cindex closing a file descriptor
72This section describes the primitives for opening and closing files
73using file descriptors. The @code{open} and @code{creat} functions are
74declared in the header file @file{fcntl.h}, while @code{close} is
75declared in @file{unistd.h}.
76@pindex unistd.h
77@pindex fcntl.h
78
79@comment fcntl.h
80@comment POSIX.1
81@deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}])
82The @code{open} function creates and returns a new file descriptor
83for the file named by @var{filename}. Initially, the file position
84indicator for the file is at the beginning of the file. The argument
85@var{mode} is used only when a file is created, but it doesn't hurt
86to supply the argument in any case.
87
88The @var{flags} argument controls how the file is to be opened. This is
89a bit mask; you create the value by the bitwise OR of the appropriate
90parameters (using the @samp{|} operator in C).
91@xref{File Status Flags}, for the parameters available.
92
93The normal return value from @code{open} is a non-negative integer file
07435eb4 94descriptor. In the case of an error, a value of @math{-1} is returned
95instead. In addition to the usual file name errors (@pxref{File
96Name Errors}), the following @code{errno} error conditions are defined
97for this function:
98
99@table @code
100@item EACCES
The file exists but is not readable/writable as requested by the @var{flags}
argument, or the file does not exist and the directory is unwritable so
it cannot be created.
104
105@item EEXIST
106Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already
107exists.
108
109@item EINTR
110The @code{open} operation was interrupted by a signal.
111@xref{Interrupted Primitives}.
112
113@item EISDIR
114The @var{flags} argument specified write access, and the file is a directory.
115
116@item EMFILE
117The process has too many files open.
118The maximum number of file descriptors is controlled by the
119@code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}.
120
121@item ENFILE
122The entire system, or perhaps the file system which contains the
123directory, cannot support any additional open files at the moment.
124(This problem cannot happen on the GNU system.)
125
126@item ENOENT
127The named file does not exist, and @code{O_CREAT} is not specified.
128
129@item ENOSPC
130The directory or file system that would contain the new file cannot be
131extended, because there is no disk space left.
132
133@item ENXIO
134@code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags}
135argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and
136FIFOs}), and no process has the file open for reading.
137
138@item EROFS
139The file resides on a read-only file system and any of @w{@code{O_WRONLY}},
140@code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument,
141or @code{O_CREAT} is set and the file does not already exist.
142@end table
143
144@c !!! umask
145
If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, the function @code{open} returns a file descriptor opened
in large file mode, which enables the file handling functions to use
files up to @math{2^63} bytes in size and offsets from @math{-2^63} to
@math{2^63}. This happens transparently for the user since all of the
low-level file handling functions are replaced in the same way.
152
This function is a cancellation point in multi-threaded programs. This
is a problem if the thread holds resources (such as memory, file
descriptors, or semaphores) at the time @code{open} is called. If the
thread is canceled, these resources stay allocated until the program
ends. To avoid this, calls to @code{open} should be protected using
cancellation handlers.
159@c ref pthread_cleanup_push / pthread_cleanup_pop
160
The @code{open} function is the underlying primitive for the @code{fopen}
and @code{freopen} functions, which create streams.
163@end deftypefun
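
As an illustration of combining @var{flags} values and checking the
result, here is a minimal sketch of a helper that creates a file only if
it does not exist yet. The name @code{create_exclusive} is hypothetical,
and the permission bits @code{S_IRUSR} and @code{S_IWUSR} come from
@file{sys/stat.h}, which this section does not describe.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create FILENAME for writing, failing if it
   already exists.  Returns a descriptor, or -1 with errno set.  */
int
create_exclusive (const char *filename)
{
  int fd;

  /* O_CREAT | O_EXCL makes open fail with EEXIST for an existing file.
     Retry only when a signal interrupted the call.  */
  do
    fd = open (filename, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
  while (fd == -1 && errno == EINTR);

  return fd;
}
```

A caller would typically test the result against @math{-1} and inspect
@code{errno}; @code{EEXIST} then means that another program created the
file first.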
164
b07d03e0 165@comment fcntl.h
a3a4a74e 166@comment Unix98
167@deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}])
This function is similar to @code{open}. It returns a file descriptor
which can be used to access the file named by @var{filename}. The only
difference is that on 32-bit systems the file is opened in large file
mode, i.e., file lengths and file offsets can exceed 31 bits.
172
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this
function is actually available under the name @code{open}, i.e., the
new, extended API using 64-bit file sizes and offsets transparently
replaces the old API.
177@end deftypefun
178
179@comment fcntl.h
180@comment POSIX.1
181@deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode})
182This function is obsolete. The call:
183
184@smallexample
185creat (@var{filename}, @var{mode})
186@end smallexample
187
188@noindent
189is equivalent to:
190
191@smallexample
192open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode})
193@end smallexample

If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, the function @code{creat} returns a file descriptor
opened in large file mode, which enables the file handling functions to
use files up to @math{2^63} bytes in size and offsets from @math{-2^63}
to @math{2^63}. This happens transparently for the user since all of the
low-level file handling functions are replaced in the same way.
201@end deftypefn
202
203@comment fcntl.h
a3a4a74e 204@comment Unix98
205@deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode})
This function is similar to @code{creat}. It returns a file descriptor
which can be used to access the file named by @var{filename}. The only
difference is that on 32-bit systems the file is opened in large file
mode, i.e., file lengths and file offsets can exceed 31 bits.

To use offsets that do not fit in 31 bits with this descriptor, use the
@code{*64} counterparts of the functions that take an offset, e.g.,
@code{pread64} or @code{pwrite64}. (There are no @code{read64} or
@code{write64} functions; plain @code{read} and @code{write} work in all
cases.)

When the sources are compiled with @code{_FILE_OFFSET_BITS == 64}, this
function is actually available under the name @code{creat}, i.e., the
new, extended API using 64-bit file sizes and offsets transparently
replaces the old API.
218@end deftypefn
219
220@comment unistd.h
221@comment POSIX.1
222@deftypefun int close (int @var{filedes})
223The function @code{close} closes the file descriptor @var{filedes}.
224Closing a file has the following consequences:
225
226@itemize @bullet
2c6fe0bd 227@item
228The file descriptor is deallocated.
229
230@item
231Any record locks owned by the process on the file are unlocked.
232
233@item
234When all file descriptors associated with a pipe or FIFO have been closed,
235any unread data is discarded.
236@end itemize
237
This function is a cancellation point in multi-threaded programs. This
is a problem if the thread holds resources (such as memory, file
descriptors, or semaphores) at the time @code{close} is called. If the
thread is canceled, these resources stay allocated until the program
ends. To avoid this, calls to @code{close} should be protected using
cancellation handlers.
244@c ref pthread_cleanup_push / pthread_cleanup_pop
245
07435eb4 246The normal return value from @code{close} is @math{0}; a value of @math{-1}
247is returned in case of failure. The following @code{errno} error
248conditions are defined for this function:
249
250@table @code
251@item EBADF
252The @var{filedes} argument is not a valid file descriptor.
253
254@item EINTR
255The @code{close} call was interrupted by a signal.
256@xref{Interrupted Primitives}.
257Here is an example of how to handle @code{EINTR} properly:
258
259@smallexample
260TEMP_FAILURE_RETRY (close (desc));
261@end smallexample
262
263@item ENOSPC
264@itemx EIO
265@itemx EDQUOT
2c6fe0bd 266When the file is accessed by NFS, these errors from @code{write} can sometimes
267not be detected until @code{close}. @xref{I/O Primitives}, for details
268on their meaning.
269@end table
270
271Please note that there is @emph{no} separate @code{close64} function.
272This is not necessary since this function does not determine nor depend
fed8f7f7 273on the mode of the file. The kernel which performs the @code{close}
274operation knows for which mode the descriptor is used and can handle
275this situation.
276@end deftypefun
277
278To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead
279of trying to close its underlying file descriptor with @code{close}.
280This flushes any buffered output and updates the stream object to
281indicate that it is closed.
282
283
284@node Truncating Files
285@section Change the size of a file
286
In some situations it is useful to set the size of a file explicitly.
Since 4.2BSD there has been a function to truncate a file to at most a
given number of bytes, and POSIX defines one additional function. The
prototypes for these functions are in @file{unistd.h}.
291
292@comment unistd.h
293@comment X/Open
b07d03e0 294@deftypefun int truncate (const char *@var{name}, off_t @var{length})
The @code{truncate} function truncates the file named by @var{name} to
at most @var{length} bytes, i.e., if the file was larger before, the
extra bytes are stripped off. If the file was smaller than or equal to
@var{length} in size before, nothing is done. The file must be writable
by the user to perform this operation.
300
301When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
302@code{truncate} function is in fact @code{truncate64} and the type
303@code{off_t} has 64 bits which makes it possible to handle files up to
c756c71c 304@math{2^63} bytes in length.
b07d03e0 305
The return value is zero if everything went OK. Otherwise the return
value is @math{-1} and the global variable @code{errno} is set to:
308@table @code
309@item EACCES
310The file is not accessible to the user.
311@item EINVAL
312The @var{length} value is illegal.
313@item EISDIR
314The object named by @var{name} is a directory.
315@item ENOENT
316The file named by @var{name} does not exist.
317@item ENOTDIR
318One part of the @var{name} is not a directory.
319@end table
320
This function was introduced in 4.2BSD and was also available in later
@w{System V} systems. It was not added to POSIX since the authors felt
it is of only marginal additional utility; see @code{ftruncate} below.
324@end deftypefun
325
b07d03e0 326@comment unistd.h
a3a4a74e 327@comment Unix98
328@deftypefun int truncate64 (const char *@var{name}, off64_t @var{length})
This function is similar to the @code{truncate} function. The
difference is that the @var{length} argument is 64 bits wide even on
32-bit machines, which allows handling files with a size of up to
@math{2^63} bytes.
333
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, this function is actually available under the name
@code{truncate} and so transparently replaces the 32-bit interface.
337@end deftypefun
338
339@comment unistd.h
340@comment POSIX
b07d03e0 341@deftypefun int ftruncate (int @var{fd}, off_t @var{length})
342The @code{ftruncate} function is similar to the @code{truncate}
343function. The main difference is that it takes a descriptor for an
344opened file instead of a file name to identify the object. The file
345must be opened for writing to successfully carry out the operation.
346
The POSIX standard leaves it implementation-defined what happens if the
specified new @var{length} of the file is bigger than the original size.
The @code{ftruncate} function might simply leave the file alone and do
nothing, or it can increase the size to the desired size. In the latter
case the extended area should be zero-filled. So using @code{ftruncate}
is not a reliable way to increase the file size, but where it works it is
probably the fastest way. The function also operates on POSIX shared
memory segments if these are implemented by the system.
355
356When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
357@code{ftruncate} function is in fact @code{ftruncate64} and the type
358@code{off_t} has 64 bits which makes it possible to handle files up to
c756c71c 359@math{2^63} bytes in length.
b07d03e0 360
On success the function returns zero. Otherwise it returns @math{-1}
and sets @code{errno} to one of these values:
363@table @code
364@item EBADF
@var{fd} is not a valid file descriptor or is not opened for writing.
366@item EINVAL
367The object referred to by @var{fd} does not permit this operation.
368@item EROFS
369The file is on a read-only file system.
370@end table
371@end deftypefun
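
Because the effect of @code{ftruncate} on a shorter file is left to the
implementation, a cautious program can verify the resulting size. The
following is a minimal sketch under that assumption; the helper name
@code{resize_file} is hypothetical, and @code{fstat} with its
@code{st_size} member is declared in @file{sys/stat.h} rather than
described in this section.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make the file NAME exactly LENGTH bytes long.
   Returns 0 if the file ends up with that size, and -1 otherwise,
   including the case where the system left the size alone.  */
int
resize_file (const char *name, off_t length)
{
  struct stat st;
  int result = -1;
  int fd = open (name, O_WRONLY);   /* ftruncate needs write access.  */

  if (fd == -1)
    return -1;

  /* Change the size, then ask the system what size it really is.  */
  if (ftruncate (fd, length) == 0
      && fstat (fd, &st) == 0
      && st.st_size == length)
    result = 0;

  if (close (fd) == -1)
    result = -1;
  return result;
}
```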
372
b07d03e0 373@comment unistd.h
a3a4a74e 374@comment Unix98
375@deftypefun int ftruncate64 (int @var{id}, off64_t @var{length})
This function is similar to the @code{ftruncate} function. The
difference is that the @var{length} argument is 64 bits wide even on
32-bit machines, which allows handling files with a size of up to
@math{2^63} bytes.
380
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, this function is actually available under the name
@code{ftruncate} and so transparently replaces the 32-bit interface.
384@end deftypefun
385
386@node I/O Primitives
387@section Input and Output Primitives
388
389This section describes the functions for performing primitive input and
390output operations on file descriptors: @code{read}, @code{write}, and
391@code{lseek}. These functions are declared in the header file
392@file{unistd.h}.
393@pindex unistd.h
394
395@comment unistd.h
396@comment POSIX.1
397@deftp {Data Type} ssize_t
398This data type is used to represent the sizes of blocks that can be
399read or written in a single operation. It is similar to @code{size_t},
400but must be a signed type.
401@end deftp
402
403@cindex reading from a file descriptor
404@comment unistd.h
405@comment POSIX.1
406@deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size})
407The @code{read} function reads up to @var{size} bytes from the file
408with descriptor @var{filedes}, storing the results in the @var{buffer}.
409(This is not necessarily a character string and there is no terminating
410null character added.)
411
412@cindex end-of-file, on a file descriptor
413The return value is the number of bytes actually read. This might be
414less than @var{size}; for example, if there aren't that many bytes left
415in the file or if there aren't that many bytes immediately available.
416The exact behavior depends on what kind of file it is. Note that
417reading less than @var{size} bytes is not an error.
418
419A value of zero indicates end-of-file (except if the value of the
420@var{size} argument is also zero). This is not considered an error.
421If you keep calling @code{read} while at end-of-file, it will keep
422returning zero and doing nothing else.
423
424If @code{read} returns at least one character, there is no way you can
425tell whether end-of-file was reached. But if you did reach the end, the
426next read will return zero.
427
07435eb4 428In case of an error, @code{read} returns @math{-1}. The following
429@code{errno} error conditions are defined for this function:
430
431@table @code
432@item EAGAIN
433Normally, when no input is immediately available, @code{read} waits for
434some input. But if the @code{O_NONBLOCK} flag is set for the file
435(@pxref{File Status Flags}), @code{read} returns immediately without
436reading any data, and reports this error.
437
438@strong{Compatibility Note:} Most versions of BSD Unix use a different
439error code for this: @code{EWOULDBLOCK}. In the GNU library,
440@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter
441which name you use.
442
443On some systems, reading a large amount of data from a character special
444file can also fail with @code{EAGAIN} if the kernel cannot find enough
445physical memory to lock down the user's pages. This is limited to
446devices that transfer with direct memory access into the user's memory,
447which means it does not include terminals, since they always use
448separate buffers inside the kernel. This problem never happens in the
449GNU system.
450
451Any condition that could result in @code{EAGAIN} can instead result in a
452successful @code{read} which returns fewer bytes than requested.
453Calling @code{read} again immediately would result in @code{EAGAIN}.
454
455@item EBADF
456The @var{filedes} argument is not a valid file descriptor,
457or is not open for reading.
458
459@item EINTR
460@code{read} was interrupted by a signal while it was waiting for input.
@xref{Interrupted Primitives}. A signal will not necessarily cause
@code{read} to fail with @code{EINTR}; it may instead result in a
successful @code{read} which returns fewer bytes than requested.
464
465@item EIO
466For many devices, and for disk files, this error code indicates
467a hardware error.
468
469@code{EIO} also occurs when a background process tries to read from the
470controlling terminal, and the normal action of stopping the process by
sending it a @code{SIGTTIN} signal isn't working. This might happen if
the signal is being blocked or ignored, or because the process group is
473orphaned. @xref{Job Control}, for more information about job control,
474and @ref{Signal Handling}, for information about signals.
475@end table
476
477Please note that there is no function named @code{read64}. This is not
478necessary since this function does not directly modify or handle the
479possibly wide file offset. Since the kernel handles this state
480internally the @code{read} function can be used for all cases.
481
This function is a cancellation point in multi-threaded programs. This
is a problem if the thread holds resources (such as memory, file
descriptors, or semaphores) at the time @code{read} is called. If the
thread is canceled, these resources stay allocated until the program
ends. To avoid this, calls to @code{read} should be protected using
cancellation handlers.
488@c ref pthread_cleanup_push / pthread_cleanup_pop
489
490The @code{read} function is the underlying primitive for all of the
491functions that read from streams, such as @code{fgetc}.
492@end deftypefun
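
Since @code{read} may return fewer bytes than requested and signals end
of file only by returning zero, code that needs a full buffer usually
loops. The following is a minimal sketch of such a loop, assuming a
blocking descriptor (with @code{O_NONBLOCK} set, @code{EAGAIN} would be
reported to the caller as an error). The name @code{read_full} is
hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read SIZE bytes from FILEDES into BUFFER unless
   end-of-file comes first.  Returns the number of bytes stored (less
   than SIZE only at end-of-file), or -1 on error.  */
ssize_t
read_full (int filedes, void *buffer, size_t size)
{
  char *p = buffer;
  size_t total = 0;

  while (total < size)
    {
      ssize_t n = read (filedes, p + total, size - total);

      if (n == 0)
        break;                  /* End-of-file.  */
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      total += n;               /* A short read is not an error.  */
    }
  return total;
}
```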
493
494@comment unistd.h
495@comment Unix98
496@deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset})
497The @code{pread} function is similar to the @code{read} function. The
498first three arguments are identical and also the return values and error
499codes correspond.
500
The difference is the fourth argument and its handling. The data block
is not read from the current position of the file descriptor
@var{filedes}. Instead the data is read from the file starting at
position @var{offset}. The position of the file descriptor itself is
not affected by the operation; it is the same as before the call.
506
507When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
508@code{pread} function is in fact @code{pread64} and the type
509@code{off_t} has 64 bits which makes it possible to handle files up to
c756c71c 510@math{2^63} bytes in length.
b07d03e0 511
The return value of @code{pread} is the number of bytes read.
In the error case it returns @math{-1} like @code{read} does and the
error codes are also the same, with these additions:
515@table @code
516@item EINVAL
517The value given for @var{offset} is negative and therefore illegal.
518
519@item ESPIPE
The file descriptor @var{filedes} is associated with a pipe or a FIFO and
this device does not allow positioning of the file pointer.
522@end table
523
The function is an extension defined in the Single Unix Specification,
version 2.
526@end deftypefun
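
One common use of @code{pread} is access to fixed-size records, where
each call names its own offset and the file position of the descriptor
stays where it was. The sketch below illustrates this; the names
@code{read_record} and @code{read_record_from_file} are hypothetical,
and the definition of @code{_XOPEN_SOURCE} is only an assumption about
how to request the declaration on systems that hide it by default.

```c
#define _XOPEN_SOURCE 700       /* Request the pread declaration.  */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read record number INDEX, RECORD_SIZE bytes
   long, from FILEDES without moving its file position.  Returns what
   pread returns: the byte count, 0 at end-of-file, or -1 on error.  */
ssize_t
read_record (int filedes, void *record, size_t record_size, off_t index)
{
  return pread (filedes, record, record_size, index * (off_t) record_size);
}

/* Hypothetical usage: fetch record INDEX from the file NAME.  */
ssize_t
read_record_from_file (const char *name, void *record,
                       size_t record_size, off_t index)
{
  ssize_t n;
  int fd = open (name, O_RDONLY);

  if (fd == -1)
    return -1;
  n = read_record (fd, record, record_size, index);
  close (fd);
  return n;
}
```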
527
b07d03e0 528@comment unistd.h
a3a4a74e 529@comment Unix98
530@deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset})
This function is similar to the @code{pread} function. The difference
is that the @var{offset} parameter is of type @code{off64_t} instead of
@code{off_t}, which makes it possible on 32-bit machines to address
files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
file descriptor @var{filedes} must be opened using @code{open64} since
otherwise the large offsets possible with @code{off64_t} will lead to
errors with a descriptor in small file mode.
538
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
32-bit machine, this function is actually available under the name
@code{pread} and so transparently replaces the 32-bit interface.
542@end deftypefun
543
544@cindex writing to a file descriptor
545@comment unistd.h
546@comment POSIX.1
547@deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size})
548The @code{write} function writes up to @var{size} bytes from
549@var{buffer} to the file with descriptor @var{filedes}. The data in
550@var{buffer} is not necessarily a character string and a null character is
551output like any other character.
552
553The return value is the number of bytes actually written. This may be
554@var{size}, but can always be smaller. Your program should always call
555@code{write} in a loop, iterating until all the data is written.
556
557Once @code{write} returns, the data is enqueued to be written and can be
558read back right away, but it is not necessarily written out to permanent
559storage immediately. You can use @code{fsync} when you need to be sure
560your data has been permanently stored before continuing. (It is more
561efficient for the system to batch up consecutive writes and do them all
562at once when convenient. Normally they will always be written to disk
563within a minute or less.) Modern systems provide another function
564@code{fdatasync} which guarantees integrity only for the file data and
565is therefore faster.
566@c !!! xref fsync, fdatasync
2c6fe0bd 567You can use the @code{O_FSYNC} open mode to make @code{write} always
568store the data to disk before returning; @pxref{Operating Modes}.
569
07435eb4 570In the case of an error, @code{write} returns @math{-1}. The following
571@code{errno} error conditions are defined for this function:
572
573@table @code
574@item EAGAIN
575Normally, @code{write} blocks until the write operation is complete.
576But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control
577Operations}), it returns immediately without writing any data, and
578reports this error. An example of a situation that might cause the
579process to block on output is writing to a terminal device that supports
580flow control, where output has been suspended by receipt of a STOP
581character.
582
583@strong{Compatibility Note:} Most versions of BSD Unix use a different
584error code for this: @code{EWOULDBLOCK}. In the GNU library,
585@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter
586which name you use.
587
On some systems, writing a large amount of data to a character special
file can also fail with @code{EAGAIN} if the kernel cannot find enough
590physical memory to lock down the user's pages. This is limited to
591devices that transfer with direct memory access into the user's memory,
592which means it does not include terminals, since they always use
593separate buffers inside the kernel. This problem does not arise in the
594GNU system.
595
596@item EBADF
597The @var{filedes} argument is not a valid file descriptor,
598or is not open for writing.
599
600@item EFBIG
601The size of the file would become larger than the implementation can support.
602
603@item EINTR
604The @code{write} operation was interrupted by a signal while it was
blocked waiting for completion. A signal will not necessarily cause
606@code{write} to return @code{EINTR}; it may instead result in a
607successful @code{write} which writes fewer bytes than requested.
608@xref{Interrupted Primitives}.
609
610@item EIO
611For many devices, and for disk files, this error code indicates
612a hardware error.
613
614@item ENOSPC
615The device containing the file is full.
616
617@item EPIPE
618This error is returned when you try to write to a pipe or FIFO that
619isn't open for reading by any process. When this happens, a @code{SIGPIPE}
620signal is also sent to the process; see @ref{Signal Handling}.
621@end table
622
623Unless you have arranged to prevent @code{EINTR} failures, you should
624check @code{errno} after each failing call to @code{write}, and if the
625error was @code{EINTR}, you should simply repeat the call.
626@xref{Interrupted Primitives}. The easy way to do this is with the
627macro @code{TEMP_FAILURE_RETRY}, as follows:
628
629@smallexample
630nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count));
631@end smallexample
632
b07d03e0
UD
633Please note that there is no function named @code{write64}. This is not
634necessary since this function does not directly modify or handle the
635possibly wide file offset. Since the kernel handles this state
636internally the @code{write} function can be used for all cases.
637
This function is a cancellation point in multi-threaded programs. This
is a problem if the thread holds resources (such as memory, file
descriptors, or semaphores) at the time @code{write} is called. If the
thread is canceled, these resources stay allocated until the program
ends. To avoid this, calls to @code{write} should be protected using
cancellation handlers.
644@c ref pthread_cleanup_push / pthread_cleanup_pop
645
646The @code{write} function is the underlying primitive for all of the
647functions that write to streams, such as @code{fputc}.
648@end deftypefun
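
The advice to call @code{write} in a loop and to repeat the call after
@code{EINTR} can be sketched as follows. This is only one possible
arrangement, assuming a blocking descriptor; the name @code{write_all}
is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all SIZE bytes of BUFFER to FILEDES.
   Returns 0 on success, or -1 with errno set by write.  */
int
write_all (int filedes, const void *buffer, size_t size)
{
  const char *p = buffer;

  while (size > 0)
    {
      ssize_t n = write (filedes, p, size);

      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted; simply repeat the call.  */
          return -1;
        }
      /* write may accept only part of the data; advance past it.  */
      p += n;
      size -= n;
    }
  return 0;
}
```

Note that when this function fails part of the data may already have
been written, so a caller that needs to resume must track the progress
itself.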
649
650@comment unistd.h
651@comment Unix98
652@deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset})
653The @code{pwrite} function is similar to the @code{write} function. The
654first three arguments are identical and also the return values and error
655codes correspond.
656
The difference is the fourth argument and its handling. The data block
is not written to the current position of the file descriptor
@var{filedes}. Instead the data is written to the file starting at
position @var{offset}. The position of the file descriptor itself is
not affected by the operation; it is the same as before the call.
662
663When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
664@code{pwrite} function is in fact @code{pwrite64} and the type
665@code{off_t} has 64 bits which makes it possible to handle files up to
c756c71c 666@math{2^63} bytes in length.
b07d03e0 667
The return value of @code{pwrite} is the number of bytes written.
In the error case it returns @math{-1} like @code{write} does and the
error codes are also the same, with these additions:
671@table @code
672@item EINVAL
673The value given for @var{offset} is negative and therefore illegal.
674
675@item ESPIPE
The file descriptor @var{filedes} is associated with a pipe or a FIFO and
this device does not allow positioning of the file pointer.
678@end table
679
The function is an extension defined in the Single Unix Specification,
version 2.
682@end deftypefun
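
As a rough illustration, @code{pwrite} can overwrite a few bytes in the
middle of a file without a separate positioning call. The helper name
@code{patch_file} is hypothetical, and the definition of
@code{_XOPEN_SOURCE} is only an assumption about how to request the
declaration on systems that hide it by default.

```c
#define _XOPEN_SOURCE 700       /* Request the pwrite declaration.  */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite SIZE bytes of the file NAME, starting
   at OFFSET, with the contents of DATA.  Returns 0 on success and -1
   on failure.  */
int
patch_file (const char *name, const void *data, size_t size, off_t offset)
{
  const char *p = data;
  int result = 0;
  int fd = open (name, O_WRONLY);

  if (fd == -1)
    return -1;

  while (size > 0)
    {
      ssize_t n = pwrite (fd, p, size, offset);

      if (n <= 0)
        {
          /* An error, or no progress at all; give up.  */
          result = -1;
          break;
        }
      /* Like write, pwrite may transfer less than requested.  */
      p += n;
      size -= n;
      offset += n;
    }

  if (close (fd) == -1)
    result = -1;
  return result;
}
```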
683
b07d03e0 684@comment unistd.h
a3a4a74e 685@comment Unix98
686@deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset})
This function is similar to the @code{pwrite} function. The difference
is that the @var{offset} parameter is of type @code{off64_t} instead of
@code{off_t}, which makes it possible on 32-bit machines to address
files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
file descriptor @var{filedes} must be opened using @code{open64} since
otherwise the large offsets possible with @code{off64_t} will lead to
errors with a descriptor in small file mode.
694
c756c71c 695When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a
b07d03e0
UD
69632-bit machine this function is actually available under the name
697@code{pwrite} and so transparently replaces the 32-bit interface.
698@end deftypefun
699
a5a0310d 700
28f540f4
RM
701@node File Position Primitive
702@section Setting the File Position of a Descriptor
703
704Just as you can set the file position of a stream with @code{fseek}, you
705can set the file position of a descriptor with @code{lseek}. This
706specifies the position in the file for the next @code{read} or
707@code{write} operation. @xref{File Positioning}, for more information
708on the file position and what it means.
709
710To read the current file position value from a descriptor, use
711@code{lseek (@var{desc}, 0, SEEK_CUR)}.
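
As a minimal sketch of this idiom, the helper below saves the current
position with @code{SEEK_CUR}, finds the file size with
@code{SEEK_END}, and then restores the position.  The names
@code{size_of_open_file} and @code{position_demo} are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size of the file open on DESC, restoring the file
   position afterwards.  Returns -1 on error.  */
off_t
size_of_open_file (int desc)
{
  off_t saved, end;

  assert (desc >= 0);
  saved = lseek (desc, 0, SEEK_CUR);    /* Where are we now?  */
  if (saved == (off_t) -1)
    return -1;

  end = lseek (desc, 0, SEEK_END);      /* Offset of the end of file.  */
  if (end == (off_t) -1 || lseek (desc, saved, SEEK_SET) == (off_t) -1)
    return -1;
  return end;
}

/* Exercise the helper on a scratch file PATH.  Returns 0 on success.  */
int
position_demo (const char *path)
{
  int ok;
  int fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  ok = write (fd, "0123456789", 10) == 10
       && lseek (fd, 3, SEEK_SET) == 3
       && size_of_open_file (fd) == 10
       /* The helper put the position back where it found it.  */
       && lseek (fd, 0, SEEK_CUR) == 3;
  close (fd);
  return ok ? 0 : -1;
}
```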
712
713@cindex file positioning on a file descriptor
714@cindex positioning a file descriptor
715@cindex seeking on a file descriptor
716@comment unistd.h
717@comment POSIX.1
718@deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence})
719The @code{lseek} function is used to change the file position of the
720file with descriptor @var{filedes}.
721
722The @var{whence} argument specifies how the @var{offset} should be
723interpreted in the same way as for the @code{fseek} function, and must be
724one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or
725@code{SEEK_END}.
726
727@table @code
728@item SEEK_SET
729Specifies that @var{offset} is a count of characters from the beginning
730of the file.
731
732@item SEEK_CUR
733Specifies that @var{offset} is a count of characters from the current
734file position. This count may be positive or negative.
735
736@item SEEK_END
737Specifies that @var{offset} is a count of characters from the end of
738the file. A negative count specifies a position within the current
739extent of the file; a positive count specifies a position past the
2c6fe0bd 740current end. If you set the position past the current end, and
28f540f4
RM
741actually write data, you will extend the file with zeros up to that
742position.
@end table
743
744The return value from @code{lseek} is normally the resulting file
745position, measured in bytes from the beginning of the file.
746You can use this feature together with @code{SEEK_CUR} to read the
747current file position.
748
749If you want to append to the file, setting the file position to the
750current end of file with @code{SEEK_END} is not sufficient. Another
751process may write more data after you seek but before you write,
752extending the file so the position you write onto clobbers their data.
753Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}.
754
755You can set the file position past the current end of the file. This
756does not by itself make the file longer; @code{lseek} never changes the
757file. But subsequent output at that position will extend the file.
758Characters between the previous end of file and the new position are
759filled with zeros. Extending the file in this way can create a
760``hole'': the blocks of zeros are not actually allocated on disk, so the
761file takes up less space than it appears to; it is then called a
762``sparse file''.
763@cindex sparse files
764@cindex holes in files
765
766If the file position cannot be changed, or the operation is in some way
07435eb4 767invalid, @code{lseek} returns a value of @math{-1}. The following
28f540f4
RM
768@code{errno} error conditions are defined for this function:
769
770@table @code
771@item EBADF
772The @var{filedes} is not a valid file descriptor.
773
774@item EINVAL
775The @var{whence} argument value is not valid, or the resulting
776file offset is not valid (for example, it would be negative).
777
778@item ESPIPE
779The @var{filedes} corresponds to an object that cannot be positioned,
780such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error
781only for pipes and FIFOs, but in the GNU system, you always get
782@code{ESPIPE} if the object is not seekable.)
783@end table
784
b07d03e0
UD
785When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
786@code{lseek} function is in fact @code{lseek64} and the type
787@code{off_t} has 64 bits which makes it possible to handle files up to
c756c71c 788@math{2^63} bytes in length.
b07d03e0 789
dfd2257a
UD
790This function is a cancellation point in multi-threaded programs. This
791is a problem if the thread allocates some resources (like memory, file
792descriptors, semaphores or whatever) at the time @code{lseek} is
793called. If the thread gets canceled, these resources stay allocated
794until the program ends. To avoid this, calls to @code{lseek} should be
795protected using cancellation handlers.
796@c ref pthread_cleanup_push / pthread_cleanup_pop
797
28f540f4 798The @code{lseek} function is the underlying primitive for the
dfd2257a
UD
799@code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and
800@code{rewind} functions, which operate on streams instead of file
801descriptors.
28f540f4
RM
802@end deftypefun
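
The effect of seeking past the end of a file, as described for
@code{lseek} above, can be sketched like this.  The function name
@code{sparse_demo} and its @var{gap} parameter are assumptions made for
the example.  Whether the gap is really stored as a hole depends on the
file system, so the sketch checks only the apparent size and the
contents.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH, seek GAP bytes past its (empty) start and write one
   byte there.  Returns 0 if the file grew as described, else -1.  */
int
sparse_demo (const char *path, off_t gap)
{
  struct stat st;
  char byte = 'x';
  char first = 1;
  int result = -1;
  int fd;

  assert (gap > 0);
  fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  /* Seeking past the end does not by itself change the file ...  */
  if (lseek (fd, gap, SEEK_SET) == gap
      && fstat (fd, &st) == 0 && st.st_size == 0
      /* ... but output at that position extends it.  */
      && write (fd, &byte, 1) == 1
      && fstat (fd, &st) == 0 && st.st_size == gap + 1
      /* The skipped region reads back as zeros.  */
      && pread (fd, &first, 1, 0) == 1 && first == 0)
    result = 0;

  close (fd);
  return result;
}
```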
803
b07d03e0 804@comment unistd.h
a3a4a74e 805@comment Unix98
b07d03e0
UD
806@deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence})
807This function is similar to the @code{lseek} function. The difference
808is that the @var{offset} parameter is of type @code{off64_t} instead of
809@code{off_t} which makes it possible on 32-bit machines to address
c756c71c 810files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
b07d03e0
UD
811file descriptor @code{filedes} must be opened using @code{open64} since
812otherwise the large offsets possible with @code{off64_t} will lead to
813errors with a descriptor in small file mode.
814
c756c71c 815When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
b07d03e0
UD
81632-bit machine this function is actually available under the name
817@code{lseek} and so transparently replaces the 32-bit interface.
818@end deftypefun
819
28f540f4 820You can have multiple descriptors for the same file if you open the file
2c6fe0bd 821more than once, or if you duplicate a descriptor with @code{dup}.
28f540f4
RM
822Descriptors that come from separate calls to @code{open} have independent
823file positions; using @code{lseek} on one descriptor has no effect on the
2c6fe0bd 824other. For example,
28f540f4
RM
825
826@smallexample
827@group
828@{
829 int d1, d2;
830 char buf[4];
831 d1 = open ("foo", O_RDONLY);
832 d2 = open ("foo", O_RDONLY);
833 lseek (d1, 1024, SEEK_SET);
834 read (d2, buf, 4);
835@}
836@end group
837@end smallexample
838
839@noindent
840will read the first four characters of the file @file{foo}. (The
841error-checking code necessary for a real program has been omitted here
842for brevity.)
843
844By contrast, descriptors made by duplication share a common file
845position with the original descriptor that was duplicated. Anything
846which alters the file position of one of the duplicates, including
847reading or writing data, affects all of them alike. Thus, for example,
848
849@smallexample
850@{
851 int d1, d2, d3;
852 char buf1[4], buf2[4];
853 d1 = open ("foo", O_RDONLY);
854 d2 = dup (d1);
855 d3 = dup (d2);
856 lseek (d3, 1024, SEEK_SET);
857 read (d1, buf1, 4);
858 read (d2, buf2, 4);
859@}
860@end smallexample
861
862@noindent
863will read four characters starting with the 1024'th character of
864@file{foo}, and then four more characters starting with the 1028'th
865character.
866
867@comment sys/types.h
868@comment POSIX.1
869@deftp {Data Type} off_t
870This is an arithmetic data type used to represent file sizes.
871In the GNU system, this is equivalent to @code{fpos_t} or @code{long int}.
a3a4a74e
UD
872
873If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type
874is transparently replaced by @code{off64_t}.
28f540f4
RM
875@end deftp
876
b07d03e0 877@comment sys/types.h
a3a4a74e 878@comment Unix98
b07d03e0
UD
879@deftp {Data Type} off64_t
880This type is used similarly to @code{off_t}. The difference is that even
c756c71c 881on 32-bit machines, where the @code{off_t} type would have 32 bits,
b07d03e0
UD
882@code{off64_t} has 64 bits and so is able to address files up to
883@math{2^63} bytes in length.
a3a4a74e
UD
884
885When compiling with @code{_FILE_OFFSET_BITS == 64} this type is
886available under the name @code{off_t}.
b07d03e0
UD
887@end deftp
888
28f540f4
RM
889These aliases for the @samp{SEEK_@dots{}} constants exist for the sake
890of compatibility with older BSD systems. They are defined in two
891different header files: @file{fcntl.h} and @file{sys/file.h}.
892
893@table @code
894@item L_SET
895An alias for @code{SEEK_SET}.
896
897@item L_INCR
898An alias for @code{SEEK_CUR}.
899
900@item L_XTND
901An alias for @code{SEEK_END}.
902@end table
903
904@node Descriptors and Streams
905@section Descriptors and Streams
906@cindex streams, and file descriptors
907@cindex converting file descriptor to stream
908@cindex extracting file descriptor from stream
909
910Given an open file descriptor, you can create a stream for it with the
911@code{fdopen} function. You can get the underlying file descriptor for
912an existing stream with the @code{fileno} function. These functions are
913declared in the header file @file{stdio.h}.
914@pindex stdio.h
915
916@comment stdio.h
917@comment POSIX.1
918@deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype})
919The @code{fdopen} function returns a new stream for the file descriptor
920@var{filedes}.
921
922The @var{opentype} argument is interpreted in the same way as for the
923@code{fopen} function (@pxref{Opening Streams}), except that
924the @samp{b} option is not permitted; this is because GNU makes no
925distinction between text and binary files. Also, @code{"w"} and
926@code{"w+"} do not cause truncation of the file; these have an effect only
927when opening a file, and in this case the file has already been opened.
928You must make sure that the @var{opentype} argument matches the actual
929mode of the open file descriptor.
930
931The return value is the new stream. If the stream cannot be created
932(for example, if the modes for the file indicated by the file descriptor
933do not permit the access specified by the @var{opentype} argument), a
934null pointer is returned instead.
935
936In some other systems, @code{fdopen} may fail to detect that the modes
937for the file descriptor do not permit the access specified by
938@var{opentype}. The GNU C library always checks for this.
939@end deftypefun
940
941For an example showing the use of the @code{fdopen} function,
942see @ref{Creating a Pipe}.
943
944@comment stdio.h
945@comment POSIX.1
946@deftypefun int fileno (FILE *@var{stream})
947This function returns the file descriptor associated with the stream
948@var{stream}. If an error is detected (for example, if the @var{stream}
949is not valid) or if @var{stream} does not do I/O to a file,
07435eb4 950@code{fileno} returns @math{-1}.
28f540f4
RM
951@end deftypefun
952
953@cindex standard file descriptors
954@cindex file descriptors, standard
955There are also symbolic constants defined in @file{unistd.h} for the
956file descriptors belonging to the standard streams @code{stdin},
957@code{stdout}, and @code{stderr}; see @ref{Standard Streams}.
958@pindex unistd.h
959
960@comment unistd.h
961@comment POSIX.1
962@table @code
963@item STDIN_FILENO
964@vindex STDIN_FILENO
965This macro has value @code{0}, which is the file descriptor for
966standard input.
967@cindex standard input file descriptor
968
969@comment unistd.h
970@comment POSIX.1
971@item STDOUT_FILENO
972@vindex STDOUT_FILENO
973This macro has value @code{1}, which is the file descriptor for
974standard output.
975@cindex standard output file descriptor
976
977@comment unistd.h
978@comment POSIX.1
979@item STDERR_FILENO
980@vindex STDERR_FILENO
981This macro has value @code{2}, which is the file descriptor for
982standard error output.
983@end table
984@cindex standard error file descriptor
985
986@node Stream/Descriptor Precautions
987@section Dangers of Mixing Streams and Descriptors
988@cindex channels
989@cindex streams and descriptors
990@cindex descriptors and streams
991@cindex mixing descriptors and streams
992
993You can have multiple file descriptors and streams (let's call both
994streams and descriptors ``channels'' for short) connected to the same
995file, but you must take care to avoid confusion between channels. There
996are two cases to consider: @dfn{linked} channels that share a single
997file position value, and @dfn{independent} channels that have their own
998file positions.
999
1000It's best to use just one channel in your program for actual data
1001transfer to any given file, except when all the access is for input.
1002For example, if you open a pipe (something you can only do at the file
1003descriptor level), either do all I/O with the descriptor, or construct a
1004stream from the descriptor with @code{fdopen} and then do all I/O with
1005the stream.
1006
1007@menu
1008* Linked Channels:: Dealing with channels sharing a file position.
1009* Independent Channels:: Dealing with separately opened, unlinked channels.
2c6fe0bd 1010* Cleaning Streams:: Cleaning a stream makes it safe to use
28f540f4
RM
1011 another channel.
1012@end menu
1013
1014@node Linked Channels
1015@subsection Linked Channels
1016@cindex linked channels
1017
1018Channels that come from a single opening share the same file position;
1019we call them @dfn{linked} channels. Linked channels result when you
1020make a stream from a descriptor using @code{fdopen}, when you get a
1021descriptor from a stream with @code{fileno}, when you copy a descriptor
1022with @code{dup} or @code{dup2}, and when descriptors are inherited
1023during @code{fork}. For files that don't support random access, such as
1024terminals and pipes, @emph{all} channels are effectively linked. On
1025random-access files, all append-type output streams are effectively
1026linked to each other.
1027
1028@cindex cleaning up a stream
1029If you have been using a stream for I/O, and you want to do I/O using
1030another channel (either a stream or a descriptor) that is linked to it,
1031you must first @dfn{clean up} the stream that you have been using.
1032@xref{Cleaning Streams}.
1033
1034Terminating a process, or executing a new program in the process,
1035destroys all the streams in the process. If descriptors linked to these
1036streams persist in other processes, their file positions become
1037undefined as a result. To prevent this, you must clean up the streams
1038before destroying them.
1039
1040@node Independent Channels
1041@subsection Independent Channels
1042@cindex independent channels
1043
1044When you open channels (streams or descriptors) separately on a seekable
1045file, each channel has its own file position. These are called
1046@dfn{independent channels}.
1047
1048The system handles each channel independently. Most of the time, this
1049is quite predictable and natural (especially for input): each channel
1050can read or write sequentially at its own place in the file. However,
1051if some of the channels are streams, you must take these precautions:
1052
1053@itemize @bullet
1054@item
1055You should clean an output stream after use, before doing anything else
1056that might read or write from the same part of the file.
1057
1058@item
1059You should clean an input stream before reading data that may have been
1060modified using an independent channel. Otherwise, you might read
1061obsolete data that had been in the stream's buffer.
1062@end itemize
1063
1064If you do output to one channel at the end of the file, this will
1065certainly leave the other independent channels positioned somewhere
1066before the new end. You cannot reliably set their file positions to the
1067new end of file before writing, because the file can always be extended
1068by another process between when you set the file position and when you
1069write the data. Instead, use an append-type descriptor or stream; they
1070always output at the current end of the file. In order to make the
1071end-of-file position accurate, you must clean the output channel you
1072were using, if it is a stream.
1073
1074It's impossible for two channels to have separate file pointers for a
1075file that doesn't support random access. Thus, channels for reading or
1076writing such files are always linked, never independent. Append-type
1077channels are also always linked. For these channels, follow the rules
1078for linked channels; see @ref{Linked Channels}.
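
The advice to use append-type channels can be illustrated with two
descriptors opened separately on the same file.  This is a sketch
only; the name @code{append_demo} and the strings written are
assumptions.  Because both descriptors use @code{O_APPEND}, each
@code{write} lands at the current end of the file regardless of what
the other descriptor wrote in the meantime.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write alternately through two independently opened append-mode
   descriptors and check that no data is clobbered.  Returns 0 on
   success, -1 on failure.  */
int
append_demo (const char *path)
{
  char buf[16] = "";
  int ok = 0;
  int a, b, r;

  assert (path != NULL);
  a = open (path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
  b = open (path, O_WRONLY | O_APPEND);
  r = open (path, O_RDONLY);

  if (a >= 0 && b >= 0 && r >= 0)
    ok = write (a, "one ", 4) == 4
         && write (b, "two ", 4) == 4      /* Does not overwrite "one ".  */
         && write (a, "three", 5) == 5     /* Does not overwrite "two ".  */
         && read (r, buf, sizeof buf - 1) == 13
         && strcmp (buf, "one two three") == 0;

  if (a >= 0)
    close (a);
  if (b >= 0)
    close (b);
  if (r >= 0)
    close (r);
  return ok ? 0 : -1;
}
```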
1079
1080@node Cleaning Streams
1081@subsection Cleaning Streams
1082
1083On the GNU system, you can clean up any stream with @code{fclean}:
1084
1085@comment stdio.h
1086@comment GNU
1087@deftypefun int fclean (FILE *@var{stream})
1088Clean up the stream @var{stream} so that its buffer is empty. If
1089@var{stream} is doing output, force it out. If @var{stream} is doing
1090input, give the data in the buffer back to the system, arranging to
1091reread it.
1092@end deftypefun
1093
1094On other systems, you can use @code{fflush} to clean a stream in most
1095cases.
1096
1097You can skip the @code{fclean} or @code{fflush} if you know the stream
1098is already clean. A stream is clean whenever its buffer is empty. For
1099example, an unbuffered stream is always clean. An input stream that is
1100at end-of-file is clean. A line-buffered stream is clean when the last
1101character output was a newline.
1102
1103There is one case in which cleaning a stream is impossible on most
1104systems. This is when the stream is doing input from a file that is not
1105random-access. Such streams typically read ahead, and when the file is
1106not random access, there is no way to give back the excess data already
1107read. When an input stream reads from a random-access file,
1108@code{fflush} does clean the stream, but leaves the file pointer at an
1109unpredictable place; you must set the file pointer before doing any
1110further I/O. On the GNU system, using @code{fclean} avoids both of
1111these problems.
1112
1113Closing an output-only stream also does @code{fflush}, so this is a
1114valid way of cleaning an output stream. On the GNU system, closing an
1115input stream does @code{fclean}.
1116
1117You need not clean a stream before using its descriptor for control
1118operations such as setting terminal modes; these operations don't affect
1119the file position and are not affected by it. You can use any
1120descriptor for these operations, and all channels are affected
1121simultaneously. However, text already ``output'' to a stream but still
1122buffered by the stream will be subject to the new terminal modes when
1123subsequently flushed. To make sure ``past'' output is covered by the
1124terminal settings that were in effect at the time, flush the output
1125streams for that terminal before setting the modes. @xref{Terminal
1126Modes}.
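
The following sketch shows the portable form of cleaning an output
stream before switching to its linked descriptor.  It uses
@code{fflush} rather than @code{fclean}, on the assumption that the
code should also work on other systems; the name
@code{clean_before_descriptor_demo} is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write part of a line with a stream and the rest with the stream's
   descriptor, cleaning the stream in between.  Returns 0 if the file
   ends up with the text in order, -1 otherwise.  */
int
clean_before_descriptor_demo (const char *path)
{
  char buf[32] = "";
  FILE *out, *check;

  assert (path != NULL);
  out = fopen (path, "w");
  if (out == NULL)
    return -1;

  fputs ("stream ", out);       /* Still sitting in the stream's buffer.  */

  if (fflush (out) != 0         /* Clean the stream first ...  */
      /* ... then it is safe to use the linked descriptor.  */
      || write (fileno (out), "descriptor\n", 11) != 11)
    {
      fclose (out);
      return -1;
    }
  if (fclose (out) != 0)
    return -1;

  /* Read the result back to confirm the order of the output.  */
  check = fopen (path, "r");
  if (check == NULL)
    return -1;
  if (fgets (buf, sizeof buf, check) == NULL)
    buf[0] = '\0';
  fclose (check);
  return strcmp (buf, "stream descriptor\n") == 0 ? 0 : -1;
}
```

Without the @code{fflush} call, the text written through the
descriptor could reach the file before the text still buffered in the
stream.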
1127
07435eb4
UD
1128@node Scatter-Gather
1129@section Fast Scatter-Gather I/O
1130@cindex scatter-gather
1131
1132Some applications may need to read or write data to multiple buffers,
1133which are separated in memory. Although this can be done easily enough
1134with multiple calls to @code{read} and @code{write}, it is inefficient
1135because there is overhead associated with each kernel call.
1136
1137Instead, many platforms provide special high-speed primitives to perform
1138these @dfn{scatter-gather} operations in a single kernel call. The GNU C
1139library will provide an emulation on any system that lacks these
1140primitives, so they are not a portability threat. They are defined in
1141@file{sys/uio.h}.
1142
1143These functions are controlled with arrays of @code{iovec} structures,
1144which describe the location and size of each buffer.
1145
1146@deftp {Data Type} {struct iovec}
1147
1148The @code{iovec} structure describes a buffer. It contains two fields:
1149
1150@table @code
1151
1152@item void *iov_base
1153Contains the address of a buffer.
1154
1155@item size_t iov_len
1156Contains the length of the buffer.
1157
1158@end table
1159@end deftp
1160
1161@deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count})
1162
1163The @code{readv} function reads data from @var{filedes} and scatters it
1164into the buffers described in @var{vector}, which is taken to be
1165@var{count} structures long. As each buffer is filled, data is sent to the
1166next.
1167
1168Note that @code{readv} is not guaranteed to fill all the buffers.
1169It may stop at any point, for the same reasons @code{read} would.
1170
1171The return value is a count of bytes (@emph{not} buffers) read, @math{0}
1172indicating end-of-file, or @math{-1} indicating an error. The possible
1173errors are the same as in @code{read}.
1174
1175@end deftypefun
1176
1177@deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count})
1178
1179The @code{writev} function gathers data from the buffers described in
1180@var{vector}, which is taken to be @var{count} structures long, and writes
1181them to @var{filedes}. As each buffer is written, it moves on to the
1182next.
1183
1184Like @code{readv}, @code{writev} may stop midstream under the same
1185conditions @code{write} would.
1186
1187The return value is a count of bytes written, or @math{-1} indicating an
1188error. The possible errors are the same as in @code{write}.
1189
1190@end deftypefun
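
As a hedged illustration of both functions, the sketch below gathers a
header and a body from separate buffers into one @code{writev} call on
a pipe, then scatters them back into two buffers with @code{readv}.
The function name, the buffer names and the buffer contents are
invented for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Round-trip two separate buffers through a pipe using one writev
   and one readv call.  Returns 0 on success, -1 on failure.  */
int
scatter_gather_demo (void)
{
  int fds[2];
  char head[] = "HEAD";
  char body[] = "payload";
  char in_head[5] = "";
  char in_body[8] = "";
  /* Each iovec gives the address and length of one buffer.  */
  struct iovec out[2] = { { head, 4 }, { body, 7 } };
  struct iovec in[2] = { { in_head, 4 }, { in_body, 7 } };
  ssize_t got = -1;

  if (pipe (fds) != 0)
    return -1;

  /* The return values count bytes, not buffers.  */
  if (writev (fds[1], out, 2) == 11)
    got = readv (fds[0], in, 2);

  close (fds[0]);
  close (fds[1]);

  assert (in_head[4] == '\0' && in_body[7] == '\0');
  return (got == 11
          && strcmp (in_head, "HEAD") == 0
          && strcmp (in_body, "payload") == 0) ? 0 : -1;
}
```

A robust program would loop when either call transfers fewer bytes
than requested, adjusting the @code{iovec} array to skip what has
already been handled.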
1191
1192@c Note - I haven't read this anywhere. I surmised it from my knowledge
1193@c of computer science. Thus, there could be subtleties I'm missing.
1194
1195Note that if the buffers are small (under about 1kB), high-level streams
1196may be easier to use than these functions. However, @code{readv} and
1197@code{writev} are more efficient when the individual buffers themselves
1198(as opposed to the total output) are large. In that case, a high-level
1199stream would not be able to cache the data effectively.
1200
1201@node Memory-mapped I/O
1202@section Memory-mapped I/O
1203
1204On modern operating systems, it is possible to @dfn{mmap} (pronounced
1205``em-map'') a file to a region of memory. When this is done, the file can
1206be accessed just like an array in the program.
1207
1208This is more efficient than @code{read} or @code{write}, as only regions
1209of the file a program actually accesses are loaded. Accesses to
1210not-yet-loaded parts of the mmapped region are handled in the same way as
1211swapped out pages.
1212
1213Since mmapped pages can be stored back to their file when physical memory
1214is low, it is possible to mmap files orders of magnitude larger than both
1215the physical memory @emph{and} swap space. The only limit is address
1216space. The theoretical limit is 4GB on a 32-bit machine; however, the
1217actual limit will be smaller since some areas will be reserved for other
1218purposes.
1219
1220Memory mapping only works on entire pages of memory. Thus, addresses
1221for mapping must be page-aligned, and length values will be rounded up.
1222To determine the size of a page the machine uses one should use
1223
1224@smallexample
1225size_t page_size = (size_t) sysconf (_SC_PAGESIZE);
1226@end smallexample
1227
1228These functions are declared in @file{sys/mman.h}.
1229
1230@deftypefun {void *} mmap (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset})
1231
1232The @code{mmap} function creates a new mapping, connected to bytes
1233(@var{offset}) to (@var{offset} + @var{length}) in the file open on
1234@var{filedes}.
1235
1236@var{address} gives a preferred starting address for the mapping.
1237@code{NULL} expresses no preference. Any previous mapping at that
1238address is automatically removed. The address you give may still be
1239changed, unless you use the @code{MAP_FIXED} flag.
1240
1241@vindex PROT_READ
1242@vindex PROT_WRITE
1243@vindex PROT_EXEC
1244@var{protect} contains flags that control what kind of access is
1245permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and
1246@code{PROT_EXEC}, which permit reading, writing, and execution,
1247respectively. Inappropriate access will cause a segfault (@pxref{Program
1248Error Signals}).
1249
1250Note that most hardware designs cannot support write permission without
1251read permission, and many do not distinguish read and execute permission.
1252Thus, you may receive wider permissions than you ask for, and mappings of
1253write-only files may be denied even if you do not use @code{PROT_READ}.
1254
1255@var{flags} contains flags that control the nature of the map.
1256One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
1257
1258They include:
1259
1260@vtable @code
1261@item MAP_PRIVATE
1262This specifies that writes to the region should never be written back
1263to the attached file. Instead, a copy is made for the process, and the
1264region will be swapped normally if memory runs low. No other process will
1265see the changes.
1266
1267Since private mappings effectively revert to ordinary memory
1268when written to, you must have enough virtual memory for a copy of
1269the entire mmapped region if you use this mode with @code{PROT_WRITE}.
1270
1271@item MAP_SHARED
1272This specifies that writes to the region will be written back to the
1273file. Changes made will be shared immediately with other processes
1274mmapping the same file.
1275
1276Note that actual writing may take place at any time. You need to use
1277@code{msync}, described below, if it is important that other processes
1278using conventional I/O get a consistent view of the file.
1279
1280@item MAP_FIXED
1281This forces the system to use the exact mapping address specified in
1282@var{address} and fail if it can't.
1283
1284@c One of these is official - the other is obviously an obsolete synonym
1285@c Which is which?
1286@item MAP_ANONYMOUS
1287@itemx MAP_ANON
1288This flag tells the system to create an anonymous mapping, not connected
1289to a file. @var{filedes} and @var{offset} are ignored, and the region is
1290initialized with zeros.
1291
1292Anonymous maps are used as the basic primitive to extend the heap on some
1293systems. They are also useful to share data between multiple tasks
1294without creating a file.
1295
1296On some systems using private anonymous mmaps is more efficient than using
1297@code{malloc} for large blocks. This is not an issue with the GNU C library,
1298as the included @code{malloc} automatically uses @code{mmap} where appropriate.
1299
1300@c Linux has some other MAP_ options, which I have not discussed here.
1301@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
1302@c user programs (and I don't understand the last two). MAP_LOCKED does
1303@c not appear to be implemented.
1304
1305@end vtable
1306
1307@code{mmap} returns the address of the new mapping, or @code{MAP_FAILED}
1308(which is @code{(void *) -1}) for an error.
1309
1310Possible errors include:
1311
1312@table @code
1313
1314@item EINVAL
1315
1316Either @var{address} was unusable, or inconsistent @var{flags} were
1317given.
1318
1319@item EACCES
1320
1321@var{filedes} was not open for the type of access specified in @var{protect}.
1322
1323@item ENOMEM
1324
1325Either there is not enough memory for the operation, or the process is
1326out of address space.
1327
1328@item ENODEV
1329
1330This file is of a type that doesn't support mapping.
1331
1332@item ENOEXEC
1333
1334The file is on a filesystem that doesn't support mapping.
1335
1336@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
1337@c However mandatory locks are not discussed in this manual.
1338@c
1339@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented
1340@c here) is used and the file is already open for writing.
1341
1342@end table
1343
1344@end deftypefun
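
A minimal sketch of a read-only private mapping follows.  It assumes a
small scratch file created by the example itself; the function name
@code{mmap_read_demo} and the file contents are illustrative, and the
comparison against @code{MAP_FAILED} relies on that macro from
@file{sys/mman.h}.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create PATH with known contents, map it, and read the contents
   through the mapping.  Returns 0 on success, -1 on failure.  */
int
mmap_read_demo (const char *path)
{
  static const char text[] = "mapped text";
  size_t len = sizeof text - 1;
  int result = -1;
  int fd;

  assert (path != NULL);
  fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  if (write (fd, text, len) == (ssize_t) len)
    {
      /* NULL lets the system choose the address; offset 0 is
         trivially page-aligned.  */
      char *p = mmap (NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED)
        {
          /* The file can now be read like an ordinary array.  */
          if (memcmp (p, text, len) == 0)
            result = 0;
          munmap (p, len);
        }
    }

  close (fd);
  return result;
}
```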
1345
1346@deftypefun int munmap (void *@var{addr}, size_t @var{length})
1347
1348@code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} +
1349@var{length}). @var{length} should be the length of the mapping.
1350
1351It is safe to un-map multiple mappings in one command, or include unmapped
1352space in the range. It is also possible to unmap only part of an existing
1353mapping; however, only entire pages can be removed. If @var{length} is not
1354a whole number of pages, it will be rounded up.
1355
1356It returns @math{0} for success and @math{-1} for an error.
1357
1358One error is possible:
1359
1360@table @code
1361
1362@item EINVAL
1363The memory range given was outside the user mmap range, or wasn't page
1364aligned.
1365
1366@end table
1367
1368@end deftypefun
1369
1370@deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags})
1371
1372When using shared mappings, the kernel can write the file at any time
1373before the mapping is removed. To be certain data has actually been
1374written to the file and will be accessible to non-memory-mapped I/O, it
1375is necessary to use this function.
1376
1377It operates on the region @var{address} to (@var{address} + @var{length}).
1378It may be used on part of a mapping or multiple mappings, however the
1379region given should not contain any unmapped space.
1380
1381@var{flags} can contain some options:
1382
1383@vtable @code
1384
1385@item MS_SYNC
1386
1387This flag makes sure the data is actually written @emph{to disk}.
1388Normally @code{msync} only makes sure that accesses to a file with
1389conventional I/O reflect the recent changes.
1390
1391@item MS_ASYNC
1392
1393This tells @code{msync} to begin the synchronization, but not to wait for
1394it to complete.
1395
1396@c Linux also has MS_INVALIDATE, which I don't understand.
1397
1398@end vtable
1399
1400@code{msync} returns @math{0} for success and @math{-1} for
1401error. Errors include:
1402
1403@table @code
1404
1405@item EINVAL
1406An invalid region was given, or the @var{flags} were invalid.
1407
1408@item EFAULT
1409There is no existing mapping in at least part of the given region.
1410
1411@end table
1412
1413@end deftypefun
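
As an illustration of a shared mapping followed by @code{msync}, the
sketch below modifies a file through the mapping and then reads it back
with conventional I/O.  The name @code{mmap_write_demo}, the file
contents and the use of @code{pread} for the check are assumptions made
for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Change the contents of PATH through a shared mapping and verify the
   change with ordinary descriptor I/O.  Returns 0 on success.  */
int
mmap_write_demo (const char *path)
{
  char buf[6] = "";
  size_t len = 5;
  int result = -1;
  int fd;

  assert (path != NULL);
  fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  /* The file must already cover the bytes we are going to touch.  */
  if (write (fd, "-----", len) == (ssize_t) len)
    {
      char *p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (p != MAP_FAILED)
        {
          memcpy (p, "hello", len);     /* Store through the mapping.  */

          /* Wait until the change has been written back, then look at
             the file with conventional I/O.  */
          if (msync (p, len, MS_SYNC) == 0
              && pread (fd, buf, len, 0) == (ssize_t) len
              && strcmp (buf, "hello") == 0)
            result = 0;
          munmap (p, len);
        }
    }

  close (fd);
  return result;
}
```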
1414
1415@deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag})
1416
1417This function can be used to change the size of an existing memory
1418area. @var{address} and @var{length} must cover a region entirely mapped
1419in the same @code{mmap} statement. A new mapping with the same
1420characteristics will be returned, but with the length @var{new_length}
1421instead.
1422
1423One option is possible, @code{MREMAP_MAYMOVE}. If it is given in
1424@var{flag}, the system may remove the existing mapping and create a new
1425one of the desired length in another location.
1426
1427This function is only available on a few systems. Except for performing
1428optional optimizations one should not rely on this function.
1429
1430The address of the resulting mapping is returned, or @math{-1}. Possible
1431error codes include:
1432@table @code
1433
1434@item EFAULT
1435There is no existing mapping in at least part of the original region, or
1436the region covers two or more distinct mappings.
1437
1438@item EINVAL
1439The address given is misaligned or inappropriate.
1440
1441@item EAGAIN
1442The region has pages locked, and if extended it would exceed the
1443process's resource limit for locked pages. @xref{Limits on Resources}.
1444
1445@item ENOMEM
1446The region is private writable, and insufficient virtual memory is
1447available to extend it. Also, this error will occur if
1448@code{MREMAP_MAYMOVE} is not given and the extension would collide with
1449another mapped region.
1450
1451@end table
1452@end deftypefun
1453
1454Not all file descriptors may be mapped. Sockets, pipes, and most devices
1455only allow sequential access and do not fit into the mapping abstraction.
1456In addition, some regular files may not be mmapable, and older kernels may
1457not support mapping at all. Thus, programs using @code{mmap} should
1458have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU
1459Coding Standards}.
1460
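
The fallback recommended above could look roughly like the following
sketch. The type @code{struct file_data} and the function
@code{load_file} are hypothetical names chosen for this illustration. The
function tries @code{mmap} for regular files and otherwise reads
sequentially, which also works for pipes and sockets.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result type: file contents that are either mapped or
   held in allocated memory.  */
struct file_data
{
  void *data;
  size_t length;
  int is_mapped;   /* nonzero: release with munmap; zero: release with free */
};

/* Load everything readable from FD.  Returns 0 on success, -1 on failure.  */
int
load_file (int fd, struct file_data *out)
{
  struct stat st;

  /* Only attempt a mapping when the size is known and nonzero.  */
  if (fstat (fd, &st) == 0 && S_ISREG (st.st_mode) && st.st_size > 0)
    {
      void *map = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE,
                        fd, 0);
      if (map != MAP_FAILED)
        {
          out->data = map;
          out->length = (size_t) st.st_size;
          out->is_mapped = 1;
          return 0;
        }
    }

  /* Fallback: read from the current position until end of file.  */
  size_t capacity = 4096, used = 0;
  char *buffer = malloc (capacity);
  if (buffer == NULL)
    return -1;
  for (;;)
    {
      if (used == capacity)
        {
          char *bigger = realloc (buffer, capacity * 2);
          if (bigger == NULL)
            {
              free (buffer);
              return -1;
            }
          buffer = bigger;
          capacity *= 2;
        }
      ssize_t n = read (fd, buffer + used, capacity - used);
      if (n < 0)
        {
          free (buffer);
          return -1;
        }
      if (n == 0)
        break;
      used += (size_t) n;
    }
  out->data = buffer;
  out->length = used;
  out->is_mapped = 0;
  return 0;
}
```

The @code{is_mapped} member records which release function the caller
has to use later.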
1461@c XXX madvice documentation missing
1462
1463@node Waiting for I/O
1464@section Waiting for Input or Output
1465@cindex waiting for input or output
1466@cindex multiplexing input
1467@cindex input from multiple files
1468
1469Sometimes a program needs to accept input on multiple input channels
1470whenever input arrives. For example, some workstations may have devices
1471such as a digitizing tablet, function button box, or dial box that are
1472connected via normal asynchronous serial interfaces; good user interface
1473style requires responding immediately to input on any device. Another
1474example is a program that acts as a server to several other processes
1475via pipes or sockets.
1476
1477You cannot normally use @code{read} for this purpose, because this
1478blocks the program until input is available on one particular file
1479descriptor; input on other channels won't wake it up. You could set
1480nonblocking mode and poll each file descriptor in turn, but this is very
1481inefficient.
1482
1483A better solution is to use the @code{select} function. This blocks the
1484program until input or output is ready on a specified set of file
1485descriptors, or until a timer expires, whichever comes first. This
1486facility is declared in the header file @file{sys/types.h}.
1487@pindex sys/types.h
1488
1489In the case of a server socket (@pxref{Listening}), we say that
1490``input'' is available when there are pending connections that could be
1491accepted (@pxref{Accepting Connections}). @code{accept} for server
1492sockets blocks and interacts with @code{select} just as @code{read} does
1493for normal input.
1494
1495@cindex file descriptor sets, for @code{select}
1496The file descriptor sets for the @code{select} function are specified
1497as @code{fd_set} objects. Here is the description of the data type
1498and some macros for manipulating these objects.
1499
1500@comment sys/types.h
1501@comment BSD
1502@deftp {Data Type} fd_set
1503The @code{fd_set} data type represents file descriptor sets for the
1504@code{select} function. It is actually a bit array.
1505@end deftp
1506
1507@comment sys/types.h
1508@comment BSD
1509@deftypevr Macro int FD_SETSIZE
The value of this macro is the maximum number of file descriptors that an
@code{fd_set} object can hold information about. On systems with a
fixed maximum number, @code{FD_SETSIZE} is at least that number. On
some systems, including GNU, there is no absolute limit on the number of
descriptors open, but this macro still has a constant value which
controls the number of bits in an @code{fd_set}; if you get a file
descriptor with a value as high as @code{FD_SETSIZE}, you cannot put
that descriptor into an @code{fd_set}.
1518@end deftypevr
1519
1520@comment sys/types.h
1521@comment BSD
1522@deftypefn Macro void FD_ZERO (fd_set *@var{set})
1523This macro initializes the file descriptor set @var{set} to be the
1524empty set.
1525@end deftypefn
1526
1527@comment sys/types.h
1528@comment BSD
1529@deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set})
1530This macro adds @var{filedes} to the file descriptor set @var{set}.
1531@end deftypefn
1532
1533@comment sys/types.h
1534@comment BSD
1535@deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set})
1536This macro removes @var{filedes} from the file descriptor set @var{set}.
1537@end deftypefn
1538
1539@comment sys/types.h
1540@comment BSD
1541@deftypefn Macro int FD_ISSET (int @var{filedes}, fd_set *@var{set})
This macro returns a nonzero value (true) if @var{filedes} is a member
of the file descriptor set @var{set}, and zero (false) otherwise.
1544@end deftypefn
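
The macros above are typically combined as in the following sketch. The
helper name @code{make_pair_set} is hypothetical. The example includes
@file{sys/select.h}, the header in which POSIX systems declare
@code{fd_set} and its macros, and it rejects descriptors that do not fit
below @code{FD_SETSIZE}.

```c
#include <sys/select.h>

/* Hypothetical helper: make SET contain exactly the descriptors A and B.
   Returns 0 on success, or -1 if either descriptor cannot be
   represented in an fd_set.  */
int
make_pair_set (int a, int b, fd_set *set)
{
  if (a < 0 || a >= FD_SETSIZE || b < 0 || b >= FD_SETSIZE)
    return -1;
  FD_ZERO (set);      /* start from the empty set */
  FD_SET (a, set);
  FD_SET (b, set);
  return 0;
}
```

Afterwards @code{FD_ISSET} reports membership and @code{FD_CLR} removes
a single descriptor again.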
1545
1546Next, here is the description of the @code{select} function itself.
1547
1548@comment sys/types.h
1549@comment BSD
1550@deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout})
1551The @code{select} function blocks the calling process until there is
1552activity on any of the specified sets of file descriptors, or until the
1553timeout period has expired.
1554
1555The file descriptors specified by the @var{read-fds} argument are
1556checked to see if they are ready for reading; the @var{write-fds} file
1557descriptors are checked to see if they are ready for writing; and the
1558@var{except-fds} file descriptors are checked for exceptional
1559conditions. You can pass a null pointer for any of these arguments if
1560you are not interested in checking for that kind of condition.
1561
A file descriptor is considered ready for reading if a @code{read} call
on it will not block; this includes the case where the descriptor is at
end of file. A server socket is considered ready for reading if there is a
pending connection which can be accepted with @code{accept};
@pxref{Accepting Connections}. A client socket is ready for writing when
its connection is fully established; @pxref{Connecting}.
1567
1568``Exceptional conditions'' does not mean errors---errors are reported
1569immediately when an erroneous system call is executed, and do not
1570constitute a state of the descriptor. Rather, they include conditions
1571such as the presence of an urgent message on a socket. (@xref{Sockets},
1572for information on urgent messages.)
1573
1574The @code{select} function checks only the first @var{nfds} file
1575descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value
1576of this argument.
1577
1578The @var{timeout} specifies the maximum time to wait. If you pass a
1579null pointer for this argument, it means to block indefinitely until one
1580of the file descriptors is ready. Otherwise, you should provide the
1581time in @code{struct timeval} format; see @ref{High-Resolution
1582Calendar}. Specify zero as the time (a @code{struct timeval} containing
1583all zeros) if you want to find out which descriptors are ready without
1584waiting if none are ready.
1585
1586The normal return value from @code{select} is the total number of ready file
1587descriptors in all of the sets. Each of the argument sets is overwritten
1588with information about the descriptors that are ready for the corresponding
1589operation. Thus, to see if a particular descriptor @var{desc} has input,
1590use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns.
1591
1592If @code{select} returns because the timeout period expires, it returns
1593a value of zero.
1594
1595Any signal will cause @code{select} to return immediately. So if your
1596program uses signals, you can't rely on @code{select} to keep waiting
1597for the full time specified. If you want to be sure of waiting for a
1598particular amount of time, you must check for @code{EINTR} and repeat
1599the @code{select} with a newly calculated timeout based on the current
1600time. See the example below. See also @ref{Interrupted Primitives}.
1601
If an error occurs, @code{select} returns @code{-1} and does not modify
the argument file descriptor sets. The following @code{errno} error
conditions are defined for this function:
1605
1606@table @code
1607@item EBADF
1608One of the file descriptor sets specified an invalid file descriptor.
1609
1610@item EINTR
1611The operation was interrupted by a signal. @xref{Interrupted Primitives}.
1612
1613@item EINVAL
1614The @var{timeout} argument is invalid; one of the components is negative
1615or too large.
1616@end table
1617@end deftypefun
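
As a small illustration of the zero timeout described above, the
following sketch checks whether a single descriptor is readable without
blocking. The function name @code{is_readable_now} is hypothetical. The
example passes @code{fd + 1} as @var{nfds}, which is sufficient because
only descriptors up to @code{fd} are of interest; passing
@code{FD_SETSIZE} would work as well.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: return 1 if FD is ready for reading right now,
   0 if it is not, and -1 on error with errno set.  */
int
is_readable_now (int fd)
{
  fd_set read_fds;
  struct timeval no_wait = { 0, 0 };   /* all zeros: do not block */

  if (fd < 0 || fd >= FD_SETSIZE)
    {
      errno = EBADF;                   /* cannot be stored in an fd_set */
      return -1;
    }
  FD_ZERO (&read_fds);
  FD_SET (fd, &read_fds);

  if (select (fd + 1, &read_fds, NULL, NULL, &no_wait) < 0)
    return -1;
  /* select has overwritten the set with the ready descriptors.  */
  return FD_ISSET (fd, &read_fds) ? 1 : 0;
}
```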
1618
1619@strong{Portability Note:} The @code{select} function is a BSD Unix
1620feature.
1621
1622Here is an example showing how you can use @code{select} to establish a
1623timeout period for reading from a file descriptor. The @code{input_timeout}
1624function blocks the calling process until input is available on the
1625file descriptor, or until the timeout period expires.
1626
1627@smallexample
1628@include select.c.texi
1629@end smallexample
1630
1631There is another example showing the use of @code{select} to multiplex
1632input from multiple sockets in @ref{Server Example}.
1633
1634
1635@node Synchronizing I/O
1636@section Synchronizing I/O operations
1637
1638@cindex synchronizing
In most modern operating systems, normal I/O operations are not
executed synchronously. That is, even when a @code{write} system call
returns, this does not mean the data has actually been written to the
media, e.g., the disk.

In situations where synchronization points are necessary, the program can
use special functions which ensure that all operations have finished
before they return.
1647
1648@comment unistd.h
1649@comment X/Open
1650@deftypefun int sync (void)
A call to this function will not return as long as there is data which
has not been written to the device. All dirty buffers in the kernel will
be written and so an overall consistent system can be achieved (if no
other process writes data in parallel).
1655
1656A prototype for @code{sync} can be found in @file{unistd.h}.
1657
1658The return value is zero to indicate no error.
1659@end deftypefun
1660
More often, a program does not need all data in the system to be
committed. It only wants to ensure that the data written to a given file
is committed, and in this situation @code{sync} is overkill.
1664
1665@comment unistd.h
1666@comment POSIX
1667@deftypefun int fsync (int @var{fildes})
The @code{fsync} function can be used to make sure all data associated
with the open file @var{fildes} is written to the device associated with
the descriptor. The function call does not return unless all actions have
finished.
1672
1673A prototype for @code{fsync} can be found in @file{unistd.h}.
1674
This function is a cancellation point in multi-threaded programs. This
is a problem if the thread allocates some resources (like memory, file
descriptors, semaphores or whatever) at the time @code{fsync} is
called. If the thread gets canceled these resources stay allocated
until the program ends. To avoid this, calls to @code{fsync} should be
protected using cancellation handlers.
1681@c ref pthread_cleanup_push / pthread_cleanup_pop
1682
The return value of the function is zero if no error occurred. Otherwise
it is @math{-1} and the global variable @code{errno} is set to one of the
following values:
1686@table @code
1687@item EBADF
1688The descriptor @var{fildes} is not valid.
1689
1690@item EINVAL
1691No synchronization is possible since the system does not implement this.
1692@end table
1693@end deftypefun
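
A typical use of @code{fsync} is to follow a series of @code{write}
calls, as in this sketch. The function name @code{write_durably} is
hypothetical, and retrying after @code{EINTR} is a design choice of the
example rather than a requirement.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all LENGTH bytes of BUFFER to FD and then
   force them to the device.  Returns 0 on success, -1 on failure.  */
int
write_durably (int fd, const void *buffer, size_t length)
{
  const char *p = buffer;

  while (length > 0)
    {
      ssize_t n = write (fd, p, length);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;             /* interrupted before writing anything */
          return -1;
        }
      p += n;
      length -= (size_t) n;
    }
  /* A successful write does not mean the data has reached the device;
     fsync does not return until it has.  */
  return fsync (fd);
}
```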
1694
Sometimes it is not even necessary to write all data associated with a
file descriptor. E.g., in database files which do not change in size it
is enough to write all the file content data to the device.
Meta-information, such as the modification time, is not that important,
and leaving such information uncommitted does not prevent a successful
recovery of the file in case of a problem.
1701
1702@comment unistd.h
1703@comment POSIX
1704@deftypefun int fdatasync (int @var{fildes})
When a call to the @code{fdatasync} function returns, it is ensured
that all of the file data is written to the device. For all pending I/O
operations, the parts guaranteeing data integrity have finished.
1708
Not all systems implement the @code{fdatasync} operation. On systems
missing this functionality @code{fdatasync} is emulated by a call to
@code{fsync} since the performed actions are a superset of those
required by @code{fdatasync}.
1713
1714The prototype for @code{fdatasync} is in @file{unistd.h}.
1715
The return value of the function is zero if no error occurred. Otherwise
it is @math{-1} and the global variable @code{errno} is set to one of the
following values:
1719@table @code
1720@item EBADF
1721The descriptor @var{fildes} is not valid.
1722
1723@item EINVAL
1724No synchronization is possible since the system does not implement this.
1725@end table
1726@end deftypefun
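
The database scenario mentioned above, where a record is overwritten in
place and the file size does not change, might be sketched like this. The
function name @code{update_record} is hypothetical, and the use of the
POSIX function @code{pwrite} to write at a given offset is an assumption
of the example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite LENGTH bytes at OFFSET in the file
   open on FD, then wait until the file data has reached the device.
   Returns 0 on success and -1 on failure or on a short write.  */
int
update_record (int fd, off_t offset, const void *record, size_t length)
{
  ssize_t n = pwrite (fd, record, length, offset);
  if (n < 0 || (size_t) n != length)
    return -1;
  /* The file size is unchanged, so committing the data alone is enough.  */
  return fdatasync (fd);
}
```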
1727
1728
1729@node Asynchronous I/O
1730@section Perform I/O Operations in Parallel
1731
The POSIX.1b standard defines a new set of I/O operations which can
significantly reduce the time an application spends waiting for I/O. The
new functions allow a program to initiate one or more I/O operations and
then immediately resume normal work while the I/O operations are
executed in parallel. The functionality is available if the
@file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}.

These functions are part of the library with realtime functions named
@file{librt}. They are not actually part of the @file{libc} binary.
The implementation of these functions can be done using support in the
kernel (if available) or using an implementation based on threads at
user level. In the latter case it might be necessary to link applications
with the thread library @file{libpthread} in addition to @file{librt}.

All AIO operations operate on files which were opened previously. There
may be arbitrarily many operations running for one file. The
asynchronous I/O operations are controlled using a data structure named
@code{struct aiocb} (@dfn{AIO control block}). It is defined in
@file{aio.h} as follows.
1751
1752@comment aio.h
1753@comment POSIX.1b
1754@deftp {Data Type} {struct aiocb}
The POSIX.1b standard mandates that the @code{struct aiocb} structure
contains at least the members described in the following table. There
might be more elements which are used by the implementation, but
depending on these elements is not portable and is strongly discouraged.
1759
1760@table @code
1761@item int aio_fildes
This element specifies the file descriptor which is used for the
operation. It must be a valid descriptor; otherwise the operation
fails.
1765
1766The device on which the file is opened must allow the seek operation.
1767I.e., it is not possible to use any of the AIO operations on devices
1768like terminals where an @code{lseek} call would lead to an error.
1769
@item off_t aio_offset
This element specifies at which offset in the file the operation (input
or output) is performed. Since the operations are carried out in arbitrary
order and more than one operation for one file descriptor can be
started, one cannot rely on the current read/write position of the file
descriptor.

@item volatile void *aio_buf
This is a pointer to the buffer with the data to be written or the place
where the read data is stored.
1780
1781@item size_t aio_nbytes
1782This element specifies the length of the buffer pointed to by @code{aio_buf}.
1783
@item int aio_reqprio
If the platform has defined @code{_POSIX_PRIORITIZED_IO} and
@code{_POSIX_PRIORITY_SCHEDULING} the AIO requests are
processed based on the current scheduling priority. The
@code{aio_reqprio} element can then be used to lower the priority of the
AIO operation.
1790
@item struct sigevent aio_sigevent
This element specifies how the calling process is notified once the
operation terminates. If the @code{sigev_notify} element is
@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}
the signal determined by @code{sigev_signo} is sent. Otherwise
@code{sigev_notify} must be @code{SIGEV_THREAD}. In this case a thread
is created which starts executing the function pointed to by
@code{sigev_notify_function}.
1799
@item int aio_lio_opcode
This element is only used by the @code{lio_listio} and
@code{lio_listio64} functions. Since these functions allow starting an
arbitrary number of operations at once and since each operation can be
input or output (or nothing) the information must be stored in the
control block. The possible values are:
1806
1807@vtable @code
1808@item LIO_READ
1809Start a read operation. Read from the file at position
1810@code{aio_offset} and store the next @code{aio_nbytes} bytes in the
1811buffer pointed to by @code{aio_buf}.
1812
1813@item LIO_WRITE
1814Start a write operation. Write @code{aio_nbytes} bytes starting at
1815@code{aio_buf} into the file starting at position @code{aio_offset}.
1816
1817@item LIO_NOP
Do nothing for this control block. This value is useful sometimes when
an array of @code{struct aiocb} values contains holes, i.e., some of the
values must not be handled although the whole array is presented to the
@code{lio_listio} function.
1822@end vtable
1823@end table

When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a
32 bit machine this type is in fact @code{struct aiocb64} since the LFS
interface transparently replaces the @code{struct aiocb} definition.
1828@end deftp
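
The members described above are usually filled in after clearing the
whole structure, as in the following sketch. The function name
@code{prepare_aiocb} is hypothetical. Clearing with @code{memset} is a
defensive choice of this example so that any additional,
implementation-private members start out as zero.

```c
#include <aio.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: fill in CB for a transfer of LENGTH bytes
   between BUFFER and the file open on FD, starting at OFFSET, with no
   completion notification.  */
void
prepare_aiocb (struct aiocb *cb, int fd, void *buffer, size_t length,
               off_t offset)
{
  memset (cb, 0, sizeof *cb);
  cb->aio_fildes = fd;
  cb->aio_offset = offset;       /* the file position of FD is not used */
  cb->aio_buf = buffer;
  cb->aio_nbytes = length;
  cb->aio_reqprio = 0;           /* do not lower the priority */
  cb->aio_sigevent.sigev_notify = SIGEV_NONE;
  cb->aio_lio_opcode = LIO_NOP;  /* only examined by lio_listio */
}
```

The buffer and the control block must both stay valid until the
operation has terminated.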
1829
For use with the AIO functions defined in the LFS there is a similar type
defined which replaces the types of the appropriate members with larger
types but otherwise is equivalent to @code{struct aiocb}. In particular,
all member names are the same.
1834
1835@comment aio.h
1836@comment POSIX.1b
1837@deftp {Data Type} {struct aiocb64}
1838@table @code
1839@item int aio_fildes
This element specifies the file descriptor which is used for the
operation. It must be a valid descriptor; otherwise the operation
fails.
1843
1844The device on which the file is opened must allow the seek operation.
1845I.e., it is not possible to use any of the AIO operations on devices
1846like terminals where an @code{lseek} call would lead to an error.
1847
1848@item off64_t aio_offset
This element specifies at which offset in the file the operation (input
or output) is performed. Since the operations are carried out in arbitrary
order and more than one operation for one file descriptor can be
started, one cannot rely on the current read/write position of the file
descriptor.

@item volatile void *aio_buf
This is a pointer to the buffer with the data to be written or the place
where the read data is stored.
1858
1859@item size_t aio_nbytes
1860This element specifies the length of the buffer pointed to by @code{aio_buf}.
1861
1862@item int aio_reqprio
If the platform has defined @code{_POSIX_PRIORITIZED_IO} and
@code{_POSIX_PRIORITY_SCHEDULING} the AIO requests are
processed based on the current scheduling priority. The
@code{aio_reqprio} element can then be used to lower the priority of the
AIO operation.
1868
1869@item struct sigevent aio_sigevent
This element specifies how the calling process is notified once the
operation terminates. If the @code{sigev_notify} element is
@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}
the signal determined by @code{sigev_signo} is sent. Otherwise
@code{sigev_notify} must be @code{SIGEV_THREAD}, in which case a thread
is created which starts executing the function pointed to by
@code{sigev_notify_function}.
1877
1878@item int aio_lio_opcode
This element is only used by the @code{lio_listio} and
@code{lio_listio64} functions. Since these functions allow starting an
arbitrary number of operations at once and since each operation can be
input or output (or nothing) the information must be stored in the
control block. See the description of @code{struct aiocb} for a description
of the possible values.
1885@end table
1886
When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a
32 bit machine this type is available under the name @code{struct
aiocb} since the LFS transparently replaces the old interface.
1890@end deftp
1891
@menu
* Asynchronous Reads/Writes::    Asynchronous Read and Write Operations.
* Status of AIO Operations::     Getting the Status of AIO Operations.
* Synchronizing AIO Operations:: Getting into a consistent state.
* Cancel AIO Operations::        Cancellation of AIO Operations.
* Configuration of AIO::         How to optimize the AIO implementation.
@end menu
1899
1900@node Asynchronous Reads/Writes
1901@subsection Asynchronous Read and Write Operations
1902
1903@comment aio.h
1904@comment POSIX.1b
1905@deftypefun int aio_read (struct aiocb *@var{aiocbp})
This function initiates an asynchronous read operation. The function
call immediately returns after the operation was enqueued or when an
error was encountered.

The first @code{aiocbp->aio_nbytes} bytes of the file for which
@code{aiocbp->aio_fildes} is a descriptor are written to the buffer
starting at @code{aiocbp->aio_buf}. Reading starts at the absolute
position @code{aiocbp->aio_offset} in the file.

If prioritized I/O is supported by the platform the
@code{aiocbp->aio_reqprio} value is used to adjust the priority before
the request is actually enqueued.

The calling process is notified about the termination of the read
request according to the @code{aiocbp->aio_sigevent} value.

When @code{aio_read} returns, the return value is zero if no error
occurred that can be found before the request is enqueued. If such an
early error is found the function returns @math{-1} and sets
@code{errno} to one of the following values.
1926
1927@table @code
1928@item EAGAIN
1929The request was not enqueued due to (temporarily) exceeded resource
1930limitations.
1931@item ENOSYS
1932The @code{aio_read} function is not implemented.
1933@item EBADF
The @code{aiocbp->aio_fildes} descriptor is not valid. This condition
need not be recognized before enqueueing the request and so this error
might also be signaled asynchronously.
@item EINVAL
The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is
invalid. This condition need not be recognized before enqueueing the
request and so this error might also be signaled asynchronously.
1941@end table
1942
If @code{aio_read} returns zero, the current status of the
request can be queried using the @code{aio_error} and @code{aio_return}
functions. As long as the value returned by @code{aio_error} is
@code{EINPROGRESS} the operation has not yet completed. If
@code{aio_error} returns zero the operation successfully terminated,
otherwise the value is to be interpreted as an error code. Once the
operation has terminated, its result can be obtained using a call
to @code{aio_return}. The returned value is the same as an equivalent
call to @code{read} would have returned. Possible error codes returned
by @code{aio_error} are:
1953
1954@table @code
1955@item EBADF
1956The @code{aiocbp->aio_fildes} descriptor is not valid.
1957@item ECANCELED
The operation was canceled before the operation was finished
(@pxref{Cancel AIO Operations}).
1960@item EINVAL
1961The @code{aiocbp->aio_offset} value is invalid.
1962@end table
1963
1964When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
1965function is in fact @code{aio_read64} since the LFS interface transparently
1966replaces the normal implementation.
1967@end deftypefun
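
The sequence of @code{aio_read}, @code{aio_error} and @code{aio_return}
described above can be sketched as follows. The function name
@code{read_async_and_wait} is hypothetical, and the short
@code{nanosleep} between polls merely stands in for the useful work a
real program would do while the request is in progress. As noted
earlier, linking with @file{librt} may be necessary.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: read up to LENGTH bytes from FD at OFFSET with
   aio_read and wait for completion by polling aio_error.  Returns what
   an equivalent read would have returned, or -1 with errno set.  */
ssize_t
read_async_and_wait (int fd, void *buffer, size_t length, off_t offset)
{
  struct aiocb cb;
  const struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
  int status;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buffer;
  cb.aio_nbytes = length;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) < 0)
    return -1;                   /* early error: nothing was enqueued */

  /* EINPROGRESS means the operation has not yet completed.  */
  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  /* Fetch the result exactly once after termination.  */
  ssize_t result = aio_return (&cb);
  if (status != 0)
    {
      errno = status;            /* error reported asynchronously */
      return -1;
    }
  return result;
}
```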
1968
1969@comment aio.h
@comment Unix98
@deftypefun int aio_read64 (struct aiocb64 *@var{aiocbp})
This function is similar to the @code{aio_read} function. The only
difference is that on @w{32 bit} machines the file descriptor should
be opened in the large file mode. Internally @code{aio_read64} uses
functionality equivalent to @code{lseek64} (@pxref{File Position
Primitive}) to position the file descriptor correctly for the reading,
as opposed to @code{lseek} functionality used in @code{aio_read}.

When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
function is available under the name @code{aio_read} and so transparently
replaces the interface for small files on 32 bit machines.
1982@end deftypefun
1983
1984To write data asynchronously to a file there exists an equivalent pair
1985of functions with a very similar interface.
1986
1987@comment aio.h
1988@comment POSIX.1b
1989@deftypefun int aio_write (struct aiocb *@var{aiocbp})
This function initiates an asynchronous write operation. The function
call immediately returns after the operation was enqueued or when an
error was encountered before that.

The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at
@code{aiocbp->aio_buf} are written to the file for which
@code{aiocbp->aio_fildes} is a descriptor, starting at the absolute
position @code{aiocbp->aio_offset} in the file.
1998
1999If prioritized I/O is supported by the platform the
2000@code{aiocbp->aio_reqprio} value is used to adjust the priority before
2001the request is actually enqueued.
2002
The calling process is notified about the termination of the write
request according to the @code{aiocbp->aio_sigevent} value.

When @code{aio_write} returns, the return value is zero if no error
occurred that can be found before the request is enqueued. If such an
early error is found the function returns @math{-1} and sets
@code{errno} to one of the following values.
2010
2011@table @code
2012@item EAGAIN
2013The request was not enqueued due to (temporarily) exceeded resource
2014limitations.
2015@item ENOSYS
2016The @code{aio_write} function is not implemented.
2017@item EBADF
The @code{aiocbp->aio_fildes} descriptor is not valid. This condition
need not be recognized before enqueueing the request and so this error
might also be signaled asynchronously.
@item EINVAL
The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is
invalid. This condition need not be recognized before enqueueing the
request and so this error might also be signaled asynchronously.
2025@end table
2026
If @code{aio_write} returns zero, the current status of the
request can be queried using the @code{aio_error} and @code{aio_return}
functions. As long as the value returned by @code{aio_error} is
@code{EINPROGRESS} the operation has not yet completed. If
@code{aio_error} returns zero the operation successfully terminated,
otherwise the value is to be interpreted as an error code. Once the
operation has terminated, its result can be obtained using a call
to @code{aio_return}. The returned value is the same as an equivalent
call to @code{write} would have returned. Possible error codes returned
by @code{aio_error} are:
2037
2038@table @code
2039@item EBADF
2040The @code{aiocbp->aio_fildes} descriptor is not valid.
2041@item ECANCELED
The operation was canceled before the operation was finished
(@pxref{Cancel AIO Operations}).
2044@item EINVAL
2045The @code{aiocbp->aio_offset} value is invalid.
2046@end table
2047
2048When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2049function is in fact @code{aio_write64} since the LFS interface transparently
2050replaces the normal implementation.
2051@end deftypefun
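
For illustration, here is a sketch of an asynchronous write to an
explicit file offset. The function name @code{write_async_and_wait} is
hypothetical, and polling with a short @code{nanosleep} again only
stands in for other work the program could do in the meantime.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: write LENGTH bytes of BUFFER to FD at OFFSET
   with aio_write and wait for completion.  Returns what an equivalent
   write would have returned, or -1 with errno set.  */
ssize_t
write_async_and_wait (int fd, const void *buffer, size_t length,
                      off_t offset)
{
  struct aiocb cb;
  const struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
  int status;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = (void *) buffer;  /* aio_buf is not const-qualified */
  cb.aio_nbytes = length;
  cb.aio_offset = offset;        /* absolute position in the file */
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_write (&cb) < 0)
    return -1;                   /* early error: nothing was enqueued */

  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  /* Fetch the result exactly once after termination.  */
  ssize_t result = aio_return (&cb);
  if (status != 0)
    {
      errno = status;
      return -1;
    }
  return result;
}
```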
2052
2053@comment aio.h
2054@comment Unix98
@deftypefun int aio_write64 (struct aiocb64 *@var{aiocbp})
This function is similar to the @code{aio_write} function. The only
difference is that on @w{32 bit} machines the file descriptor should
be opened in the large file mode. Internally @code{aio_write64} uses
functionality equivalent to @code{lseek64} (@pxref{File Position
Primitive}) to position the file descriptor correctly for the writing,
as opposed to @code{lseek} functionality used in @code{aio_write}.

When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
function is available under the name @code{aio_write} and so transparently
replaces the interface for small files on 32 bit machines.
2066@end deftypefun
2067
Besides these functions with the more or less traditional interface,
POSIX.1b also defines a function which can initiate more than one
operation at once and which can handle freely mixed read and write
operations. It is therefore similar to a combination of @code{readv} and
@code{writev}.
2073
2074@comment aio.h
2075@comment POSIX.1b
2076@deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig})
The @code{lio_listio} function can be used to enqueue an arbitrary
number of read and write requests at one time. The requests can all be
meant for the same file, all for different files or any combination in
between.

@code{lio_listio} gets the @var{nent} requests from the array pointed to
by @var{list}. The operation to be performed is determined by the
@code{aio_lio_opcode} member in each element of @var{list}. If this
field is @code{LIO_READ} a read operation is enqueued, similar to a call
of @code{aio_read} for this element of the array (except that the way
the termination is signalled is different, as we will see below). If
the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation
is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP}
in which case this element of @var{list} is simply ignored. This
``operation'' is useful in situations where one has a fixed array of
@code{struct aiocb} elements from which only a few need to be handled at
a time. Another situation is where the @code{lio_listio} call was
canceled before all requests are processed (@pxref{Cancel AIO
Operations}) and the remaining requests have to be reissued.
2096
The other members of each element of the array pointed to by
@var{list} must have values suitable for the operation as described in
the documentation for @code{aio_read} and @code{aio_write} above.

The @var{mode} argument determines how @code{lio_listio} behaves after
having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it
waits until all requests have terminated. Otherwise @var{mode} must be
@code{LIO_NOWAIT} and in this case the function returns immediately after
having enqueued all the requests. In this case the caller gets a
notification of the termination of all requests according to the
@var{sig} parameter. If @var{sig} is @code{NULL} no notification is
sent. Otherwise a signal is sent or a thread is started, just as
described in the description for @code{aio_read} or @code{aio_write}.
2110
If @var{mode} is @code{LIO_WAIT} the return value of @code{lio_listio}
is @math{0} when all requests completed successfully. Otherwise the
function returns @math{-1} and @code{errno} is set accordingly. To find
out which request or requests failed one has to use the @code{aio_error}
function on all the elements of the array @var{list}.

In case @var{mode} is @code{LIO_NOWAIT} the function returns @math{0} if
all requests were enqueued correctly. The current state of the requests
can be found using @code{aio_error} and @code{aio_return} as described
above. In case @code{lio_listio} returns @math{-1} in this mode the
global variable @code{errno} is set accordingly. If a request did not
yet terminate a call to @code{aio_error} returns @code{EINPROGRESS}. If
the value is different the request is finished and the error value (or
@math{0}) is returned and the result of the operation can be retrieved
using @code{aio_return}.
2126
2127Possible values for @code{errno} are:
2128
2129@table @code
2130@item EAGAIN
The resources necessary to queue all the requests are not available at
the moment. The error status of each element of @var{list} must be
checked to find out which request failed.

Another reason could be that the system-wide limit of AIO requests is
exceeded. This cannot be the case for the implementation on GNU systems
since no arbitrary limits exist.
2138@item EINVAL
2139The @var{mode} parameter is invalid or @var{nent} is larger than
2140@code{AIO_LISTIO_MAX}.
2141@item EIO
One or more of the requests' I/O operations failed. The error status of
each request should be checked to find out which one failed.
2144@item ENOSYS
2145The @code{lio_listio} function is not supported.
2146@end table
2147
If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels
a request, the error status for this request returned by
@code{aio_error} is @code{ECANCELED}.
2151
2152When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2153function is in fact @code{lio_listio64} since the LFS interface
2154transparently replaces the normal implementation.
2155@end deftypefun
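
As an illustration of the @code{LIO_WAIT} mode and of the @code{LIO_NOP} opcode described above, here is a minimal sketch. The helper name @code{write_two_chunks} and the scratch file under @file{/tmp} are assumptions made for this example only. On older versions of the GNU C library the AIO functions live in @file{librt}, so linking may require @code{-lrt}.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write FIRST and SECOND back to back into a
   scratch file with a single lio_listio call.  Returns the total
   number of bytes written, or -1 on failure.  */
ssize_t
write_two_chunks (const char *first, const char *second)
{
  char path[] = "/tmp/lio-demo-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                /* The file vanishes once FD is closed.  */

  struct aiocb cbs[3];
  memset (cbs, 0, sizeof cbs);

  cbs[0].aio_fildes = fd;
  cbs[0].aio_buf = (void *) first;
  cbs[0].aio_nbytes = strlen (first);
  cbs[0].aio_offset = 0;
  cbs[0].aio_lio_opcode = LIO_WRITE;

  /* A slot that is not needed this time is marked LIO_NOP and skipped.  */
  cbs[1].aio_lio_opcode = LIO_NOP;

  cbs[2].aio_fildes = fd;
  cbs[2].aio_buf = (void *) second;
  cbs[2].aio_nbytes = strlen (second);
  cbs[2].aio_offset = (off_t) strlen (first);
  cbs[2].aio_lio_opcode = LIO_WRITE;

  struct aiocb *list[3] = { &cbs[0], &cbs[1], &cbs[2] };
  ssize_t total = -1;

  /* LIO_WAIT blocks until every request in LIST has terminated.  */
  if (lio_listio (LIO_WAIT, list, 3, NULL) == 0)
    {
      total = 0;
      for (int i = 0; i < 3; i++)
        if (cbs[i].aio_lio_opcode != LIO_NOP)
          total += aio_return (&cbs[i]);
    }

  close (fd);
  return total;
}
```

If @code{lio_listio} had returned @math{-1} here, a real program would call @code{aio_error} on each element to find the failing request before giving up.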
2156
2157@comment aio.h
2158@comment Unix98
@deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig})
This function is similar to the @code{lio_listio} function. The only
difference is that on @w{32 bit} machines the file descriptor should
be opened in the large file mode. Internally @code{lio_listio64} uses
functionality equivalent to @code{lseek64} (@pxref{File Position
Primitive}) to position the file descriptor correctly for the reading or
writing, as opposed to @code{lseek} functionality used in
@code{lio_listio}.
2167
2168When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2169function is available under the name @code{lio_listio} and so
2170transparently replaces the interface for small files on 32 bits
2171machines.
2172@end deftypefun
2173
2174@node Status of AIO Operations
2175@subsection Getting the Status of AIO Operations
2176
As already described in the documentation of the functions in the last
section, it must be possible to get information about the status of an I/O
request. When the operation is performed truly asynchronously (as with
@code{aio_read} and @code{aio_write} and with @code{lio_listio} when the
mode is @code{LIO_NOWAIT}), one sometimes needs to know whether a
specific request has already terminated and, if so, what the result was.
The following two functions allow you to get this kind of information.
2184
2185@comment aio.h
2186@comment POSIX.1b
2187@deftypefun int aio_error (const struct aiocb *@var{aiocbp})
This function determines the error state of the request described by the
@code{struct aiocb} variable pointed to by @var{aiocbp}. If the
request has not yet terminated the value returned is always
@code{EINPROGRESS}. Once the request has terminated the value
@code{aio_error} returns is either @math{0} if the request completed
successfully or the value which would be stored in the
@code{errno} variable if the request had been performed using
@code{read}, @code{write}, or @code{fsync}.
2196
2197The function can return @code{ENOSYS} if it is not implemented. It
2198could also return @code{EINVAL} if the @var{aiocbp} parameter does not
2199refer to an asynchronous operation whose return status is not yet known.
2200
2201When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2202function is in fact @code{aio_error64} since the LFS interface
2203transparently replaces the normal implementation.
2204@end deftypefun
2205
2206@comment aio.h
2207@comment Unix98
2208@deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp})
2209This function is similar to @code{aio_error} with the only difference
2210that the argument is a reference to a variable of type @code{struct
2211aiocb64}.
2212
2213When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2214function is available under the name @code{aio_error} and so
2215transparently replaces the interface for small files on 32 bits
2216machines.
2217@end deftypefun
2218
2219@comment aio.h
2220@comment POSIX.1b
2221@deftypefun ssize_t aio_return (const struct aiocb *@var{aiocbp})
2222This function can be used to retrieve the return status of the operation
2223carried out by the request described in the variable pointed to by
2224@var{aiocbp}. As long as the error status of this request as returned
2225by @code{aio_error} is @code{EINPROGRESS} the return of this function is
2226undefined.
2227
Once the request is finished this function can be used exactly once to
retrieve the return value. Subsequent calls might lead to undefined
behaviour. The return value itself is the value which would have been
returned by the @code{read}, @code{write}, or @code{fsync} call.
2232
2233The function can return @code{ENOSYS} if it is not implemented. It
2234could also return @code{EINVAL} if the @var{aiocbp} parameter does not
2235refer to an asynchronous operation whose return status is not yet known.
2236
2237When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2238function is in fact @code{aio_return64} since the LFS interface
2239transparently replaces the normal implementation.
2240@end deftypefun
2241
2242@comment aio.h
2243@comment Unix98
@deftypefun ssize_t aio_return64 (const struct aiocb64 *@var{aiocbp})
2245This function is similar to @code{aio_return} with the only difference
2246that the argument is a reference to a variable of type @code{struct
2247aiocb64}.
2248
2249When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2250function is available under the name @code{aio_return} and so
2251transparently replaces the interface for small files on 32 bits
2252machines.
2253@end deftypefun
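
The interplay of @code{aio_error} and @code{aio_return} can be sketched as follows. This is only an illustration under stated assumptions: the helper name @code{read_back_async}, the scratch file, and the 1 ms polling interval are all invented for the example, and busy polling is shown purely because it makes the @code{EINPROGRESS} test visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: store TEXT in a scratch file, read it back with
   aio_read, and poll for completion.  Returns the number of bytes read
   if they match TEXT, or -1 otherwise.  */
ssize_t
read_back_async (const char *text)
{
  char path[] = "/tmp/aio-poll-XXXXXX";
  char buf[256];
  size_t len = strlen (text);
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);
  if (len > sizeof buf || write (fd, text, len) != (ssize_t) len)
    {
      close (fd);
      return -1;
    }

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = sizeof buf;
  cb.aio_offset = 0;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) != 0)
    {
      close (fd);
      return -1;
    }

  /* EINPROGRESS means the request has not terminated yet.  */
  const struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 };
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&delay, NULL);

  /* Call aio_return exactly once, and only after termination.  */
  ssize_t n = aio_return (&cb);
  close (fd);

  if (status != 0 || n != (ssize_t) len || memcmp (buf, text, len) != 0)
    return -1;
  return n;
}
```

A program that has other work to do would normally not poll like this; it would use one of the notification methods or @code{aio_suspend}.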
2254
2255@node Synchronizing AIO Operations
2256@subsection Getting into a Consistent State
2257
When dealing with asynchronous operations it is sometimes necessary to
get into a consistent state. For AIO this means that one wants to
know whether a certain request or a group of requests has been processed.
This could be done by waiting for the notification sent by the system
after the operation terminated, but this sometimes would mean wasting
resources (mainly computation time). Instead POSIX.1b defines two
functions which will help with most kinds of consistency.
2265
2266The @code{aio_fsync} and @code{aio_fsync64} functions are only available
2267if in @file{unistd.h} the symbol @code{_POSIX_SYNCHRONIZED_IO} is
2268defined.
2269
2270@cindex synchronizing
2271@comment aio.h
2272@comment POSIX.1b
2273@deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp})
Calling this function forces all I/O operations queued at the
time of the function call and operating on the file descriptor
@code{aiocbp->aio_fildes} into the synchronized I/O completion state
(@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns
immediately but the notification through the method described in
@code{aiocbp->aio_sigevent} will happen only after all requests for this
file descriptor have terminated and the file is synchronized. This also
means that requests for this very same file descriptor which are queued
after the synchronization request are not affected.
2283
If @var{op} is @code{O_DSYNC} the synchronization happens as with a call
to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and
the synchronization happens as with @code{fsync}.

As long as the synchronization has not happened a call to
@code{aio_error} with the reference to the object pointed to by
@var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is
done @code{aio_error} returns @math{0} if the synchronization was
successful. Otherwise the value returned is the value to which the
@code{fsync} or @code{fdatasync} function would have set the
@code{errno} variable. In this case nothing can be assumed about the
consistency of the data written to this file descriptor.
2296
2297The return value of this function is @math{0} if the request was
2298successfully filed. Otherwise the return value is @math{-1} and
2299@code{errno} is set to one of the following values:
2300
2301@table @code
2302@item EAGAIN
The request could not be enqueued due to temporary lack of resources.
2304@item EBADF
2305The file descriptor @code{aiocbp->aio_fildes} is not valid or not open
2306for writing.
2307@item EINVAL
The implementation does not support I/O synchronization or the @var{op}
parameter is neither @code{O_DSYNC} nor @code{O_SYNC}.
2310@item ENOSYS
2311This function is not implemented.
2312@end table
2313
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
function is in fact @code{aio_fsync64} since the LFS interface
transparently replaces the normal implementation.
2317@end deftypefun
2318
2319@comment aio.h
2320@comment Unix98
2321@deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp})
2322This function is similar to @code{aio_fsync} with the only difference
2323that the argument is a reference to a variable of type @code{struct
2324aiocb64}.
2325
2326When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2327function is available under the name @code{aio_fsync} and so
2328transparently replaces the interface for small files on 32 bits
2329machines.
2330@end deftypefun
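
A minimal sketch of how an @code{aio_fsync} request might be queued and then checked is shown here. The helper name @code{write_and_sync} and the scratch file are hypothetical, and polling with @code{aio_error} is used only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write TEXT to a scratch file, queue an
   aio_fsync request and wait for it.  Returns 0 if synchronization
   succeeded, otherwise an errno-style error code.  */
int
write_and_sync (const char *text)
{
  char path[] = "/tmp/aio-sync-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return errno;
  unlink (path);

  size_t len = strlen (text);
  if (write (fd, text, len) != (ssize_t) len)
    {
      close (fd);
      return EIO;
    }

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  /* O_SYNC asks for fsync semantics; O_DSYNC would mean fdatasync.  */
  if (aio_fsync (O_SYNC, &cb) != 0)
    {
      int err = errno;
      close (fd);
      return err;
    }

  /* The request stays in EINPROGRESS until the file is synchronized.  */
  const struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 };
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&delay, NULL);

  aio_return (&cb);             /* Retire the finished request.  */
  close (fd);
  return status;
}
```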
2331
Another method of synchronization is to wait until one or more requests of a
specific set have terminated. This could be achieved by having the @code{aio_*}
functions notify the initiating process about the termination, but in
some situations this is not the ideal solution. In a program which
constantly updates clients somehow connected to the server it is not
always the best solution to go round robin since some connections might
be slow. On the other hand, letting the @code{aio_*} functions notify the
caller might not be the best solution either: while the process
is preparing data for one client it makes no sense to be
interrupted by a notification, since the new client will not be handled
before the current client is served. For situations like this
@code{aio_suspend} should be used.
2344
2345@comment aio.h
2346@comment POSIX.1b
2347@deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout})
When calling this function the calling thread is suspended until at
least one of the requests pointed to by the @var{nent} elements of the
array @var{list} has completed. If any of the requests has already
completed at the time @code{aio_suspend} is called the function returns
immediately. Whether a request has terminated or not is determined by
comparing the error status of the request with @code{EINPROGRESS}. If
an element of @var{list} is @code{NULL} the entry is simply ignored.

If no request has finished the calling process is suspended. If
@var{timeout} is @code{NULL} the process is not woken until a request
has finished. If @var{timeout} is not @code{NULL} the process remains
suspended for at most the time specified in @var{timeout}. If that time
elapses before any request has finished, @code{aio_suspend} returns with
an error.
2361
The return value of the function is @math{0} if one or more requests
from the @var{list} have terminated. Otherwise the function returns
@math{-1} and @code{errno} is set to one of the following values:
2365
2366@table @code
2367@item EAGAIN
2368None of the requests from the @var{list} completed in the time specified
2369by @var{timeout}.
2370@item EINTR
2371A signal interrupted the @code{aio_suspend} function. This signal might
2372also be sent by the AIO implementation while signalling the termination
2373of one of the requests.
2374@item ENOSYS
2375The @code{aio_suspend} function is not implemented.
2376@end table
2377
2378When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2379function is in fact @code{aio_suspend64} since the LFS interface
2380transparently replaces the normal implementation.
2381@end deftypefun
2382
2383@comment aio.h
2384@comment Unix98
2385@deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout})
This function is similar to @code{aio_suspend} with the only difference
that the elements of @var{list} point to variables of type @code{struct
aiocb64}.
2389
2390When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2391function is available under the name @code{aio_suspend} and so
2392transparently replaces the interface for small files on 32 bits
2393machines.
2394@end deftypefun
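
The timeout behaviour of @code{aio_suspend} can be illustrated with a read on an empty pipe, which cannot finish until something is written. This is a sketch under assumptions: the helper name @code{suspend_demo}, the 50 ms timeout, and the use of a pipe are choices made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: queue a read on an empty pipe, let aio_suspend
   time out once, then supply data and wait without a timeout.
   Returns the number of bytes read, or -1 if a step misbehaved.  */
ssize_t
suspend_demo (void)
{
  int fds[2];
  char buf[4];
  if (pipe (fds) != 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fds[0];
  cb.aio_buf = buf;
  cb.aio_nbytes = sizeof buf;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  ssize_t result = -1;
  if (aio_read (&cb) == 0)
    {
      const struct aiocb *const list[1] = { &cb };
      const struct timespec timeout
        = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

      /* Nothing has been written yet, so this wait should time out.  */
      int timed_out = (aio_suspend (list, 1, &timeout) == -1
                       && errno == EAGAIN);

      /* Make the request completable, then wait without a timeout.  */
      int wrote = (write (fds[1], "ping", 4) == 4);
      int rc;
      do
        rc = aio_suspend (list, 1, NULL);
      while (rc == -1 && errno == EINTR);

      ssize_t n = aio_return (&cb);
      if (timed_out && wrote && rc == 0)
        result = n;
    }

  close (fds[0]);
  close (fds[1]);
  return result;
}
```

The retry loop around the second call reflects the @code{EINTR} case listed above: a signal, possibly one sent by the AIO implementation itself, may interrupt the wait.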
2395
2396@node Cancel AIO Operations
2397@subsection Cancelation of AIO Operations
2398
When one or more requests are asynchronously processed it might be
useful in some situations to cancel a selected operation, e.g., if it
becomes obvious that the written data is no longer accurate and would
have to be overwritten soon. As an example assume an application which
writes data to files, in a situation where new incoming data would have
to be written to a file which will be updated by an enqueued request.
The POSIX AIO implementation provides such a function but this function
cannot force the cancelation of the request. It is up to the
implementation to decide whether it is possible to cancel the operation
or not. Therefore using this function is merely a hint.
2409
2410@comment aio.h
2411@comment POSIX.1b
2412@deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp})
The @code{aio_cancel} function can be used to cancel one or more
outstanding requests. If the @var{aiocbp} parameter is @code{NULL} the
function tries to cancel all outstanding requests which would process
the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member
is @var{fildes}). If @var{aiocbp} is not @code{NULL} the function tries
to cancel only the specific request pointed to by @var{aiocbp}.

For requests which were successfully canceled the normal notification
about the termination of the request should take place. I.e., depending
on the @code{struct sigevent} object which controls this, nothing
happens, a signal is sent or a thread is started. If the request cannot
be canceled it terminates the usual way after performing the operation.
2425
2426After a request is successfully canceled a call to @code{aio_error} with
2427a reference to this request as the parameter will return
2428@code{ECANCELED} and a call to @code{aio_return} will return @math{-1}.
2429If the request wasn't canceled and is still running the error status is
2430still @code{EINPROGRESS}.
2431
The return value of the function is @code{AIO_CANCELED} if there were
requests which hadn't terminated and which were successfully canceled.
If there is one or more request left which couldn't be canceled the
return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error}
must be used to find out which of the perhaps multiple requests (if
@var{aiocbp} is @code{NULL}) wasn't successfully canceled. If all
requests already terminated at the time @code{aio_cancel} is called the
return value is @code{AIO_ALLDONE}.
2440
2441If an error occurred during the execution of @code{aio_cancel} the
2442function returns @math{-1} and sets @code{errno} to one of the following
2443values.
2444
2445@table @code
2446@item EBADF
2447The file descriptor @var{fildes} is not valid.
2448@item ENOSYS
2449@code{aio_cancel} is not implemented.
2450@end table
2451
2452When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2453function is in fact @code{aio_cancel64} since the LFS interface
2454transparently replaces the normal implementation.
2455@end deftypefun
2456
2457@comment aio.h
2458@comment Unix98
@deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp})
2460This function is similar to @code{aio_cancel} with the only difference
2461that the argument is a reference to a variable of type @code{struct
2462aiocb64}.
2463
2464When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
2465function is available under the name @code{aio_cancel} and so
2466transparently replaces the interface for small files on 32 bits
2467machines.
2468@end deftypefun
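
Because cancelation is only a hint, a caller has to be prepared for either outcome. The following sketch handles both @code{AIO_CANCELED} and @code{AIO_NOTCANCELED}; which one occurs depends on the implementation and on timing. The helper name @code{cancel_demo} and the use of an empty pipe to keep the request pending are assumptions of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue a read on an empty pipe and try to cancel
   it.  Returns the aio_cancel result if the request then ended in the
   state the text describes, or -1 otherwise.  */
int
cancel_demo (void)
{
  int fds[2];
  char buf[4];
  if (pipe (fds) != 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fds[0];
  cb.aio_buf = buf;
  cb.aio_nbytes = sizeof buf;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  int ok = 0;
  int rc = -1;
  if (aio_read (&cb) == 0)
    {
      rc = aio_cancel (fds[0], &cb);
      if (rc == AIO_CANCELED)
        {
          /* A canceled request reports ECANCELED and returns -1.  */
          ok = (aio_error (&cb) == ECANCELED && aio_return (&cb) == -1);
        }
      else if (rc == AIO_NOTCANCELED)
        {
          /* Already being processed: let it finish the usual way.  */
          const struct aiocb *const list[1] = { &cb };
          if (write (fds[1], "done", 4) == 4)
            {
              while (aio_suspend (list, 1, NULL) == -1 && errno == EINTR)
                continue;
              ok = (aio_error (&cb) == 0 && aio_return (&cb) == 4);
            }
        }
    }

  close (fds[0]);
  close (fds[1]);
  return ok ? rc : -1;
}
```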
2469
2470@node Configuration of AIO
2471@subsection How to optimize the AIO implementation
2472
The POSIX standard does not specify how the AIO functions are
implemented. They could be system calls but it is also possible to
emulate them at user level.

At the time of this writing the available implementation is a
user-level implementation which uses threads for handling the enqueued
requests. This implementation requires some decisions about
limitations, but since hard limitations are something which is best
avoided, the GNU C library implementation provides a means of tuning the
AIO implementation individually for each use.
2483
2484@comment aio.h
2485@comment GNU
2486@deftp {Data Type} {struct aioinit}
2487This data type is used to pass the configuration or tunable parameters
2488to the implementation. The program has to initialize the members of
2489this struct and pass it to the implementation using the @code{aio_init}
2490function.
2491
2492@table @code
@item int aio_threads
This member specifies the maximal number of threads which may be used
at any one time.
@item int aio_num
This number provides an estimate of the maximal number of simultaneously
enqueued requests.
2499@item int aio_locks
2500@c What?
2501@item int aio_usedba
2502@c What?
2503@item int aio_debug
2504@c What?
2505@item int aio_numusers
2506@c What?
2507@item int aio_reserved[2]
2508@c What?
2509@end table
2510@end deftp
2511
2512@comment aio.h
2513@comment GNU
2514@deftypefun void aio_init (const struct aioinit *@var{init})
If it is used, this function must be called before any other AIO
function. Calling it is completely voluntary since it is only meant to
help the AIO implementation to perform better.

Before calling the @code{aio_init} function the members of a variable of
type @code{struct aioinit} must be initialized. Then a reference to
this variable is passed as the parameter to @code{aio_init} which itself
may or may not pay attention to the hints.

The function has no return value and no error cases are defined. It is
an extension which follows a proposal from the SGI implementation in
@w{Irix 6}. It is not covered by POSIX.1b or Unix98.
@end deftypefun

2529@node Control Operations
2530@section Control Operations on Files
2531
2532@cindex control operations on files
2533@cindex @code{fcntl} function
2534This section describes how you can perform various other operations on
2535file descriptors, such as inquiring about or setting flags describing
2536the status of the file descriptor, manipulating record locks, and the
2537like. All of these operations are performed by the function @code{fcntl}.
2538
2539The second argument to the @code{fcntl} function is a command that
2540specifies which operation to perform. The function and macros that name
2541various flags that are used with it are declared in the header file
2542@file{fcntl.h}. Many of these flags are also used by the @code{open}
2543function; see @ref{Opening and Closing Files}.
2544@pindex fcntl.h
2545
2546@comment fcntl.h
2547@comment POSIX.1
2548@deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{})
2549The @code{fcntl} function performs the operation specified by
2550@var{command} on the file descriptor @var{filedes}. Some commands
2551require additional arguments to be supplied. These additional arguments
2552and the return value and error conditions are given in the detailed
2553descriptions of the individual commands.
2554
2555Briefly, here is a list of what the various commands are.
2556
2557@table @code
2558@item F_DUPFD
2559Duplicate the file descriptor (return another file descriptor pointing
2560to the same open file). @xref{Duplicating Descriptors}.
2561
2562@item F_GETFD
2563Get flags associated with the file descriptor. @xref{Descriptor Flags}.
2564
2565@item F_SETFD
2566Set flags associated with the file descriptor. @xref{Descriptor Flags}.
2567
2568@item F_GETFL
2569Get flags associated with the open file. @xref{File Status Flags}.
2570
2571@item F_SETFL
2572Set flags associated with the open file. @xref{File Status Flags}.
2573
2574@item F_GETLK
2575Get a file lock. @xref{File Locks}.
2576
2577@item F_SETLK
2578Set or clear a file lock. @xref{File Locks}.
2579
2580@item F_SETLKW
2581Like @code{F_SETLK}, but wait for completion. @xref{File Locks}.
2582
2583@item F_GETOWN
2584Get process or process group ID to receive @code{SIGIO} signals.
2585@xref{Interrupt Input}.
2586
2587@item F_SETOWN
2588Set process or process group ID to receive @code{SIGIO} signals.
2589@xref{Interrupt Input}.
2590@end table

This function is a cancelation point in multi-threaded programs. This
is a problem if the thread allocates some resources (like memory, file
descriptors, semaphores or whatever) at the time @code{fcntl} is
called. If the thread gets canceled these resources stay allocated
until the program ends. To avoid this, calls to @code{fcntl} should be
protected using cancelation handlers.
@c ref pthread_cleanup_push / pthread_cleanup_pop
2599@end deftypefun
2600
2601
2602@node Duplicating Descriptors
2603@section Duplicating Descriptors
2604
2605@cindex duplicating file descriptors
2606@cindex redirecting input and output
2607
2608You can @dfn{duplicate} a file descriptor, or allocate another file
2609descriptor that refers to the same open file as the original. Duplicate
2610descriptors share one file position and one set of file status flags
2611(@pxref{File Status Flags}), but each has its own set of file descriptor
2612flags (@pxref{Descriptor Flags}).
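
The shared file position can be observed directly. In this sketch (the helper name @code{position_seen_by_duplicate} and the scratch file are hypothetical), data is written through one descriptor and the position is then queried through its duplicate.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write NBYTES bytes through one descriptor and
   report the file position seen through a duplicate of it.  Returns
   -1 on failure.  */
off_t
position_seen_by_duplicate (size_t nbytes)
{
  char path[] = "/tmp/dup-demo-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  off_t pos = -1;
  int copy = dup (fd);
  if (copy >= 0)
    {
      size_t i;
      for (i = 0; i < nbytes; i++)
        if (write (fd, "x", 1) != 1)
          break;
      /* Query the position through the duplicate, not through FD.  */
      if (i == nbytes)
        pos = lseek (copy, 0, SEEK_CUR);
      close (copy);
    }

  close (fd);
  return pos;
}
```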
2613
2614The major use of duplicating a file descriptor is to implement
2615@dfn{redirection} of input or output: that is, to change the
2616file or pipe that a particular file descriptor corresponds to.
2617
2618You can perform this operation using the @code{fcntl} function with the
2619@code{F_DUPFD} command, but there are also convenient functions
2620@code{dup} and @code{dup2} for duplicating descriptors.
2621
2622@pindex unistd.h
2623@pindex fcntl.h
2624The @code{fcntl} function and flags are declared in @file{fcntl.h},
2625while prototypes for @code{dup} and @code{dup2} are in the header file
2626@file{unistd.h}.
2627
2628@comment unistd.h
2629@comment POSIX.1
2630@deftypefun int dup (int @var{old})
2631This function copies descriptor @var{old} to the first available
2632descriptor number (the first number not currently open). It is
2633equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}.
2634@end deftypefun
2635
2636@comment unistd.h
2637@comment POSIX.1
2638@deftypefun int dup2 (int @var{old}, int @var{new})
2639This function copies the descriptor @var{old} to descriptor number
2640@var{new}.
2641
2642If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it
2643does not close @var{new}. Otherwise, the new duplicate of @var{old}
2644replaces any previous meaning of descriptor @var{new}, as if @var{new}
2645were closed first.
2646
2647If @var{old} and @var{new} are different numbers, and @var{old} is a
2648valid descriptor number, then @code{dup2} is equivalent to:
2649
2650@smallexample
close (@var{new});
fcntl (@var{old}, F_DUPFD, @var{new})
2653@end smallexample
2654
2655However, @code{dup2} does this atomically; there is no instant in the
2656middle of calling @code{dup2} at which @var{new} is closed and not yet a
2657duplicate of @var{old}.
2658@end deftypefun
2659
2660@comment fcntl.h
2661@comment POSIX.1
2662@deftypevr Macro int F_DUPFD
2663This macro is used as the @var{command} argument to @code{fcntl}, to
2664copy the file descriptor given as the first argument.
2665
2666The form of the call in this case is:
2667
2668@smallexample
fcntl (@var{old}, F_DUPFD, @var{next-filedes})
2670@end smallexample
2671
2672The @var{next-filedes} argument is of type @code{int} and specifies that
2673the file descriptor returned should be the next available one greater
2674than or equal to this value.
2675
The return value from @code{fcntl} with this command is normally the value
of the new file descriptor. A return value of @math{-1} indicates an
error. The following @code{errno} error conditions are defined for
this command:
2680
2681@table @code
2682@item EBADF
2683The @var{old} argument is invalid.
2684
2685@item EINVAL
2686The @var{next-filedes} argument is invalid.
2687
2688@item EMFILE
2689There are no more file descriptors available---your program is already
2690using the maximum. In BSD and GNU, the maximum is controlled by a
2691resource limit that can be changed; @pxref{Limits on Resources}, for
2692more information about the @code{RLIMIT_NOFILE} limit.
2693@end table
2694
2695@code{ENFILE} is not a possible error code for @code{dup2} because
2696@code{dup2} does not create a new opening of a file; duplicate
2697descriptors do not count toward the limit which @code{ENFILE}
2698indicates. @code{EMFILE} is possible because it refers to the limit on
2699distinct descriptor numbers in use in one process.
2700@end deftypevr
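
A small sketch of the @code{F_DUPFD} command follows. The wrapper name @code{dup_at_least} is hypothetical; it only makes the meaning of the third argument explicit.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: duplicate OLD onto the lowest free descriptor
   number that is greater than or equal to MINIMUM.  Returns the new
   descriptor, or -1 with errno set (for example EBADF or EMFILE).  */
int
dup_at_least (int old, int minimum)
{
  return fcntl (old, F_DUPFD, minimum);
}
```

A program might use a call such as @code{dup_at_least (fd, 10)} to move a descriptor out of the low range that it wants to keep free for the standard streams.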
2701
2702Here is an example showing how to use @code{dup2} to do redirection.
2703Typically, redirection of the standard streams (like @code{stdin}) is
2704done by a shell or shell-like program before calling one of the
2705@code{exec} functions (@pxref{Executing a File}) to execute a new
2706program in a child process. When the new program is executed, it
2707creates and initializes the standard streams to point to the
2708corresponding file descriptors, before its @code{main} function is
2709invoked.
2710
2711So, to redirect standard input to a file, the shell could do something
2712like:
2713
2714@smallexample
pid = fork ();
if (pid == 0)
  @{
    char *filename;
    char *program;
    int file;
    @dots{}
    file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY));
    dup2 (file, STDIN_FILENO);
    TEMP_FAILURE_RETRY (close (file));
    execv (program, NULL);
  @}
2727@end smallexample
2728
2729There is also a more detailed example showing how to implement redirection
2730in the context of a pipeline of processes in @ref{Launching Jobs}.
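
Redirection within a single process, including restoring the original descriptor afterwards, might look like the following sketch. The helper name @code{capture_stdout} and the scratch file are assumptions for illustration; the technique is the same @code{dup2} call as in the shell example, combined with @code{dup} to remember the old target.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: temporarily redirect standard output into a
   scratch file, write MSG to descriptor 1, restore standard output,
   and copy what was captured into OUT.  Returns the number of bytes
   captured, or -1 on failure.  */
ssize_t
capture_stdout (const char *msg, char *out, size_t outlen)
{
  char path[] = "/tmp/redir-demo-XXXXXX";
  int file = mkstemp (path);
  if (file < 0)
    return -1;
  unlink (path);

  fflush (stdout);              /* Keep buffered output out of the file.  */
  int saved = dup (STDOUT_FILENO);
  if (saved < 0 || dup2 (file, STDOUT_FILENO) < 0)
    {
      if (saved >= 0)
        close (saved);
      close (file);
      return -1;
    }

  /* Anything written to descriptor 1 now lands in the scratch file.  */
  ssize_t written = write (STDOUT_FILENO, msg, strlen (msg));

  /* Put the original standard output back.  */
  dup2 (saved, STDOUT_FILENO);
  close (saved);

  ssize_t n = -1;
  if (written >= 0)
    n = pread (file, out, outlen, 0);
  close (file);
  return n;
}
```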
2731
2732
2733@node Descriptor Flags
2734@section File Descriptor Flags
2735@cindex file descriptor flags
2736
2737@dfn{File descriptor flags} are miscellaneous attributes of a file
2738descriptor. These flags are associated with particular file
2739descriptors, so that if you have created duplicate file descriptors
2740from a single opening of a file, each descriptor has its own set of flags.
2741
2742Currently there is just one file descriptor flag: @code{FD_CLOEXEC},
2743which causes the descriptor to be closed if you use any of the
2744@code{exec@dots{}} functions (@pxref{Executing a File}).
2745
2746The symbols in this section are defined in the header file
2747@file{fcntl.h}.
2748@pindex fcntl.h
2749
2750@comment fcntl.h
2751@comment POSIX.1
2752@deftypevr Macro int F_GETFD
This macro is used as the @var{command} argument to @code{fcntl}, to
specify that it should return the file descriptor flags associated
with the @var{filedes} argument.
2756
2757The normal return value from @code{fcntl} with this command is a
2758nonnegative number which can be interpreted as the bitwise OR of the
2759individual flags (except that currently there is only one flag to use).
2760
In case of an error, @code{fcntl} returns @math{-1}. The following
@code{errno} error conditions are defined for this command:
2763
2764@table @code
2765@item EBADF
2766The @var{filedes} argument is invalid.
2767@end table
2768@end deftypevr
2769
2770
2771@comment fcntl.h
2772@comment POSIX.1
2773@deftypevr Macro int F_SETFD
2774This macro is used as the @var{command} argument to @code{fcntl}, to
2775specify that it should set the file descriptor flags associated with the
2776@var{filedes} argument. This requires a third @code{int} argument to
2777specify the new flags, so the form of the call is:
2778
2779@smallexample
fcntl (@var{filedes}, F_SETFD, @var{new-flags})
2781@end smallexample
2782
The normal return value from @code{fcntl} with this command is an
unspecified value other than @math{-1}, which indicates an error.
The flags and error conditions are the same as for the @code{F_GETFD}
command.
2787@end deftypevr
2788
2789The following macro is defined for use as a file descriptor flag with
2790the @code{fcntl} function. The value is an integer constant usable
2791as a bit mask value.
2792
2793@comment fcntl.h
2794@comment POSIX.1
2795@deftypevr Macro int FD_CLOEXEC
2796@cindex close-on-exec (file descriptor flag)
This flag specifies that the file descriptor should be closed when
an @code{exec} function is invoked; see @ref{Executing a File}. When
a file descriptor is allocated (as with @code{open} or @code{dup}),
this bit is initially cleared on the new file descriptor, meaning that
the descriptor will survive into the new program after @code{exec}.
2802@end deftypevr
2803
2804If you want to modify the file descriptor flags, you should get the
2805current flags with @code{F_GETFD} and modify the value. Don't assume
2806that the flags listed here are the only ones that are implemented; your
2807program may be run years from now and more flags may exist then. For
2808example, here is a function to set or clear the flag @code{FD_CLOEXEC}
2809without altering any other flags:
2810
2811@smallexample
/* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,}
   @r{or clear the flag if @var{value} is 0.}
   @r{Return 0 on success, or -1 on error with @code{errno} set.} */

int
set_cloexec_flag (int desc, int value)
@{
  int oldflags = fcntl (desc, F_GETFD, 0);
  /* @r{If reading the flags failed, return error indication now.} */
  if (oldflags < 0)
    return oldflags;
  /* @r{Set just the flag we want to set.} */
  if (value != 0)
    oldflags |= FD_CLOEXEC;
  else
    oldflags &= ~FD_CLOEXEC;
  /* @r{Store modified flag word in the descriptor.} */
  return fcntl (desc, F_SETFD, oldflags);
@}
2831@end smallexample
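
The complementary query can be sketched in the same style. The name @code{get_cloexec_flag} is hypothetical; the function masks the result of @code{F_GETFD} instead of comparing it against a fixed value, so it keeps working if more descriptor flags are added later.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if FD_CLOEXEC is set on DESC, 0 if it
   is clear, or -1 on error with errno set.  */
int
get_cloexec_flag (int desc)
{
  int flags = fcntl (desc, F_GETFD);
  if (flags < 0)
    return -1;
  /* Test only the bit of interest; other flags may be present.  */
  return (flags & FD_CLOEXEC) != 0;
}
```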
2832
2833@node File Status Flags
2834@section File Status Flags
2835@cindex file status flags
2836
2837@dfn{File status flags} are used to specify attributes of the opening of a
2838file. Unlike the file descriptor flags discussed in @ref{Descriptor
2839Flags}, the file status flags are shared by duplicated file descriptors
2840resulting from a single opening of the file. The file status flags are
2841specified with the @var{flags} argument to @code{open};
2842@pxref{Opening and Closing Files}.
2843
2844File status flags fall into three categories, which are described in the
2845following sections.
2846
2847@itemize @bullet
2848@item
2849@ref{Access Modes}, specify what type of access is allowed to the
2850file: reading, writing, or both. They are set by @code{open} and are
2851returned by @code{fcntl}, but cannot be changed.
2852
2853@item
2854@ref{Open-time Flags}, control details of what @code{open} will do.
2855These flags are not preserved after the @code{open} call.
2856
2857@item
2858@ref{Operating Modes}, affect how operations such as @code{read} and
2859@code{write} are done. They are set by @code{open}, and can be fetched or
2860changed with @code{fcntl}.
2861@end itemize
2862
2863The symbols in this section are defined in the header file
2864@file{fcntl.h}.
2865@pindex fcntl.h
2866
2867@menu
2868* Access Modes:: Whether the descriptor can read or write.
2869* Open-time Flags:: Details of @code{open}.
2870* Operating Modes:: Special modes to control I/O operations.
2871* Getting File Status Flags:: Fetching and changing these flags.
2872@end menu
2873
2874@node Access Modes
2875@subsection File Access Modes
2876
2877The file access modes allow a file descriptor to be used for reading,
2878writing, or both. (In the GNU system, they can also allow none of these,
2879and allow execution of the file as a program.) The access modes are chosen
2880when the file is opened, and never change.
2881
2882@comment fcntl.h
2883@comment POSIX.1
2884@deftypevr Macro int O_RDONLY
2885Open the file for read access.
2886@end deftypevr
2887
2888@comment fcntl.h
2889@comment POSIX.1
2890@deftypevr Macro int O_WRONLY
2891Open the file for write access.
2892@end deftypevr
2893
2894@comment fcntl.h
2895@comment POSIX.1
2896@deftypevr Macro int O_RDWR
2897Open the file for both reading and writing.
2898@end deftypevr
2899
2900In the GNU system (and not in other systems), @code{O_RDONLY} and
2901@code{O_WRONLY} are independent bits that can be bitwise-ORed together,
2902and it is valid for either bit to be set or clear. This means that
2903@code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access
2904mode of zero is permissible; it allows no operations that do input or
2905output to the file, but does allow other operations such as
2906@code{fchmod}. On the GNU system, since ``read-only'' or ``write-only''
2907is a misnomer, @file{fcntl.h} defines additional names for the file
2908access modes. These names are preferred when writing GNU-specific code.
2909But most programs will want to be portable to other POSIX.1 systems and
2910should use the POSIX.1 names above instead.
2911
2912@comment fcntl.h
2913@comment GNU
2914@deftypevr Macro int O_READ
Open the file for reading.  Same as @code{O_RDONLY}; only defined on GNU.
2916@end deftypevr
2917
2918@comment fcntl.h
2919@comment GNU
2920@deftypevr Macro int O_WRITE
Open the file for writing.  Same as @code{O_WRONLY}; only defined on GNU.
2922@end deftypevr
2923
2924@comment fcntl.h
2925@comment GNU
2926@deftypevr Macro int O_EXEC
2927Open the file for executing. Only defined on GNU.
2928@end deftypevr
2929
2930To determine the file access mode with @code{fcntl}, you must extract
2931the access mode bits from the retrieved file status flags. In the GNU
2932system, you can just test the @code{O_READ} and @code{O_WRITE} bits in
2933the flags word. But in other POSIX.1 systems, reading and writing
2934access modes are not stored as distinct bit flags. The portable way to
2935extract the file access mode bits is with @code{O_ACCMODE}.
2936
2937@comment fcntl.h
2938@comment POSIX.1
2939@deftypevr Macro int O_ACCMODE
2940This macro stands for a mask that can be bitwise-ANDed with the file
2941status flag value to produce a value representing the file access mode.
2942The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}.
2943(In the GNU system it could also be zero, and it never includes the
2944@code{O_EXEC} bit.)
2945@end deftypevr
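
As a minimal sketch of the portable approach, the following function
fetches the file status flags and masks them with @code{O_ACCMODE}.  The
name @code{get_access_mode} is made up for this illustration; it is not
part of the library.

```c
#include <fcntl.h>

/* Return the access mode of DESC (O_RDONLY, O_WRONLY or O_RDWR),
   or -1 with errno set if the flags cannot be read.  */
int
get_access_mode (int desc)
{
  int flags = fcntl (desc, F_GETFL);
  if (flags == -1)
    return -1;
  /* Keep only the access mode bits; the other status flags are dropped.  */
  return flags & O_ACCMODE;
}
```

Compare the result against the access mode macros with @code{==} rather
than testing individual bits, since on most systems the modes are not
single-bit values.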
2946
2947@node Open-time Flags
2948@subsection Open-time Flags
2949
2950The open-time flags specify options affecting how @code{open} will behave.
2951These options are not preserved once the file is open. The exception to
2952this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it
2953@emph{is} saved. @xref{Opening and Closing Files}, for how to call
2954@code{open}.
2955
2956There are two sorts of options specified by open-time flags.
2957
2958@itemize @bullet
2959@item
2960@dfn{File name translation flags} affect how @code{open} looks up the
2961file name to locate the file, and whether the file can be created.
2962@cindex file name translation flags
2963@cindex flags, file name translation
2964
2965@item
2966@dfn{Open-time action flags} specify extra operations that @code{open} will
2967perform on the file once it is open.
2968@cindex open-time action flags
2969@cindex flags, open-time action
2970@end itemize
2971
2972Here are the file name translation flags.
2973
2974@comment fcntl.h
2975@comment POSIX.1
2976@deftypevr Macro int O_CREAT
2977If set, the file will be created if it doesn't already exist.
2978@c !!! mode arg, umask
2979@cindex create on open (file status flag)
2980@end deftypevr
2981
2982@comment fcntl.h
2983@comment POSIX.1
2984@deftypevr Macro int O_EXCL
2985If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails
2986if the specified file already exists. This is guaranteed to never
2987clobber an existing file.
2988@end deftypevr
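
The combination of @code{O_CREAT} and @code{O_EXCL} might be used as in
the sketch below.  The function name, the return convention and the
permission bits @code{0644} are choices made for this illustration only.

```c
#include <errno.h>
#include <fcntl.h>

/* Create PATH for writing, but only if it does not exist yet.
   Return the new descriptor, -2 if the file already exists,
   or -1 on any other error (errno is set by open).  */
int
create_exclusive (const char *path)
{
  int desc = open (path, O_WRONLY | O_CREAT | O_EXCL, 0644);
  if (desc == -1 && errno == EEXIST)
    return -2;
  return desc;
}
```

Because the existence check and the creation happen inside a single
@code{open} call, two processes racing to create the same file cannot
both succeed.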
2989
2990@comment fcntl.h
2991@comment POSIX.1
2992@deftypevr Macro int O_NONBLOCK
2993@cindex non-blocking open
2994This prevents @code{open} from blocking for a ``long time'' to open the
2995file. This is only meaningful for some kinds of files, usually devices
2996such as serial ports; when it is not meaningful, it is harmless and
2997ignored. Often opening a port to a modem blocks until the modem reports
2998carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will
2999return immediately without a carrier.
3000
3001Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating
3002mode and a file name translation flag. This means that specifying
3003@code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode;
3004@pxref{Operating Modes}. To open the file without blocking but do normal
3005I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and
3006then call @code{fcntl} to turn the bit off.
3007@end deftypevr
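
The two-step procedure just described could look roughly like this.  It
is only a sketch: the name @code{open_then_block} is invented here, and
the function assumes that @var{flags} does not contain @code{O_CREAT},
since no mode argument is passed to @code{open}.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open PATH with FLAGS without waiting (for example, for modem carrier),
   then switch the descriptor back to ordinary blocking I/O.
   Return the descriptor, or -1 with errno set.  */
int
open_then_block (const char *path, int flags)
{
  int desc = open (path, flags | O_NONBLOCK);
  if (desc == -1)
    return -1;

  /* Clear only O_NONBLOCK, leaving every other status flag alone.  */
  int status = fcntl (desc, F_GETFL);
  if (status == -1
      || fcntl (desc, F_SETFL, status & ~O_NONBLOCK) == -1)
    {
      int saved_errno = errno;
      close (desc);
      errno = saved_errno;
      return -1;
    }
  return desc;
}
```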
3008
3009@comment fcntl.h
3010@comment POSIX.1
3011@deftypevr Macro int O_NOCTTY
3012If the named file is a terminal device, don't make it the controlling
3013terminal for the process. @xref{Job Control}, for information about
3014what it means to be the controlling terminal.
3015
3016In the GNU system and 4.4 BSD, opening a file never makes it the
3017controlling terminal and @code{O_NOCTTY} is zero. However, other
3018systems may use a nonzero value for @code{O_NOCTTY} and set the
3019controlling terminal when you open a file that is a terminal device; so
3020to be portable, use @code{O_NOCTTY} when it is important to avoid this.
3021@cindex controlling terminal, setting
3022@end deftypevr
3023
3024The following three file name translation flags exist only in the GNU system.
3025
3026@comment fcntl.h
3027@comment GNU
3028@deftypevr Macro int O_IGNORE_CTTY
3029Do not recognize the named file as the controlling terminal, even if it
3030refers to the process's existing controlling terminal device. Operations
3031on the new file descriptor will never induce job control signals.
3032@xref{Job Control}.
3033@end deftypevr
3034
3035@comment fcntl.h
3036@comment GNU
3037@deftypevr Macro int O_NOLINK
3038If the named file is a symbolic link, open the link itself instead of
3039the file it refers to. (@code{fstat} on the new file descriptor will
3040return the information returned by @code{lstat} on the link's name.)
3041@cindex symbolic link, opening
3042@end deftypevr
3043
3044@comment fcntl.h
3045@comment GNU
3046@deftypevr Macro int O_NOTRANS
3047If the named file is specially translated, do not invoke the translator.
3048Open the bare file the translator itself sees.
3049@end deftypevr
3050
3051
3052The open-time action flags tell @code{open} to do additional operations
3053which are not really related to opening the file. The reason to do them
3054as part of @code{open} instead of in separate calls is that @code{open}
3055can do them @i{atomically}.
3056
3057@comment fcntl.h
3058@comment POSIX.1
3059@deftypevr Macro int O_TRUNC
3060Truncate the file to zero length. This option is only useful for
3061regular files, not special files such as directories or FIFOs. POSIX.1
3062requires that you open the file for writing to use @code{O_TRUNC}. In
3063BSD and GNU you must have permission to write the file to truncate it,
3064but you need not open for write access.
3065
3066This is the only open-time action flag specified by POSIX.1. There is
3067no good reason for truncation to be done by @code{open}, instead of by
3068calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in
3069Unix before @code{ftruncate} was invented, and is retained for backward
3070compatibility.
3071@end deftypevr
3072
The remaining open-time action flags are BSD extensions.  They exist only
on some systems.  On other systems, these macros are not defined.
3075
3076@comment fcntl.h
3077@comment BSD
3078@deftypevr Macro int O_SHLOCK
3079Acquire a shared lock on the file, as with @code{flock}.
3080@xref{File Locks}.
3081
3082If @code{O_CREAT} is specified, the locking is done atomically when
3083creating the file. You are guaranteed that no other process will get
3084the lock on the new file first.
3085@end deftypevr
3086
3087@comment fcntl.h
3088@comment BSD
3089@deftypevr Macro int O_EXLOCK
3090Acquire an exclusive lock on the file, as with @code{flock}.
3091@xref{File Locks}. This is atomic like @code{O_SHLOCK}.
3092@end deftypevr
3093
3094@node Operating Modes
3095@subsection I/O Operating Modes
3096
3097The operating modes affect how input and output operations using a file
3098descriptor work. These flags are set by @code{open} and can be fetched
3099and changed with @code{fcntl}.
3100
3101@comment fcntl.h
3102@comment POSIX.1
3103@deftypevr Macro int O_APPEND
3104The bit that enables append mode for the file. If set, then all
3105@code{write} operations write the data at the end of the file, extending
3106it, regardless of the current file position. This is the only reliable
3107way to append to a file. In append mode, you are guaranteed that the
3108data you write will always go to the current end of the file, regardless
3109of other processes writing to the file. Conversely, if you simply set
3110the file position to the end of file and write, then another process can
3111extend the file after you set the file position but before you write,
3112resulting in your data appearing someplace before the real end of file.
3113@end deftypevr
3114
3115@comment fcntl.h
3116@comment POSIX.1
@deftypevr Macro int O_NONBLOCK
3118The bit that enables nonblocking mode for the file. If this bit is set,
3119@code{read} requests on the file can return immediately with a failure
3120status if there is no input immediately available, instead of blocking.
3121Likewise, @code{write} requests can also return immediately with a
3122failure status if the output can't be written immediately.
3123
3124Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O
3125operating mode and a file name translation flag; @pxref{Open-time Flags}.
3126@end deftypevr
3127
3128@comment fcntl.h
3129@comment BSD
3130@deftypevr Macro int O_NDELAY
3131This is an obsolete name for @code{O_NONBLOCK}, provided for
3132compatibility with BSD. It is not defined by the POSIX.1 standard.
3133@end deftypevr
3134
3135The remaining operating modes are BSD and GNU extensions. They exist only
3136on some systems. On other systems, these macros are not defined.
3137
3138@comment fcntl.h
3139@comment BSD
3140@deftypevr Macro int O_ASYNC
3141The bit that enables asynchronous input mode. If set, then @code{SIGIO}
3142signals will be generated when input is available. @xref{Interrupt Input}.
3143
3144Asynchronous input mode is a BSD feature.
3145@end deftypevr
3146
3147@comment fcntl.h
3148@comment BSD
3149@deftypevr Macro int O_FSYNC
3150The bit that enables synchronous writing for the file. If set, each
3151@code{write} call will make sure the data is reliably stored on disk before
3152returning. @c !!! xref fsync
3153
3154Synchronous writing is a BSD feature.
3155@end deftypevr
3156
3157@comment fcntl.h
3158@comment BSD
3159@deftypevr Macro int O_SYNC
3160This is another name for @code{O_FSYNC}. They have the same value.
3161@end deftypevr
3162
3163@comment fcntl.h
3164@comment GNU
3165@deftypevr Macro int O_NOATIME
3166If this bit is set, @code{read} will not update the access time of the
3167file. @xref{File Times}. This is used by programs that do backups, so
3168that backing a file up does not count as reading it.
3169Only the owner of the file or the superuser may use this bit.
3170
3171This is a GNU extension.
3172@end deftypevr
3173
3174@node Getting File Status Flags
3175@subsection Getting and Setting File Status Flags
3176
3177The @code{fcntl} function can fetch or change file status flags.
3178
3179@comment fcntl.h
3180@comment POSIX.1
3181@deftypevr Macro int F_GETFL
3182This macro is used as the @var{command} argument to @code{fcntl}, to
3183read the file status flags for the open file with descriptor
3184@var{filedes}.
3185
3186The normal return value from @code{fcntl} with this command is a
3187nonnegative number which can be interpreted as the bitwise OR of the
3188individual flags. Since the file access modes are not single-bit values,
3189you can mask off other bits in the returned flags with @code{O_ACCMODE}
3190to compare them.
3191
In case of an error, @code{fcntl} returns @math{-1}.  The following
3193@code{errno} error conditions are defined for this command:
3194
3195@table @code
3196@item EBADF
3197The @var{filedes} argument is invalid.
3198@end table
3199@end deftypevr
3200
3201@comment fcntl.h
3202@comment POSIX.1
3203@deftypevr Macro int F_SETFL
3204This macro is used as the @var{command} argument to @code{fcntl}, to set
3205the file status flags for the open file corresponding to the
3206@var{filedes} argument. This command requires a third @code{int}
3207argument to specify the new flags, so the call looks like this:
3208
3209@smallexample
3210fcntl (@var{filedes}, F_SETFL, @var{new-flags})
3211@end smallexample
3212
3213You can't change the access mode for the file in this way; that is,
3214whether the file descriptor was opened for reading or writing.
3215
3216The normal return value from @code{fcntl} with this command is an
unspecified value other than @math{-1}, which indicates an error.  The
3218error conditions are the same as for the @code{F_GETFL} command.
3219@end deftypevr
3220
3221If you want to modify the file status flags, you should get the current
3222flags with @code{F_GETFL} and modify the value. Don't assume that the
3223flags listed here are the only ones that are implemented; your program
3224may be run years from now and more flags may exist then. For example,
3225here is a function to set or clear the flag @code{O_NONBLOCK} without
3226altering any other flags:
3227
3228@smallexample
3229@group
/* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,}
   @r{or clear the flag if @var{value} is 0.}
   @r{Return 0 on success, or -1 on error with @code{errno} set.} */

int
set_nonblock_flag (int desc, int value)
@{
  int oldflags = fcntl (desc, F_GETFL, 0);
  /* @r{If reading the flags failed, return error indication now.} */
  if (oldflags == -1)
    return -1;
  /* @r{Set just the flag we want to set.} */
  if (value != 0)
    oldflags |= O_NONBLOCK;
  else
    oldflags &= ~O_NONBLOCK;
  /* @r{Store modified flag word in the descriptor.} */
  return fcntl (desc, F_SETFL, oldflags);
@}
3249@end group
3250@end smallexample
3251
3252@node File Locks
3253@section File Locks
3254
3255@cindex file locks
3256@cindex record locking
3257The remaining @code{fcntl} commands are used to support @dfn{record
3258locking}, which permits multiple cooperating programs to prevent each
3259other from simultaneously accessing parts of a file in error-prone
3260ways.
3261
3262@cindex exclusive lock
3263@cindex write lock
3264An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access
3265for writing to the specified part of the file. While a write lock is in
3266place, no other process can lock that part of the file.
3267
3268@cindex shared lock
3269@cindex read lock
3270A @dfn{shared} or @dfn{read} lock prohibits any other process from
3271requesting a write lock on the specified part of the file. However,
3272other processes can request read locks.
3273
3274The @code{read} and @code{write} functions do not actually check to see
3275whether there are any locks in place. If you want to implement a
3276locking protocol for a file shared by multiple processes, your application
3277must do explicit @code{fcntl} calls to request and clear locks at the
3278appropriate points.
3279
3280Locks are associated with processes. A process can only have one kind
3281of lock set for each byte of a given file. When any file descriptor for
3282that file is closed by the process, all of the locks that process holds
3283on that file are released, even if the locks were made using other
3284descriptors that remain open. Likewise, locks are released when a
3285process exits, and are not inherited by child processes created using
3286@code{fork} (@pxref{Creating a Process}).
3287
3288When making a lock, use a @code{struct flock} to specify what kind of
3289lock and where. This data type and the associated macros for the
3290@code{fcntl} function are declared in the header file @file{fcntl.h}.
3291@pindex fcntl.h
3292
3293@comment fcntl.h
3294@comment POSIX.1
3295@deftp {Data Type} {struct flock}
3296This structure is used with the @code{fcntl} function to describe a file
3297lock. It has these members:
3298
3299@table @code
3300@item short int l_type
3301Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or
3302@code{F_UNLCK}.
3303
3304@item short int l_whence
3305This corresponds to the @var{whence} argument to @code{fseek} or
3306@code{lseek}, and specifies what the offset is relative to. Its value
3307can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}.
3308
3309@item off_t l_start
3310This specifies the offset of the start of the region to which the lock
applies, and is given in bytes relative to the point specified by the
@code{l_whence} member.
3313
3314@item off_t l_len
3315This specifies the length of the region to be locked. A value of
3316@code{0} is treated specially; it means the region extends to the end of
3317the file.
3318
3319@item pid_t l_pid
3320This field is the process ID (@pxref{Process Creation Concepts}) of the
3321process holding the lock. It is filled in by calling @code{fcntl} with
3322the @code{F_GETLK} command, but is ignored when making a lock.
3323@end table
3324@end deftp
3325
3326@comment fcntl.h
3327@comment POSIX.1
3328@deftypevr Macro int F_GETLK
3329This macro is used as the @var{command} argument to @code{fcntl}, to
3330specify that it should get information about a lock. This command
3331requires a third argument of type @w{@code{struct flock *}} to be passed
3332to @code{fcntl}, so that the form of the call is:
3333
3334@smallexample
3335fcntl (@var{filedes}, F_GETLK, @var{lockp})
3336@end smallexample
3337
3338If there is a lock already in place that would block the lock described
3339by the @var{lockp} argument, information about that lock overwrites
3340@code{*@var{lockp}}. Existing locks are not reported if they are
3341compatible with making a new lock as specified. Thus, you should
3342specify a lock type of @code{F_WRLCK} if you want to find out about both
3343read and write locks, or @code{F_RDLCK} if you want to find out about
3344write locks only.
3345
3346There might be more than one lock affecting the region specified by the
3347@var{lockp} argument, but @code{fcntl} only returns information about
3348one of them. The @code{l_whence} member of the @var{lockp} structure is
set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields
are set to identify the locked region.
3351
3352If no lock applies, the only change to the @var{lockp} structure is to
3353update the @code{l_type} to a value of @code{F_UNLCK}.
3354
3355The normal return value from @code{fcntl} with this command is an
unspecified value other than @math{-1}, which is reserved to indicate an
3357error. The following @code{errno} error conditions are defined for
3358this command:
3359
3360@table @code
3361@item EBADF
3362The @var{filedes} argument is invalid.
3363
3364@item EINVAL
3365Either the @var{lockp} argument doesn't specify valid lock information,
3366or the file associated with @var{filedes} doesn't support locks.
3367@end table
3368@end deftypevr
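
To illustrate, here is a sketch that asks whether any lock would block a
write lock covering the whole file.  The function name and its return
convention are hypothetical; only @code{fcntl}, @code{F_GETLK} and the
@code{struct flock} members come from the description above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Find out whether another process holds a lock that would block a
   write lock on the whole file open on DESC.  Return the process ID
   of one such lock holder, 0 if there is none, or -1 on error.  */
pid_t
whole_file_lock_holder (int desc)
{
  struct flock lock = { 0 };
  lock.l_type = F_WRLCK;     /* Report read locks and write locks.  */
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;            /* Zero means "to the end of the file".  */

  if (fcntl (desc, F_GETLK, &lock) == -1)
    return -1;
  /* l_type is F_UNLCK when nothing would block the request.  */
  return lock.l_type == F_UNLCK ? 0 : lock.l_pid;
}
```

Keep in mind that the answer can be out of date by the time the caller
acts on it, so this kind of query is mainly useful for diagnostics.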
3369
3370@comment fcntl.h
3371@comment POSIX.1
3372@deftypevr Macro int F_SETLK
3373This macro is used as the @var{command} argument to @code{fcntl}, to
3374specify that it should set or clear a lock. This command requires a
3375third argument of type @w{@code{struct flock *}} to be passed to
3376@code{fcntl}, so that the form of the call is:
3377
3378@smallexample
3379fcntl (@var{filedes}, F_SETLK, @var{lockp})
3380@end smallexample
3381
3382If the process already has a lock on any part of the region, the old lock
3383on that part is replaced with the new lock. You can remove a lock
3384by specifying a lock type of @code{F_UNLCK}.
3385
3386If the lock cannot be set, @code{fcntl} returns immediately with a value
of @math{-1}.  This function does not block waiting for other processes
to release locks.  If @code{fcntl} succeeds, it returns a value other
than @math{-1}.
3390
3391The following @code{errno} error conditions are defined for this
3392function:
3393
3394@table @code
3395@item EAGAIN
3396@itemx EACCES
3397The lock cannot be set because it is blocked by an existing lock on the
3398file. Some systems use @code{EAGAIN} in this case, and other systems
3399use @code{EACCES}; your program should treat them alike, after
3400@code{F_SETLK}. (The GNU system always uses @code{EAGAIN}.)
3401
3402@item EBADF
3403Either: the @var{filedes} argument is invalid; you requested a read lock
3404but the @var{filedes} is not open for read access; or, you requested a
3405write lock but the @var{filedes} is not open for write access.
3406
3407@item EINVAL
3408Either the @var{lockp} argument doesn't specify valid lock information,
3409or the file associated with @var{filedes} doesn't support locks.
3410
3411@item ENOLCK
3412The system has run out of file lock resources; there are already too
3413many file locks in place.
3414
3415Well-designed file systems never report this error, because they have no
3416limitation on the number of locks. However, you must still take account
3417of the possibility of this error, as it could result from network access
3418to a file system on another machine.
3419@end table
3420@end deftypevr
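
The following sketch shows one way to attempt a lock without waiting,
treating @code{EAGAIN} and @code{EACCES} alike as recommended above.
The function name @code{try_write_lock} and its three-way return value
are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to place a write lock on the whole file open on DESC without
   waiting.  Return 1 if the lock was acquired, 0 if another process
   holds a conflicting lock, or -1 on any other error.  */
int
try_write_lock (int desc)
{
  struct flock lock = { 0 };
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;               /* To the end of the file.  */

  if (fcntl (desc, F_SETLK, &lock) != -1)
    return 1;
  /* Systems differ on which of the two codes they report.  */
  if (errno == EAGAIN || errno == EACCES)
    return 0;
  return -1;
}
```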
3421
3422@comment fcntl.h
3423@comment POSIX.1
3424@deftypevr Macro int F_SETLKW
3425This macro is used as the @var{command} argument to @code{fcntl}, to
3426specify that it should set or clear a lock. It is just like the
3427@code{F_SETLK} command, but causes the process to block (or wait)
until the request can be satisfied.
3429
3430This command requires a third argument of type @code{struct flock *}, as
3431for the @code{F_SETLK} command.
3432
3433The @code{fcntl} return values and errors are the same as for the
3434@code{F_SETLK} command, but these additional @code{errno} error conditions
3435are defined for this command:
3436
3437@table @code
3438@item EINTR
3439The function was interrupted by a signal while it was waiting.
3440@xref{Interrupted Primitives}.
3441
3442@item EDEADLK
3443The specified region is being locked by another process. But that
3444process is waiting to lock a region which the current process has
3445locked, so waiting for the lock would result in deadlock. The system
3446does not guarantee that it will detect all such conditions, but it lets
3447you know if it notices one.
3448@end table
3449@end deftypevr
3450
3451
3452The following macros are defined for use as values for the @code{l_type}
3453member of the @code{flock} structure. The values are integer constants.
3454
3455@table @code
3456@comment fcntl.h
3457@comment POSIX.1
3458@vindex F_RDLCK
3459@item F_RDLCK
3460This macro is used to specify a read (or shared) lock.
3461
3462@comment fcntl.h
3463@comment POSIX.1
3464@vindex F_WRLCK
3465@item F_WRLCK
3466This macro is used to specify a write (or exclusive) lock.
3467
3468@comment fcntl.h
3469@comment POSIX.1
3470@vindex F_UNLCK
3471@item F_UNLCK
3472This macro is used to specify that the region is unlocked.
3473@end table
3474
3475As an example of a situation where file locking is useful, consider a
3476program that can be run simultaneously by several different users, that
3477logs status information to a common file. One example of such a program
3478might be a game that uses a file to keep track of high scores. Another
3479example might be a program that records usage or accounting information
3480for billing purposes.
3481
3482Having multiple copies of the program simultaneously writing to the
3483file could cause the contents of the file to become mixed up. But
3484you can prevent this kind of problem by setting a write lock on the
file before actually writing to the file.
3486
3487If the program also needs to read the file and wants to make sure that
3488the contents of the file are in a consistent state, then it can also use
3489a read lock. While the read lock is set, no other process can lock
3490that part of the file for writing.
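
A program of this kind might wrap each update in a write lock roughly as
sketched below.  The helper names are invented for this illustration,
the lock covers the whole file for simplicity, and a short write is
treated as an error.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply a lock of kind TYPE (F_RDLCK, F_WRLCK or F_UNLCK) to the
   whole file open on DESC, waiting if necessary.  */
static int
lock_whole_file (int desc, short type)
{
  struct flock lock = { 0 };
  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;               /* To the end of the file.  */

  int result;
  do
    result = fcntl (desc, F_SETLKW, &lock);
  while (result == -1 && errno == EINTR);   /* Retry after a signal.  */
  return result;
}

/* Write the LEN bytes at RECORD to the end of the file open on DESC
   while holding a write lock.  Return 0 on success, -1 on error.  */
int
write_record_locked (int desc, const void *record, size_t len)
{
  if (lock_whole_file (desc, F_WRLCK) == -1)
    return -1;

  int result = 0;
  if (lseek (desc, 0, SEEK_END) == (off_t) -1
      || write (desc, record, len) != (ssize_t) len)
    result = -1;

  /* Release the lock even if the write failed.  */
  if (lock_whole_file (desc, F_UNLCK) == -1)
    result = -1;
  return result;
}
```

A reader that wants a consistent view of the file can call the same
helper with @code{F_RDLCK} before reading and @code{F_UNLCK} afterwards.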
3491
3492@c ??? This section could use an example program.
3493
3494Remember that file locks are only a @emph{voluntary} protocol for
3495controlling access to a file. There is still potential for access to
3496the file by programs that don't use the lock protocol.
3497
3498@node Interrupt Input
3499@section Interrupt-Driven Input
3500
3501@cindex interrupt-driven input
3502If you set the @code{O_ASYNC} status flag on a file descriptor
3503(@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever
3504input or output becomes possible on that file descriptor. The process
3505or process group to receive the signal can be selected by using the
3506@code{F_SETOWN} command to the @code{fcntl} function. If the file
3507descriptor is a socket, this also selects the recipient of @code{SIGURG}
3508signals that are delivered when out-of-band data arrives on that socket;
3509see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation
3510where @code{select} would report the socket as having an ``exceptional
3511condition''. @xref{Waiting for I/O}.)
3512
3513If the file descriptor corresponds to a terminal device, then @code{SIGIO}
signals are sent to the foreground process group of the terminal.
3515@xref{Job Control}.
3516
3517@pindex fcntl.h
3518The symbols in this section are defined in the header file
3519@file{fcntl.h}.
3520
3521@comment fcntl.h
3522@comment BSD
3523@deftypevr Macro int F_GETOWN
3524This macro is used as the @var{command} argument to @code{fcntl}, to
3525specify that it should get information about the process or process
3526group to which @code{SIGIO} signals are sent. (For a terminal, this is
3527actually the foreground process group ID, which you can get using
3528@code{tcgetpgrp}; see @ref{Terminal Access Functions}.)
3529
3530The return value is interpreted as a process ID; if negative, its
3531absolute value is the process group ID.
3532
3533The following @code{errno} error condition is defined for this command:
3534
3535@table @code
3536@item EBADF
3537The @var{filedes} argument is invalid.
3538@end table
3539@end deftypevr
3540
3541@comment fcntl.h
3542@comment BSD
3543@deftypevr Macro int F_SETOWN
3544This macro is used as the @var{command} argument to @code{fcntl}, to
3545specify that it should set the process or process group to which
3546@code{SIGIO} signals are sent. This command requires a third argument
3547of type @code{pid_t} to be passed to @code{fcntl}, so that the form of
3548the call is:
3549
3550@smallexample
3551fcntl (@var{filedes}, F_SETOWN, @var{pid})
3552@end smallexample
3553
3554The @var{pid} argument should be a process ID. You can also pass a
3555negative number whose absolute value is a process group ID.
3556
The return value from @code{fcntl} with this command is @math{-1}
3558in case of error and some other value if successful. The following
3559@code{errno} error conditions are defined for this command:
3560
3561@table @code
3562@item EBADF
3563The @var{filedes} argument is invalid.
3564
3565@item ESRCH
3566There is no process or process group corresponding to @var{pid}.
3567@end table
3568@end deftypevr
3569
3570@c ??? This section could use an example program.
3571
3572@node IOCTLs
3573@section Generic I/O Control operations
3574@cindex generic i/o control operations
3575@cindex IOCTLs
3576
3577The GNU system can handle most input/output operations on many different
devices and objects in terms of a few file primitives: @code{read},
@code{write} and @code{lseek}.  However, most devices also have a few
peculiar operations which do not fit into this model, such as:
3581
3582@itemize @bullet
3583
3584@item
3585Changing the character font used on a terminal.
3586
3587@item
Telling a magnetic tape system to rewind or fast forward.  (Since tapes
cannot move in byte increments, @code{lseek} is inapplicable.)
3590
3591@item
3592Ejecting a disk from a drive.
3593
3594@item
3595Playing an audio track from a CD-ROM drive.
3596
3597@item
3598Maintaining routing tables for a network.
3599
3600@end itemize
3601
Although some of these objects, such as sockets and
terminals@footnote{Actually, the terminal-specific functions are
implemented with IOCTLs on many platforms.}, have special functions of
their own, it would not be practical to create functions for all these
cases.
3606
Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code
numbers and multiplexed through the @code{ioctl} function, declared in
@file{sys/ioctl.h}.  The code numbers themselves are defined in many
different headers.
3611
3612@deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{})
3613
3614The @code{ioctl} function performs the generic I/O operation
3615@var{command} on @var{filedes}.
3616
3617A third argument is usually present, either a single number or a pointer
to a structure.  The meaning of this argument, the returned value, and
any error codes depend upon the command used.  Often @math{-1} is
3620returned for a failure.
3621
3622@end deftypefun
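
As a small illustration of the calling convention, the sketch below
uses the @code{FIONREAD} request, which is not described in this manual
section but is available on many systems (it is not part of POSIX.1), to
ask how many bytes are waiting to be read.  The function name
@code{bytes_available} is invented for the example, and which kinds of
descriptor accept the request varies between systems.

```c
#include <sys/ioctl.h>

/* Return the number of bytes that can be read from DESC right now,
   or -1 with errno set if the query fails.  */
int
bytes_available (int desc)
{
  int count = 0;
  /* The third argument is a pointer that the request fills in.  */
  if (ioctl (desc, FIONREAD, &count) == -1)
    return -1;
  return count;
}
```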
3623
3624On some systems, IOCTLs used by different devices share the same numbers.
3625Thus, although use of an inappropriate IOCTL @emph{usually} only produces
3626an error, you should not attempt to use device-specific IOCTLs on an
3627unknown device.
3628
3629Most IOCTLs are OS-specific and/or only used in special system utilities,
3630and are thus beyond the scope of this document. For an example of the use
of an IOCTL, see @ref{Out-of-Band Data}.