Commit | Line | Data |
---|---|---|
28f540f4 | 1 | @node Low-Level I/O, File System Interface, I/O on Streams, Top |
7a68c94a | 2 | @c %MENU% Low-level, less portable I/O |
28f540f4 RM |
3 | @chapter Low-Level Input/Output |
4 | ||
5 | This chapter describes functions for performing low-level input/output | |
6 | operations on file descriptors. These functions include the primitives | |
7 | for the higher-level I/O functions described in @ref{I/O on Streams}, as | |
8 | well as functions for performing low-level control operations for which | |
9 | there are no equivalents on streams. | |
10 | ||
11 | Stream-level I/O is more flexible and usually more convenient; | |
12 | therefore, programmers generally use the descriptor-level functions only | |
13 | when necessary. These are some of the usual reasons: | |
14 | ||
15 | @itemize @bullet | |
16 | @item | |
17 | For reading binary files in large chunks. | |
18 | ||
19 | @item | |
20 | For reading an entire file into core before parsing it. | |
21 | ||
22 | @item | |
23 | To perform operations other than data transfer, which can only be done | |
24 | with a descriptor. (You can use @code{fileno} to get the descriptor | |
25 | corresponding to a stream.) | |
26 | ||
27 | @item | |
28 | To pass descriptors to a child process. (The child can create its own | |
29 | stream to use a descriptor that it inherits, but cannot inherit a stream | |
30 | directly.) | |
31 | @end itemize | |
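As a small illustration of the third point, the sketch below uses @code{fileno} to reach the descriptor behind a stream so that @code{fsync}, which has no stream-level equivalent, can be applied to it. The helper name @code{flush_to_disk} is hypothetical and not part of the library.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push STREAM's buffered output to the kernel,
   then ask the kernel to commit it to permanent storage.
   Returns 0 on success, or -1 with errno set.  */
int
flush_to_disk (FILE *stream)
{
  /* fflush empties the stream's user-space buffer.  */
  if (fflush (stream) == EOF)
    return -1;

  /* fsync works on descriptors only, so fetch the one behind STREAM.  */
  return fsync (fileno (stream));
}
```

Flushing the stream first matters here: @code{fsync} knows nothing about data still held in the stream's buffer.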
32 | ||
33 | @menu | |
34 | * Opening and Closing Files:: How to open and close file | |
2c6fe0bd | 35 | descriptors. |
28f540f4 RM |
36 | * I/O Primitives:: Reading and writing data. |
37 | * File Position Primitive:: Setting a descriptor's file | |
2c6fe0bd | 38 | position. |
28f540f4 RM |
39 | * Descriptors and Streams:: Converting descriptor to stream |
40 | or vice-versa. | |
41 | * Stream/Descriptor Precautions:: Precautions needed if you use both | |
42 | descriptors and streams. | |
49c091e5 | 43 | * Scatter-Gather:: Fast I/O to discontinuous buffers. |
07435eb4 | 44 | * Memory-mapped I/O:: Using files like memory. |
28f540f4 RM |
45 | * Waiting for I/O:: How to check for input or output |
46 | on multiple file descriptors. | |
dfd2257a | 47 | * Synchronizing I/O:: Making sure all I/O actions completed. |
b07d03e0 | 48 | * Asynchronous I/O:: Perform I/O in parallel. |
28f540f4 RM |
49 | * Control Operations:: Various other operations on file |
50 | descriptors. | |
51 | * Duplicating Descriptors:: Fcntl commands for duplicating | |
52 | file descriptors. | |
53 | * Descriptor Flags:: Fcntl commands for manipulating | |
54 | flags associated with file | |
2c6fe0bd | 55 | descriptors. |
28f540f4 RM |
56 | * File Status Flags:: Fcntl commands for manipulating |
57 | flags associated with open files. | |
58 | * File Locks:: Fcntl commands for implementing | |
59 | file locking. | |
60 | * Interrupt Input:: Getting an asynchronous signal when | |
61 | input arrives. | |
07435eb4 | 62 | * IOCTLs:: Generic I/O Control operations. |
28f540f4 RM |
63 | @end menu |
64 | ||
65 | ||
66 | @node Opening and Closing Files | |
67 | @section Opening and Closing Files | |
68 | ||
69 | @cindex opening a file descriptor | |
70 | @cindex closing a file descriptor | |
71 | This section describes the primitives for opening and closing files | |
72 | using file descriptors. The @code{open} and @code{creat} functions are | |
73 | declared in the header file @file{fcntl.h}, while @code{close} is | |
74 | declared in @file{unistd.h}. | |
75 | @pindex unistd.h | |
76 | @pindex fcntl.h | |
77 | ||
78 | @comment fcntl.h | |
79 | @comment POSIX.1 | |
80 | @deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) | |
81 | The @code{open} function creates and returns a new file descriptor | |
82 | for the file named by @var{filename}. Initially, the file position | |
83 | indicator for the file is at the beginning of the file. The argument | |
84 | @var{mode} is used only when a file is created, but it doesn't hurt | |
85 | to supply the argument in any case. | |
86 | ||
87 | The @var{flags} argument controls how the file is to be opened. This is | |
88 | a bit mask; you create the value by the bitwise OR of the appropriate | |
89 | parameters (using the @samp{|} operator in C). | |
90 | @xref{File Status Flags}, for the parameters available. | |
91 | ||
92 | The normal return value from @code{open} is a non-negative integer file | |
07435eb4 | 93 | descriptor. In the case of an error, a value of @math{-1} is returned |
28f540f4 RM |
94 | instead. In addition to the usual file name errors (@pxref{File |
95 | Name Errors}), the following @code{errno} error conditions are defined | |
96 | for this function: | |
97 | ||
98 | @table @code | |
99 | @item EACCES | |
04b9968b UD |
100 | The file exists but is not readable/writeable as requested by the @var{flags} |
101 | argument, or the file does not exist and the directory is unwriteable so | |
28f540f4 RM |
102 | it cannot be created. |
103 | ||
104 | @item EEXIST | |
105 | Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already | |
106 | exists. | |
107 | ||
108 | @item EINTR | |
109 | The @code{open} operation was interrupted by a signal. | |
110 | @xref{Interrupted Primitives}. | |
111 | ||
112 | @item EISDIR | |
113 | The @var{flags} argument specified write access, and the file is a directory. | |
114 | ||
115 | @item EMFILE | |
116 | The process has too many files open. | |
117 | The maximum number of file descriptors is controlled by the | |
118 | @code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}. | |
119 | ||
120 | @item ENFILE | |
121 | The entire system, or perhaps the file system which contains the | |
122 | directory, cannot support any additional open files at the moment. | |
123 | (This problem cannot happen on the GNU system.) | |
124 | ||
125 | @item ENOENT | |
126 | The named file does not exist, and @code{O_CREAT} is not specified. | |
127 | ||
128 | @item ENOSPC | |
129 | The directory or file system that would contain the new file cannot be | |
130 | extended, because there is no disk space left. | |
131 | ||
132 | @item ENXIO | |
133 | @code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags} | |
134 | argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and | |
135 | FIFOs}), and no process has the file open for reading. | |
136 | ||
137 | @item EROFS | |
138 | The file resides on a read-only file system and any of @w{@code{O_WRONLY}}, | |
139 | @code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument, | |
140 | or @code{O_CREAT} is set and the file does not already exist. | |
141 | @end table | |
142 | ||
143 | @c !!! umask | |
144 | ||
04b9968b | 145 | If on a 32 bit machine the sources are translated with |
b07d03e0 UD |
146 | @code{_FILE_OFFSET_BITS == 64} the function @code{open} returns a file |
147 | descriptor opened in the large file mode which enables the file handling | |
fed8f7f7 | 148 | functions to use files up to @math{2^63} bytes in size and offset from |
b07d03e0 UD |
149 | @math{-2^63} to @math{2^63}. This happens transparently for the user |
150 | since all of the low-level file handling functions are equally replaced. | |
151 | ||
04b9968b | 152 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
153 | is a problem if the thread allocates some resources (like memory, file |
154 | descriptors, semaphores or whatever) at the time @code{open} is | |
04b9968b | 155 | called. If the thread gets cancelled these resources stay allocated |
dfd2257a | 156 | until the program ends. To avoid this, calls to @code{open} should be |
04b9968b | 157 | protected using cancellation handlers. |
dfd2257a UD |
158 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
159 | ||
28f540f4 RM |
160 | The @code{open} function is the underlying primitive for the @code{fopen} |
161 | and @code{freopen} functions, which create streams. | |
162 | @end deftypefun | |
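The following is a minimal sketch of a call to @code{open} that combines several flags and supplies a @var{mode}. The helper name @code{create_exclusive} is hypothetical; the permission constants come from @file{sys/stat.h}.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create FILENAME for writing, failing if it
   already exists.  Returns the new descriptor, or -1 with errno set
   by open.  */
int
create_exclusive (const char *filename)
{
  int fd;

  /* O_CREAT | O_EXCL makes open fail with EEXIST if the file exists.
     The mode argument is used because O_CREAT is given.
     Retry if a signal interrupts the call (EINTR).  */
  do
    fd = open (filename, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
  while (fd == -1 && errno == EINTR);

  return fd;
}
```

A caller would typically compare @code{errno} against @code{EEXIST} after a failure to tell an existing file apart from other errors.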
163 | ||
b07d03e0 | 164 | @comment fcntl.h |
a3a4a74e | 165 | @comment Unix98 |
b07d03e0 UD |
166 | @deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}]) |
167 | This function is similar to @code{open}. It returns a file descriptor | |
168 | which can be used to access the file named by @var{filename}. The only | |
04b9968b | 169 | difference is that on 32 bit systems the file is opened in the |
b07d03e0 UD |
170 | large file mode. I.e., file length and file offsets can exceed 31 bits. |
171 | ||
b07d03e0 UD |
172 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this |
173 | function is actually available under the name @code{open}. I.e., the | |
174 | new, extended API using 64 bit file sizes and offsets transparently | |
175 | replaces the old API. | |
176 | @end deftypefun | |
177 | ||
28f540f4 RM |
178 | @comment fcntl.h |
179 | @comment POSIX.1 | |
180 | @deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode}) | |
181 | This function is obsolete. The call: | |
182 | ||
183 | @smallexample | |
184 | creat (@var{filename}, @var{mode}) | |
185 | @end smallexample | |
186 | ||
187 | @noindent | |
188 | is equivalent to: | |
189 | ||
190 | @smallexample | |
191 | open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode}) | |
192 | @end smallexample | |
b07d03e0 | 193 | |
04b9968b | 194 | If on a 32 bit machine the sources are translated with |
b07d03e0 UD |
195 | @code{_FILE_OFFSET_BITS == 64} the function @code{creat} returns a file |
196 | descriptor opened in the large file mode which enables the file handling | |
197 | functions to use files up to @math{2^63} bytes in size and offset from | |
198 | @math{-2^63} to @math{2^63}. This happens transparently for the user | |
199 | since all of the low-level file handling functions are equally replaced. | |
200 | @end deftypefn | |
201 | ||
202 | @comment fcntl.h | |
a3a4a74e | 203 | @comment Unix98 |
b07d03e0 UD |
204 | @deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode}) |
205 | This function is similar to @code{creat}. It returns a file descriptor | |
206 | which can be used to access the file named by @var{filename}. The only | |
04b9968b | 207 | difference is that on 32 bit systems the file is opened in the |
b07d03e0 UD |
208 | large file mode. I.e., file length and file offsets can exceed 31 bits. |
209 | ||
210 | To use this file descriptor one must not use the normal operations but | |
211 | instead the counterparts named @code{*64}, e.g., @code{lseek64}. | |
212 | ||
213 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this | |
214 | function is actually available under the name @code{creat}. I.e., the | |
215 | new, extended API using 64 bit file sizes and offsets transparently | |
216 | replaces the old API. | |
28f540f4 RM |
217 | @end deftypefn |
218 | ||
219 | @comment unistd.h | |
220 | @comment POSIX.1 | |
221 | @deftypefun int close (int @var{filedes}) | |
222 | The function @code{close} closes the file descriptor @var{filedes}. | |
223 | Closing a file has the following consequences: | |
224 | ||
225 | @itemize @bullet | |
2c6fe0bd | 226 | @item |
28f540f4 RM |
227 | The file descriptor is deallocated. |
228 | ||
229 | @item | |
230 | Any record locks owned by the process on the file are unlocked. | |
231 | ||
232 | @item | |
233 | When all file descriptors associated with a pipe or FIFO have been closed, | |
234 | any unread data is discarded. | |
235 | @end itemize | |
236 | ||
04b9968b | 237 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
238 | is a problem if the thread allocates some resources (like memory, file |
239 | descriptors, semaphores or whatever) at the time @code{close} is | |
04b9968b UD |
240 | called. If the thread gets cancelled these resources stay allocated |
241 | until the program ends. To avoid this, calls to @code{close} should be | |
242 | protected using cancellation handlers. | |
dfd2257a UD |
243 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
244 | ||
07435eb4 | 245 | The normal return value from @code{close} is @math{0}; a value of @math{-1} |
28f540f4 RM |
246 | is returned in case of failure. The following @code{errno} error |
247 | conditions are defined for this function: | |
248 | ||
249 | @table @code | |
250 | @item EBADF | |
251 | The @var{filedes} argument is not a valid file descriptor. | |
252 | ||
253 | @item EINTR | |
254 | The @code{close} call was interrupted by a signal. | |
255 | @xref{Interrupted Primitives}. | |
256 | Here is an example of how to handle @code{EINTR} properly: | |
257 | ||
258 | @smallexample | |
259 | TEMP_FAILURE_RETRY (close (desc)); | |
260 | @end smallexample | |
261 | ||
262 | @item ENOSPC | |
263 | @itemx EIO | |
264 | @itemx EDQUOT | |
2c6fe0bd | 265 | When the file is accessed by NFS, these errors from @code{write} can sometimes |
28f540f4 RM |
266 | not be detected until @code{close}. @xref{I/O Primitives}, for details |
267 | on their meaning. | |
268 | @end table | |
b07d03e0 UD |
269 | |
270 | Please note that there is @emph{no} separate @code{close64} function. | |
271 | This is not necessary since this function does not determine or depend | |
fed8f7f7 | 272 | on the mode of the file. The kernel which performs the @code{close} |
04b9968b | 273 | operation knows which mode the descriptor is used for and can handle |
b07d03e0 | 274 | this situation. |
28f540f4 RM |
275 | @end deftypefun |
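Because delayed write errors can surface only at @code{close}, it is worth checking its return value. The sketch below does so; the helper name @code{close_errno} is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close FILEDES.  Returns 0 on success, or the
   errno value that close reported (for example EBADF, or a delayed
   ENOSPC, EIO or EDQUOT from an earlier write).  */
int
close_errno (int filedes)
{
  return close (filedes) == 0 ? 0 : errno;
}
```

Closing the same descriptor a second time yields @code{EBADF}, since the first call already deallocated it.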
276 | ||
277 | To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead | |
278 | of trying to close its underlying file descriptor with @code{close}. | |
279 | This flushes any buffered output and updates the stream object to | |
280 | indicate that it is closed. | |
281 | ||
282 | @node I/O Primitives | |
283 | @section Input and Output Primitives | |
284 | ||
285 | This section describes the functions for performing primitive input and | |
286 | output operations on file descriptors: @code{read}, @code{write}, and | |
287 | @code{lseek}. These functions are declared in the header file | |
288 | @file{unistd.h}. | |
289 | @pindex unistd.h | |
290 | ||
291 | @comment unistd.h | |
292 | @comment POSIX.1 | |
293 | @deftp {Data Type} ssize_t | |
294 | This data type is used to represent the sizes of blocks that can be | |
295 | read or written in a single operation. It is similar to @code{size_t}, | |
296 | but must be a signed type. | |
297 | @end deftp | |
298 | ||
299 | @cindex reading from a file descriptor | |
300 | @comment unistd.h | |
301 | @comment POSIX.1 | |
302 | @deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size}) | |
303 | The @code{read} function reads up to @var{size} bytes from the file | |
304 | with descriptor @var{filedes}, storing the results in the @var{buffer}. | |
04b9968b UD |
305 | (This is not necessarily a character string, and no terminating null |
306 | character is added.) | |
28f540f4 RM |
307 | |
308 | @cindex end-of-file, on a file descriptor | |
309 | The return value is the number of bytes actually read. This might be | |
310 | less than @var{size}; for example, if there aren't that many bytes left | |
311 | in the file or if there aren't that many bytes immediately available. | |
312 | The exact behavior depends on what kind of file it is. Note that | |
313 | reading less than @var{size} bytes is not an error. | |
314 | ||
315 | A value of zero indicates end-of-file (except if the value of the | |
316 | @var{size} argument is also zero). This is not considered an error. | |
317 | If you keep calling @code{read} while at end-of-file, it will keep | |
318 | returning zero and doing nothing else. | |
319 | ||
320 | If @code{read} returns at least one character, there is no way you can | |
321 | tell whether end-of-file was reached. But if you did reach the end, the | |
322 | next read will return zero. | |
323 | ||
07435eb4 | 324 | In case of an error, @code{read} returns @math{-1}. The following |
28f540f4 RM |
325 | @code{errno} error conditions are defined for this function: |
326 | ||
327 | @table @code | |
328 | @item EAGAIN | |
329 | Normally, when no input is immediately available, @code{read} waits for | |
330 | some input. But if the @code{O_NONBLOCK} flag is set for the file | |
331 | (@pxref{File Status Flags}), @code{read} returns immediately without | |
332 | reading any data, and reports this error. | |
333 | ||
334 | @strong{Compatibility Note:} Most versions of BSD Unix use a different | |
335 | error code for this: @code{EWOULDBLOCK}. In the GNU library, | |
336 | @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter | |
337 | which name you use. | |
338 | ||
339 | On some systems, reading a large amount of data from a character special | |
340 | file can also fail with @code{EAGAIN} if the kernel cannot find enough | |
341 | physical memory to lock down the user's pages. This is limited to | |
342 | devices that transfer with direct memory access into the user's memory, | |
343 | which means it does not include terminals, since they always use | |
344 | separate buffers inside the kernel. This problem never happens in the | |
345 | GNU system. | |
346 | ||
347 | Any condition that could result in @code{EAGAIN} can instead result in a | |
348 | successful @code{read} which returns fewer bytes than requested. | |
349 | Calling @code{read} again immediately would result in @code{EAGAIN}. | |
350 | ||
351 | @item EBADF | |
352 | The @var{filedes} argument is not a valid file descriptor, | |
353 | or is not open for reading. | |
354 | ||
355 | @item EINTR | |
356 | @code{read} was interrupted by a signal while it was waiting for input. | |
357 | @xref{Interrupted Primitives}. A signal will not necessarily cause | |
358 | @code{read} to return @code{EINTR}; it may instead result in a | |
359 | successful @code{read} which returns fewer bytes than requested. | |
360 | ||
361 | @item EIO | |
362 | For many devices, and for disk files, this error code indicates | |
363 | a hardware error. | |
364 | ||
365 | @code{EIO} also occurs when a background process tries to read from the | |
366 | controlling terminal, and the normal action of stopping the process by | |
367 | sending it a @code{SIGTTIN} signal isn't working. This might happen if | |
04b9968b | 368 | the signal is being blocked or ignored, or because the process group is |
28f540f4 RM |
369 | orphaned. @xref{Job Control}, for more information about job control, |
370 | and @ref{Signal Handling}, for information about signals. | |
371 | @end table | |
372 | ||
b07d03e0 UD |
373 | Please note that there is no function named @code{read64}. This is not |
374 | necessary since this function does not directly modify or handle the | |
375 | possibly wide file offset. Since the kernel handles this state | |
04b9968b | 376 | internally, the @code{read} function can be used for all cases. |
b07d03e0 | 377 | |
04b9968b | 378 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
379 | is a problem if the thread allocates some resources (like memory, file |
380 | descriptors, semaphores or whatever) at the time @code{read} is | |
04b9968b UD |
381 | called. If the thread gets cancelled these resources stay allocated |
382 | until the program ends. To avoid this, calls to @code{read} should be | |
383 | protected using cancellation handlers. | |
dfd2257a UD |
384 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
385 | ||
28f540f4 RM |
386 | The @code{read} function is the underlying primitive for all of the |
387 | functions that read from streams, such as @code{fgetc}. | |
388 | @end deftypefun | |
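Since @code{read} may return fewer bytes than requested, and a return of zero marks end-of-file, callers that need a full buffer usually loop. The following sketch shows one way to do that; the helper name @code{read_full} is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read until SIZE bytes are stored
   in BUFFER, end-of-file is reached, or an error other than EINTR
   occurs.  Returns the number of bytes read (possibly fewer than SIZE
   at end-of-file), or -1 on error.  */
ssize_t
read_full (int filedes, void *buffer, size_t size)
{
  char *p = buffer;
  size_t total = 0;

  while (total < size)
    {
      ssize_t n = read (filedes, p + total, size - total);
      if (n == 0)
        break;                  /* End of file.  */
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      total += n;               /* Short read: ask for the rest.  */
    }
  return total;
}
```

With a descriptor in @code{O_NONBLOCK} mode this loop returns @math{-1} with @code{EAGAIN} when no data is available, so it is meant for blocking descriptors.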
389 | ||
a5a0310d UD |
390 | @comment unistd.h |
391 | @comment Unix98 | |
392 | @deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset}) | |
393 | The @code{pread} function is similar to the @code{read} function. The | |
04b9968b UD |
394 | first three arguments are identical, and the return values and error |
395 | codes also correspond. | |
a5a0310d UD |
396 | |
397 | The difference is the fourth argument and its handling. The data block | |
398 | is not read from the current position of the file descriptor | |
399 | @code{filedes}. Instead the data is read from the file starting at | |
400 | position @var{offset}. The position of the file descriptor itself is | |
04b9968b | 401 | not affected by the operation. The value is the same as before the call. |
a5a0310d | 402 | |
b07d03e0 UD |
403 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
404 | @code{pread} function is in fact @code{pread64} and the type | |
04b9968b | 405 | @code{off_t} has 64 bits, which makes it possible to handle files up to |
c756c71c | 406 | @math{2^63} bytes in length. |
b07d03e0 | 407 | |
a5a0310d UD |
408 | The return value of @code{pread} describes the number of bytes read. |
409 | In the error case it returns @math{-1} like @code{read} does and the | |
04b9968b UD |
410 | error codes are also the same, with these additions: |
411 | ||
a5a0310d UD |
412 | @table @code |
413 | @item EINVAL | |
414 | The value given for @var{offset} is negative and therefore illegal. | |
415 | ||
416 | @item ESPIPE | |
417 | The file descriptor @var{filedes} is associated with a pipe or a FIFO and | |
418 | this device does not allow positioning of the file pointer. | |
419 | @end table | |
420 | ||
421 | The function is an extension defined in the Single Unix Specification | |
422 | version 2. | |
423 | @end deftypefun | |
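The sketch below illustrates that @code{pread} leaves the descriptor's file position alone. The helper name @code{read_at} and its @var{pos_after} parameter are hypothetical and exist only to make the effect observable.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to SIZE bytes found at OFFSET in
   FILENAME into BUFFER.  Stores in *POS_AFTER the descriptor's file
   position after the pread call.  Returns what pread returned, or -1
   if the file could not be opened.  */
ssize_t
read_at (const char *filename, void *buffer, size_t size,
         off_t offset, off_t *pos_after)
{
  int fd = open (filename, O_RDONLY);
  if (fd == -1)
    return -1;

  ssize_t n = pread (fd, buffer, size, offset);

  /* A freshly opened file starts at position 0, and pread does not
     move it, so this reports 0 regardless of OFFSET.  */
  *pos_after = lseek (fd, 0, SEEK_CUR);

  close (fd);
  return n;
}
```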
424 | ||
b07d03e0 | 425 | @comment unistd.h |
a3a4a74e | 426 | @comment Unix98 |
b07d03e0 UD |
427 | @deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
428 | This function is similar to the @code{pread} function. The difference | |
429 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 430 | @code{off_t} which makes it possible on 32 bit machines to address |
c756c71c | 431 | files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
b07d03e0 UD |
432 | file descriptor @code{filedes} must be opened using @code{open64} since |
433 | otherwise the large offsets possible with @code{off64_t} will lead to | |
434 | errors with a descriptor in small file mode. | |
435 | ||
c756c71c | 436 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
04b9968b UD |
437 | 32 bit machine this function is actually available under the name |
438 | @code{pread} and so transparently replaces the 32 bit interface. | |
b07d03e0 UD |
439 | @end deftypefun |
440 | ||
28f540f4 RM |
441 | @cindex writing to a file descriptor |
442 | @comment unistd.h | |
443 | @comment POSIX.1 | |
444 | @deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size}) | |
445 | The @code{write} function writes up to @var{size} bytes from | |
446 | @var{buffer} to the file with descriptor @var{filedes}. The data in | |
447 | @var{buffer} is not necessarily a character string and a null character is | |
448 | output like any other character. | |
449 | ||
450 | The return value is the number of bytes actually written. This may be | |
451 | @var{size}, but can always be smaller. Your program should always call | |
452 | @code{write} in a loop, iterating until all the data is written. | |
453 | ||
454 | Once @code{write} returns, the data is enqueued to be written and can be | |
455 | read back right away, but it is not necessarily written out to permanent | |
456 | storage immediately. You can use @code{fsync} when you need to be sure | |
457 | your data has been permanently stored before continuing. (It is more | |
458 | efficient for the system to batch up consecutive writes and do them all | |
459 | at once when convenient. Normally they will always be written to disk | |
a5a0310d UD |
460 | within a minute or less.) Modern systems provide another function |
461 | @code{fdatasync} which guarantees integrity only for the file data and | |
462 | is therefore faster. | |
463 | @c !!! xref fsync, fdatasync | |
2c6fe0bd | 464 | You can use the @code{O_FSYNC} open mode to make @code{write} always |
28f540f4 RM |
465 | store the data to disk before returning; @pxref{Operating Modes}. |
466 | ||
07435eb4 | 467 | In the case of an error, @code{write} returns @math{-1}. The following |
28f540f4 RM |
468 | @code{errno} error conditions are defined for this function: |
469 | ||
470 | @table @code | |
471 | @item EAGAIN | |
472 | Normally, @code{write} blocks until the write operation is complete. | |
473 | But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control | |
04b9968b | 474 | Operations}), it returns immediately without writing any data and |
28f540f4 RM |
475 | reports this error. An example of a situation that might cause the |
476 | process to block on output is writing to a terminal device that supports | |
477 | flow control, where output has been suspended by receipt of a STOP | |
478 | character. | |
479 | ||
480 | @strong{Compatibility Note:} Most versions of BSD Unix use a different | |
481 | error code for this: @code{EWOULDBLOCK}. In the GNU library, | |
482 | @code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter | |
483 | which name you use. | |
484 | ||
485 | On some systems, writing a large amount of data to a character special | |
486 | file can also fail with @code{EAGAIN} if the kernel cannot find enough | |
487 | physical memory to lock down the user's pages. This is limited to | |
488 | devices that transfer with direct memory access into the user's memory, | |
489 | which means it does not include terminals, since they always use | |
490 | separate buffers inside the kernel. This problem does not arise in the | |
491 | GNU system. | |
492 | ||
493 | @item EBADF | |
494 | The @var{filedes} argument is not a valid file descriptor, | |
495 | or is not open for writing. | |
496 | ||
497 | @item EFBIG | |
498 | The size of the file would become larger than the implementation can support. | |
499 | ||
500 | @item EINTR | |
501 | The @code{write} operation was interrupted by a signal while it was | |
04b9968b | 502 | blocked waiting for completion. A signal will not necessarily cause |
28f540f4 RM |
503 | @code{write} to return @code{EINTR}; it may instead result in a |
504 | successful @code{write} which writes fewer bytes than requested. | |
505 | @xref{Interrupted Primitives}. | |
506 | ||
507 | @item EIO | |
508 | For many devices, and for disk files, this error code indicates | |
509 | a hardware error. | |
510 | ||
511 | @item ENOSPC | |
512 | The device containing the file is full. | |
513 | ||
514 | @item EPIPE | |
515 | This error is returned when you try to write to a pipe or FIFO that | |
516 | isn't open for reading by any process. When this happens, a @code{SIGPIPE} | |
517 | signal is also sent to the process; see @ref{Signal Handling}. | |
518 | @end table | |
519 | ||
520 | Unless you have arranged to prevent @code{EINTR} failures, you should | |
521 | check @code{errno} after each failing call to @code{write}, and if the | |
522 | error was @code{EINTR}, you should simply repeat the call. | |
523 | @xref{Interrupted Primitives}. The easy way to do this is with the | |
524 | macro @code{TEMP_FAILURE_RETRY}, as follows: | |
525 | ||
526 | @smallexample | |
527 | nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count)); | |
528 | @end smallexample | |
529 | ||
b07d03e0 UD |
530 | Please note that there is no function named @code{write64}. This is not |
531 | necessary since this function does not directly modify or handle the | |
532 | possibly wide file offset. Since the kernel handles this state | |
533 | internally the @code{write} function can be used for all cases. | |
534 | ||
04b9968b | 535 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
536 | is a problem if the thread allocates some resources (like memory, file |
537 | descriptors, semaphores or whatever) at the time @code{write} is | |
04b9968b UD |
538 | called. If the thread gets cancelled these resources stay allocated |
539 | until the program ends. To avoid this, calls to @code{write} should be | |
540 | protected using cancellation handlers. | |
dfd2257a UD |
541 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
542 | ||
28f540f4 RM |
543 | The @code{write} function is the underlying primitive for all of the |
544 | functions that write to streams, such as @code{fputc}. | |
545 | @end deftypefun | |
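The advice above, to call @code{write} in a loop and to repeat the call after @code{EINTR}, can be sketched without relying on @code{TEMP_FAILURE_RETRY} as follows. The helper name @code{write_all} is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call write until all SIZE bytes of BUFFER have
   been written, or an error other than EINTR occurs.
   Returns SIZE on success, or -1 with errno set.  */
ssize_t
write_all (int filedes, const void *buffer, size_t size)
{
  const char *p = buffer;
  size_t done = 0;

  while (done < size)
    {
      ssize_t n = write (filedes, p + done, size - done);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      done += n;                /* Partial write: send the remainder.  */
    }
  return done;
}
```

On failure the caller cannot tell from the return value how much was written before the error, which is a deliberate simplification in this sketch.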
546 | ||
a5a0310d UD |
547 | @comment unistd.h |
548 | @comment Unix98 | |
549 | @deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset}) | |
550 | The @code{pwrite} function is similar to the @code{write} function. The | |
04b9968b UD |
551 | first three arguments are identical, and the return values and error codes |
552 | also correspond. | |
a5a0310d UD |
553 | |
554 | The difference is the fourth argument and its handling. The data block | |
555 | is not written to the current position of the file descriptor | |
556 | @code{filedes}. Instead the data is written to the file starting at | |
557 | position @var{offset}. The position of the file descriptor itself is | |
04b9968b | 558 | not affected by the operation. The value is the same as before the call. |
a5a0310d | 559 | |
b07d03e0 UD |
560 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
561 | @code{pwrite} function is in fact @code{pwrite64} and the type | |
04b9968b | 562 | @code{off_t} has 64 bits, which makes it possible to handle files up to |
c756c71c | 563 | @math{2^63} bytes in length. |
b07d03e0 | 564 | |
a5a0310d UD |
565 | The return value of @code{pwrite} describes the number of written bytes. |
566 | In the error case it returns @math{-1} like @code{write} does and the | |
04b9968b UD |
567 | error codes are also the same, with these additions: |
568 | ||
a5a0310d UD |
569 | @table @code |
570 | @item EINVAL | |
571 | The value given for @var{offset} is negative and therefore illegal. | |
572 | ||
573 | @item ESPIPE | |
04b9968b | 574 | The file descriptor @var{filedes} is associated with a pipe or a FIFO and |
a5a0310d UD |
575 | this device does not allow positioning of the file pointer. |
576 | @end table | |
577 | ||
578 | The function is an extension defined in the Single Unix Specification | |
579 | version 2. | |
580 | @end deftypefun | |
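As an illustration, the sketch below uses @code{pwrite} to overwrite a few bytes in the middle of an existing file without a separate seek. The helper name @code{patch_file} is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite SIZE bytes at OFFSET in the existing
   file FILENAME with DATA.  Returns 0 on success, or -1 on error or
   if fewer than SIZE bytes were written.  */
int
patch_file (const char *filename, const void *data, size_t size,
            off_t offset)
{
  int fd = open (filename, O_WRONLY);
  if (fd == -1)
    return -1;

  /* The write lands at OFFSET; the descriptor's own position is not
     consulted and not changed.  */
  ssize_t n = pwrite (fd, data, size, offset);

  if (close (fd) == -1)
    return -1;
  return n == (ssize_t) size ? 0 : -1;
}
```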
581 | ||
b07d03e0 | 582 | @comment unistd.h |
a3a4a74e | 583 | @comment Unix98 |
b07d03e0 UD |
584 | @deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset}) |
585 | This function is similar to the @code{pwrite} function. The difference | |
586 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 587 | @code{off_t} which makes it possible on 32 bit machines to address |
c756c71c | 588 | files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
b07d03e0 UD |
589 | file descriptor @code{filedes} must be opened using @code{open64} since |
590 | otherwise the large offsets possible with @code{off64_t} will lead to | |
591 | errors with a descriptor in small file mode. | |
592 | ||
c756c71c | 593 | When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a |
04b9968b UD |
594 | 32 bit machine this function is actually available under the name |
595 | @code{pwrite} and so transparently replaces the 32 bit interface. | |
b07d03e0 UD |
596 | @end deftypefun |
597 | ||
a5a0310d | 598 | |
28f540f4 RM |
599 | @node File Position Primitive |
600 | @section Setting the File Position of a Descriptor | |
601 | ||
602 | Just as you can set the file position of a stream with @code{fseek}, you | |
603 | can set the file position of a descriptor with @code{lseek}. This | |
604 | specifies the position in the file for the next @code{read} or | |
605 | @code{write} operation. @xref{File Positioning}, for more information | |
606 | on the file position and what it means. | |
607 | ||
608 | To read the current file position value from a descriptor, use | |
609 | @code{lseek (@var{desc}, 0, SEEK_CUR)}. | |
610 | ||
611 | @cindex file positioning on a file descriptor | |
612 | @cindex positioning a file descriptor | |
613 | @cindex seeking on a file descriptor | |
614 | @comment unistd.h | |
615 | @comment POSIX.1 | |
616 | @deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence}) | |
617 | The @code{lseek} function is used to change the file position of the | |
618 | file with descriptor @var{filedes}. | |
619 | ||
620 | The @var{whence} argument specifies how the @var{offset} should be | |
04b9968b UD |
621 | interpreted, in the same way as for the @code{fseek} function, and it must |
622 | be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or | |
28f540f4 RM |
623 | @code{SEEK_END}. |
624 | ||
625 | @table @code | |
626 | @item SEEK_SET | |
627 | Specifies that @var{offset} is a count of characters from the beginning | |
628 | of the file. | |
629 | ||
630 | @item SEEK_CUR | |
631 | Specifies that @var{offset} is a count of characters from the current | |
632 | file position. This count may be positive or negative. | |
633 | ||
634 | @item SEEK_END | |
635 | Specifies that @var{offset} is a count of characters from the end of | |
636 | the file. A negative count specifies a position within the current | |
637 | extent of the file; a positive count specifies a position past the | |
2c6fe0bd | 638 | current end. If you set the position past the current end, and |
28f540f4 | 639 | actually write data, you will extend the file with zeros up to that |
336dfb2d UD |
640 | position. |
641 | @end table | |
28f540f4 RM |
642 | |
643 | The return value from @code{lseek} is normally the resulting file | |
644 | position, measured in bytes from the beginning of the file. | |
645 | You can use this feature together with @code{SEEK_CUR} to read the | |
646 | current file position. | |
647 | ||
648 | If you want to append to the file, setting the file position to the | |
649 | current end of file with @code{SEEK_END} is not sufficient. Another | |
650 | process may write more data after you seek but before you write, | |
651 | extending the file so the position you write onto clobbers their data. | |
652 | Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}. | |
653 | ||
654 | You can set the file position past the current end of the file. This | |
655 | does not by itself make the file longer; @code{lseek} never changes the | |
656 | file. But subsequent output at that position will extend the file. | |
657 | Characters between the previous end of file and the new position are | |
658 | filled with zeros. Extending the file in this way can create a | |
659 | ``hole'': the blocks of zeros are not actually allocated on disk, so the | |
78759725 | 660 | file takes up less space than it appears to; it is then called a |
28f540f4 RM |
661 | ``sparse file''. |
662 | @cindex sparse files | |
663 | @cindex holes in files | |
664 | ||
665 | If the file position cannot be changed, or the operation is in some way | |
07435eb4 | 666 | invalid, @code{lseek} returns a value of @math{-1}. The following |
28f540f4 RM |
667 | @code{errno} error conditions are defined for this function: |
668 | ||
669 | @table @code | |
670 | @item EBADF | |
671 | The @var{filedes} is not a valid file descriptor. | |
672 | ||
673 | @item EINVAL | |
674 | The @var{whence} argument value is not valid, or the resulting | |
675 | file offset is not valid (for example, because it would be negative). | |
676 | ||
677 | @item ESPIPE | |
678 | The @var{filedes} corresponds to an object that cannot be positioned, | |
679 | such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error | |
680 | only for pipes and FIFOs, but in the GNU system, you always get | |
681 | @code{ESPIPE} if the object is not seekable.) | |
682 | @end table | |
683 | ||
b07d03e0 UD |
684 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the |
685 | @code{lseek} function is in fact @code{lseek64} and the type | |
686 | @code{off_t} has 64 bits which makes it possible to handle files up to | |
c756c71c | 687 | @math{2^63} bytes in length. |
b07d03e0 | 688 | |
04b9968b | 689 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
690 | is a problem if the thread allocates some resources (like memory, file |
691 | descriptors, semaphores or whatever) at the time @code{lseek} is | |
04b9968b | 692 | called. If the thread gets cancelled these resources stay allocated |
dfd2257a | 693 | until the program ends. To avoid this, calls to @code{lseek} should be |
04b9968b | 694 | protected using cancellation handlers. |
dfd2257a UD |
695 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
696 | ||
28f540f4 | 697 | The @code{lseek} function is the underlying primitive for the |
dfd2257a UD |
698 | @code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and |
699 | @code{rewind} functions, which operate on streams instead of file | |
700 | descriptors. | |
28f540f4 RM |
701 | @end deftypefun |
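The uses of @code{lseek} described above can be sketched as follows. This is only an illustration, not part of the library: the helper names @code{current_position}, @code{file_size} and @code{extend_with_hole} are hypothetical, and the sketch assumes a seekable file opened for writing.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current file position of FD, or -1 on error.  */
off_t
current_position (int fd)
{
  return lseek (fd, 0, SEEK_CUR);
}

/* Return the size of the file open on FD without disturbing its
   file position, or -1 on error.  Works only on seekable files.  */
off_t
file_size (int fd)
{
  off_t saved = lseek (fd, 0, SEEK_CUR);
  if (saved == (off_t) -1)
    return -1;
  off_t end = lseek (fd, 0, SEEK_END);
  if (end == (off_t) -1)
    return -1;
  if (lseek (fd, saved, SEEK_SET) == (off_t) -1)
    return -1;
  return end;
}

/* Seek HOLE bytes past the current end of FD and write one byte
   there.  The seek alone does not change the file; the write
   extends it, and the skipped range reads back as zeros.
   Return 0 on success, -1 on error.  */
int
extend_with_hole (int fd, off_t hole)
{
  if (lseek (fd, hole, SEEK_END) == (off_t) -1)
    return -1;
  return write (fd, "x", 1) == 1 ? 0 : -1;
}
```

Whether the skipped range is actually stored as a hole depends on the file system; the sketch relies only on the zeros being visible when the range is read.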
702 | ||
b07d03e0 | 703 | @comment unistd.h |
a3a4a74e | 704 | @comment Unix98 |
b07d03e0 UD |
705 | @deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence}) |
706 | This function is similar to the @code{lseek} function. The difference | |
707 | is that the @var{offset} parameter is of type @code{off64_t} instead of | |
04b9968b | 708 | @code{off_t} which makes it possible on 32 bit machines to address |
c756c71c | 709 | files larger than @math{2^31} bytes and up to @math{2^63} bytes. The |
b07d03e0 UD |
710 | file descriptor @var{filedes} must be opened using @code{open64} since | |
711 | otherwise the large offsets possible with @code{off64_t} will lead to | |
712 | errors with a descriptor in small file mode. | |
713 | ||
c756c71c | 714 | When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a |
b07d03e0 | 715 | 32 bit machine this function is actually available under the name |
04b9968b | 716 | @code{lseek} and so transparently replaces the 32 bit interface. |
b07d03e0 UD |
717 | @end deftypefun |
718 | ||
28f540f4 | 719 | You can have multiple descriptors for the same file if you open the file |
2c6fe0bd | 720 | more than once, or if you duplicate a descriptor with @code{dup}. |
28f540f4 RM |
721 | Descriptors that come from separate calls to @code{open} have independent |
722 | file positions; using @code{lseek} on one descriptor has no effect on the | |
2c6fe0bd | 723 | other. For example, |
28f540f4 RM |
724 | |
725 | @smallexample | |
726 | @group | |
727 | @{ | |
728 | int d1, d2; | |
729 | char buf[4]; | |
730 | d1 = open ("foo", O_RDONLY); | |
731 | d2 = open ("foo", O_RDONLY); | |
732 | lseek (d1, 1024, SEEK_SET); | |
733 | read (d2, buf, 4); | |
734 | @} | |
735 | @end group | |
736 | @end smallexample | |
737 | ||
738 | @noindent | |
739 | will read the first four characters of the file @file{foo}. (The | |
740 | error-checking code necessary for a real program has been omitted here | |
741 | for brevity.) | |
742 | ||
743 | By contrast, descriptors made by duplication share a common file | |
744 | position with the original descriptor that was duplicated. Anything | |
745 | which alters the file position of one of the duplicates, including | |
746 | reading or writing data, affects all of them alike. Thus, for example, | |
747 | ||
748 | @smallexample | |
749 | @{ | |
750 | int d1, d2, d3; | |
751 | char buf1[4], buf2[4]; | |
752 | d1 = open ("foo", O_RDONLY); | |
753 | d2 = dup (d1); | |
754 | d3 = dup (d2); | |
755 | lseek (d3, 1024, SEEK_SET); | |
756 | read (d1, buf1, 4); | |
757 | read (d2, buf2, 4); | |
758 | @} | |
759 | @end smallexample | |
760 | ||
761 | @noindent | |
762 | will read four characters starting with the 1024'th character of | |
763 | @file{foo}, and then four more characters starting with the 1028'th | |
764 | character. | |
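The difference between the two cases can be checked with a small program. The following is a sketch with error checking added; the function name @code{compare_positions} is hypothetical and exists only for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open PATH twice and duplicate the first descriptor, then read up
   to N bytes through the first descriptor.  Store the resulting file
   position of the duplicate in *DUP_POS and that of the separately
   opened descriptor in *INDEP_POS.
   Return 0 on success, -1 on error.  */
int
compare_positions (const char *path, size_t n,
                   off_t *dup_pos, off_t *indep_pos)
{
  char buf[64];
  int result = -1;
  int d1 = open (path, O_RDONLY);
  int d2 = d1 < 0 ? -1 : dup (d1);
  int d3 = open (path, O_RDONLY);

  if (n > sizeof buf)
    n = sizeof buf;
  if (d1 >= 0 && d2 >= 0 && d3 >= 0 && read (d1, buf, n) >= 0)
    {
      /* d2 shares its position with d1; d3 has its own.  */
      *dup_pos = lseek (d2, 0, SEEK_CUR);
      *indep_pos = lseek (d3, 0, SEEK_CUR);
      result = 0;
    }
  if (d1 >= 0)
    close (d1);
  if (d2 >= 0)
    close (d2);
  if (d3 >= 0)
    close (d3);
  return result;
}
```

After reading through the first descriptor, the duplicate reports the advanced position, while the descriptor from the second @code{open} is still at the beginning of the file.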
765 | ||
766 | @comment sys/types.h | |
767 | @comment POSIX.1 | |
768 | @deftp {Data Type} off_t | |
769 | This is an arithmetic data type used to represent file sizes. | |
770 | In the GNU system, this is equivalent to @code{fpos_t} or @code{long int}. | |
a3a4a74e UD |
771 | |
772 | If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type | |
773 | is transparently replaced by @code{off64_t}. | |
28f540f4 RM |
774 | @end deftp |
775 | ||
b07d03e0 | 776 | @comment sys/types.h |
a3a4a74e | 777 | @comment Unix98 |
b07d03e0 UD |
778 | @deftp {Data Type} off64_t |
779 | This type is used similarly to @code{off_t}. The difference is that even | |
04b9968b | 780 | on 32 bit machines, where the @code{off_t} type would have 32 bits, |
b07d03e0 UD |
781 | @code{off64_t} has 64 bits and so is able to address files up to |
782 | @math{2^63} bytes in length. | |
a3a4a74e UD |
783 | |
784 | When compiling with @code{_FILE_OFFSET_BITS == 64} this type is | |
785 | available under the name @code{off_t}. | |
b07d03e0 UD |
786 | @end deftp |
787 | ||
28f540f4 RM |
788 | These aliases for the @samp{SEEK_@dots{}} constants exist for the sake |
789 | of compatibility with older BSD systems. They are defined in two | |
790 | different header files: @file{fcntl.h} and @file{sys/file.h}. | |
791 | ||
792 | @table @code | |
793 | @item L_SET | |
794 | An alias for @code{SEEK_SET}. | |
795 | ||
796 | @item L_INCR | |
797 | An alias for @code{SEEK_CUR}. | |
798 | ||
799 | @item L_XTND | |
800 | An alias for @code{SEEK_END}. | |
801 | @end table | |
802 | ||
803 | @node Descriptors and Streams | |
804 | @section Descriptors and Streams | |
805 | @cindex streams, and file descriptors | |
806 | @cindex converting file descriptor to stream | |
807 | @cindex extracting file descriptor from stream | |
808 | ||
809 | Given an open file descriptor, you can create a stream for it with the | |
810 | @code{fdopen} function. You can get the underlying file descriptor for | |
811 | an existing stream with the @code{fileno} function. These functions are | |
812 | declared in the header file @file{stdio.h}. | |
813 | @pindex stdio.h | |
814 | ||
815 | @comment stdio.h | |
816 | @comment POSIX.1 | |
817 | @deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype}) | |
818 | The @code{fdopen} function returns a new stream for the file descriptor | |
819 | @var{filedes}. | |
820 | ||
821 | The @var{opentype} argument is interpreted in the same way as for the | |
822 | @code{fopen} function (@pxref{Opening Streams}), except that | |
823 | the @samp{b} option is not permitted; this is because GNU makes no | |
824 | distinction between text and binary files. Also, @code{"w"} and | |
04b9968b | 825 | @code{"w+"} do not cause truncation of the file; these have an effect only |
28f540f4 RM |
826 | when opening a file, and in this case the file has already been opened. |
827 | You must make sure that the @var{opentype} argument matches the actual | |
828 | mode of the open file descriptor. | |
829 | ||
830 | The return value is the new stream. If the stream cannot be created | |
831 | (for example, if the modes for the file indicated by the file descriptor | |
832 | do not permit the access specified by the @var{opentype} argument), a | |
833 | null pointer is returned instead. | |
834 | ||
835 | In some other systems, @code{fdopen} may fail to detect that the modes | |
836 | for the file descriptor do not permit the access specified by | |
837 | @var{opentype}. The GNU C library always checks for this. | |
838 | @end deftypefun | |
839 | ||
840 | For an example showing the use of the @code{fdopen} function, | |
841 | see @ref{Creating a Pipe}. | |
842 | ||
843 | @comment stdio.h | |
844 | @comment POSIX.1 | |
845 | @deftypefun int fileno (FILE *@var{stream}) | |
846 | This function returns the file descriptor associated with the stream | |
847 | @var{stream}. If an error is detected (for example, if the @var{stream} | |
848 | is not valid) or if @var{stream} does not do I/O to a file, | |
07435eb4 | 849 | @code{fileno} returns @math{-1}. |
28f540f4 RM |
850 | @end deftypefun |
851 | ||
7b4161bb UD |
852 | @comment stdio.h |
853 | @comment GNU | |
854 | @deftypefun int fileno_unlocked (FILE *@var{stream}) | |
855 | The @code{fileno_unlocked} function is equivalent to the @code{fileno} | |
856 | function except that it does not implicitly lock the stream if the state | |
857 | is @code{FSETLOCKING_INTERNAL}. | |
858 | ||
859 | This function is a GNU extension. | |
860 | @end deftypefun | |
861 | ||
28f540f4 RM |
862 | @cindex standard file descriptors |
863 | @cindex file descriptors, standard | |
864 | There are also symbolic constants defined in @file{unistd.h} for the | |
865 | file descriptors belonging to the standard streams @code{stdin}, | |
866 | @code{stdout}, and @code{stderr}; see @ref{Standard Streams}. | |
867 | @pindex unistd.h | |
868 | ||
869 | @comment unistd.h | |
870 | @comment POSIX.1 | |
871 | @table @code | |
872 | @item STDIN_FILENO | |
873 | @vindex STDIN_FILENO | |
874 | This macro has value @code{0}, which is the file descriptor for | |
875 | standard input. | |
876 | @cindex standard input file descriptor | |
877 | ||
878 | @comment unistd.h | |
879 | @comment POSIX.1 | |
880 | @item STDOUT_FILENO | |
881 | @vindex STDOUT_FILENO | |
882 | This macro has value @code{1}, which is the file descriptor for | |
883 | standard output. | |
884 | @cindex standard output file descriptor | |
885 | ||
886 | @comment unistd.h | |
887 | @comment POSIX.1 | |
888 | @item STDERR_FILENO | |
889 | @vindex STDERR_FILENO | |
890 | This macro has value @code{2}, which is the file descriptor for | |
891 | standard error output. | |
892 | @end table | |
893 | @cindex standard error file descriptor | |
894 | ||
895 | @node Stream/Descriptor Precautions | |
896 | @section Dangers of Mixing Streams and Descriptors | |
897 | @cindex channels | |
898 | @cindex streams and descriptors | |
899 | @cindex descriptors and streams | |
900 | @cindex mixing descriptors and streams | |
901 | ||
902 | You can have multiple file descriptors and streams (let's call both | |
903 | streams and descriptors ``channels'' for short) connected to the same | |
904 | file, but you must take care to avoid confusion between channels. There | |
905 | are two cases to consider: @dfn{linked} channels that share a single | |
906 | file position value, and @dfn{independent} channels that have their own | |
907 | file positions. | |
908 | ||
909 | It's best to use just one channel in your program for actual data | |
910 | transfer to any given file, except when all the access is for input. | |
911 | For example, if you open a pipe (something you can only do at the file | |
912 | descriptor level), either do all I/O with the descriptor, or construct a | |
913 | stream from the descriptor with @code{fdopen} and then do all I/O with | |
914 | the stream. | |
915 | ||
916 | @menu | |
917 | * Linked Channels:: Dealing with channels sharing a file position. | |
918 | * Independent Channels:: Dealing with separately opened, unlinked channels. | |
2c6fe0bd | 919 | * Cleaning Streams:: Cleaning a stream makes it safe to use |
28f540f4 RM |
920 | another channel. |
921 | @end menu | |
922 | ||
923 | @node Linked Channels | |
924 | @subsection Linked Channels | |
925 | @cindex linked channels | |
926 | ||
927 | Channels that come from a single opening share the same file position; | |
928 | we call them @dfn{linked} channels. Linked channels result when you | |
929 | make a stream from a descriptor using @code{fdopen}, when you get a | |
930 | descriptor from a stream with @code{fileno}, when you copy a descriptor | |
931 | with @code{dup} or @code{dup2}, and when descriptors are inherited | |
932 | during @code{fork}. For files that don't support random access, such as | |
933 | terminals and pipes, @emph{all} channels are effectively linked. On | |
934 | random-access files, all append-type output streams are effectively | |
935 | linked to each other. | |
936 | ||
937 | @cindex cleaning up a stream | |
938 | If you have been using a stream for I/O, and you want to do I/O using | |
939 | another channel (either a stream or a descriptor) that is linked to it, | |
940 | you must first @dfn{clean up} the stream that you have been using. | |
941 | @xref{Cleaning Streams}. | |
942 | ||
943 | Terminating a process, or executing a new program in the process, | |
944 | destroys all the streams in the process. If descriptors linked to these | |
945 | streams persist in other processes, their file positions become | |
946 | undefined as a result. To prevent this, you must clean up the streams | |
947 | before destroying them. | |
948 | ||
949 | @node Independent Channels | |
950 | @subsection Independent Channels | |
951 | @cindex independent channels | |
952 | ||
953 | When you open channels (streams or descriptors) separately on a seekable | |
954 | file, each channel has its own file position. These are called | |
955 | @dfn{independent channels}. | |
956 | ||
957 | The system handles each channel independently. Most of the time, this | |
958 | is quite predictable and natural (especially for input): each channel | |
959 | can read or write sequentially at its own place in the file. However, | |
960 | if some of the channels are streams, you must take these precautions: | |
961 | ||
962 | @itemize @bullet | |
963 | @item | |
964 | You should clean an output stream after use, before doing anything else | |
965 | that might read or write from the same part of the file. | |
966 | ||
967 | @item | |
968 | You should clean an input stream before reading data that may have been | |
969 | modified using an independent channel. Otherwise, you might read | |
970 | obsolete data that had been in the stream's buffer. | |
971 | @end itemize | |
972 | ||
973 | If you do output to one channel at the end of the file, this will | |
974 | certainly leave the other independent channels positioned somewhere | |
975 | before the new end. You cannot reliably set their file positions to the | |
976 | new end of file before writing, because the file can always be extended | |
977 | by another process between when you set the file position and when you | |
978 | write the data. Instead, use an append-type descriptor or stream; they | |
979 | always output at the current end of the file. In order to make the | |
980 | end-of-file position accurate, you must clean the output channel you | |
981 | were using, if it is a stream. | |
982 | ||
983 | It's impossible for two channels to have separate file pointers for a | |
984 | file that doesn't support random access. Thus, channels for reading or | |
985 | writing such files are always linked, never independent. Append-type | |
986 | channels are also always linked. For these channels, follow the rules | |
987 | for linked channels; see @ref{Linked Channels}. | |
988 | ||
989 | @node Cleaning Streams | |
990 | @subsection Cleaning Streams | |
991 | ||
992 | On the GNU system, you can clean up any stream with @code{fclean}: | |
993 | ||
994 | @comment stdio.h | |
995 | @comment GNU | |
996 | @deftypefun int fclean (FILE *@var{stream}) | |
997 | Clean up the stream @var{stream} so that its buffer is empty. If | |
998 | @var{stream} is doing output, force it out. If @var{stream} is doing | |
999 | input, give the data in the buffer back to the system, arranging to | |
1000 | reread it. | |
1001 | @end deftypefun | |
1002 | ||
1003 | On other systems, you can use @code{fflush} to clean a stream in most | |
1004 | cases. | |
1005 | ||
1006 | You can skip the @code{fclean} or @code{fflush} if you know the stream | |
1007 | is already clean. A stream is clean whenever its buffer is empty. For | |
1008 | example, an unbuffered stream is always clean. An input stream that is | |
1009 | at end-of-file is clean. A line-buffered stream is clean when the last | |
1010 | character output was a newline. | |
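For the common case of an output stream, cleaning can be sketched with @code{fflush}, which is the portable choice mentioned above. The function name @code{write_stream_then_descriptor} is hypothetical; the sketch assumes @var{fp} is an output stream on an ordinary file.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write FIRST through the stream FP and then SECOND directly through
   the descriptor linked to it.  The fflush call cleans the stream, so
   FIRST reaches the file before SECOND does.
   Return 0 on success, -1 on error.  */
int
write_stream_then_descriptor (FILE *fp, const char *first,
                              const char *second)
{
  if (fputs (first, fp) == EOF)
    return -1;
  if (fflush (fp) == EOF)
    return -1;
  size_t len = strlen (second);
  return write (fileno (fp), second, len) == (ssize_t) len ? 0 : -1;
}
```

Without the @code{fflush} call, the text passed in @var{first} could still be sitting in the stream's buffer when the descriptor-level @code{write} runs, and the two pieces could end up in the file in the wrong order.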
1011 | ||
1012 | There is one case in which cleaning a stream is impossible on most | |
1013 | systems. This is when the stream is doing input from a file that is not | |
1014 | random-access. Such streams typically read ahead, and when the file is | |
1015 | not random access, there is no way to give back the excess data already | |
1016 | read. When an input stream reads from a random-access file, | |
1017 | @code{fflush} does clean the stream, but leaves the file pointer at an | |
1018 | unpredictable place; you must set the file pointer before doing any | |
1019 | further I/O. On the GNU system, using @code{fclean} avoids both of | |
1020 | these problems. | |
1021 | ||
1022 | Closing an output-only stream also does @code{fflush}, so this is a | |
1023 | valid way of cleaning an output stream. On the GNU system, closing an | |
1024 | input stream does @code{fclean}. | |
1025 | ||
1026 | You need not clean a stream before using its descriptor for control | |
1027 | operations such as setting terminal modes; these operations don't affect | |
1028 | the file position and are not affected by it. You can use any | |
1029 | descriptor for these operations, and all channels are affected | |
1030 | simultaneously. However, text already ``output'' to a stream but still | |
1031 | buffered by the stream will be subject to the new terminal modes when | |
1032 | subsequently flushed. To make sure ``past'' output is covered by the | |
1033 | terminal settings that were in effect at the time, flush the output | |
1034 | streams for that terminal before setting the modes. @xref{Terminal | |
1035 | Modes}. | |
1036 | ||
07435eb4 UD |
1037 | @node Scatter-Gather |
1038 | @section Fast Scatter-Gather I/O | |
1039 | @cindex scatter-gather | |
1040 | ||
1041 | Some applications may need to read or write data to multiple buffers, | |
04b9968b | 1042 | which are separated in memory. Although this can be done easily enough |
07435eb4 UD |
1043 | with multiple calls to @code{read} and @code{write}, it is inefficient | |
1044 | because there is overhead associated with each kernel call. | |
1045 | ||
1046 | Instead, many platforms provide special high-speed primitives to perform | |
1047 | these @dfn{scatter-gather} operations in a single kernel call. The GNU C | |
1048 | library will provide an emulation on any system that lacks these | |
1049 | primitives, so they are not a portability threat. They are defined in | |
1050 | @code{sys/uio.h}. | |
1051 | ||
1052 | These functions are controlled with arrays of @code{iovec} structures, | |
1053 | which describe the location and size of each buffer. | |
1054 | ||
4c450556 UD |
1055 | @comment sys/uio.h |
1056 | @comment BSD | |
07435eb4 UD |
1057 | @deftp {Data Type} {struct iovec} |
1058 | ||
1059 | The @code{iovec} structure describes a buffer. It contains two fields: | |
1060 | ||
1061 | @table @code | |
1062 | ||
1063 | @item void *iov_base | |
1064 | Contains the address of a buffer. | |
1065 | ||
1066 | @item size_t iov_len | |
1067 | Contains the length of the buffer. | |
1068 | ||
1069 | @end table | |
1070 | @end deftp | |
1071 | ||
4c450556 UD |
1072 | @comment sys/uio.h |
1073 | @comment BSD | |
07435eb4 UD |
1074 | @deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) |
1075 | ||
1076 | The @code{readv} function reads data from @var{filedes} and scatters it | |
1077 | into the buffers described in @var{vector}, which is taken to be | |
1078 | @var{count} structures long. As each buffer is filled, data is sent to the | |
1079 | next. | |
1080 | ||
1081 | Note that @code{readv} is not guaranteed to fill all the buffers. | |
1082 | It may stop at any point, for the same reasons @code{read} would. | |
1083 | ||
1084 | The return value is a count of bytes (@emph{not} buffers) read, @math{0} | |
1085 | indicating end-of-file, or @math{-1} indicating an error. The possible | |
1086 | errors are the same as in @code{read}. | |
1087 | ||
1088 | @end deftypefun | |
1089 | ||
4c450556 UD |
1090 | @comment sys/uio.h |
1091 | @comment BSD | |
07435eb4 UD |
1092 | @deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count}) |
1093 | ||
1094 | The @code{writev} function gathers data from the buffers described in | |
1095 | @var{vector}, which is taken to be @var{count} structures long, and writes | |
1096 | it to @var{filedes}. As each buffer is written, it moves on to the | |
1097 | next. | |
1098 | ||
1099 | Like @code{readv}, @code{writev} may stop midstream under the same | |
1100 | conditions @code{write} would. | |
1101 | ||
1102 | The return value is a count of bytes written, or @math{-1} indicating an | |
1103 | error. The possible errors are the same as in @code{write}. | |
1104 | ||
1105 | @end deftypefun | |
1106 | ||
1107 | @c Note - I haven't read this anywhere. I surmised it from my knowledge | |
1108 | @c of computer science. Thus, there could be subtleties I'm missing. | |
1109 | ||
1110 | Note that if the buffers are small (under about 1kB), high-level streams | |
1111 | may be easier to use than these functions. However, @code{readv} and | |
1112 | @code{writev} are more efficient when the individual buffers themselves | |
1113 | (as opposed to the total output) are large. In that case, a high-level | |
1114 | stream would not be able to cache the data effectively. | |
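A minimal sketch of both functions follows. The helper names @code{write_record} and @code{read_record} and the header/body split are assumptions made for illustration; real code would also have to deal with short counts, as with @code{read} and @code{write}.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Gather HEADER and BODY and write them to FD with one writev call.
   Return the number of bytes written, or -1 on error.  */
ssize_t
write_record (int fd, const char *header, const char *body)
{
  struct iovec vec[2];
  /* iov_base is not const-qualified, but writev only reads the data.  */
  vec[0].iov_base = (void *) header;
  vec[0].iov_len = strlen (header);
  vec[1].iov_base = (void *) body;
  vec[1].iov_len = strlen (body);
  return writev (fd, vec, 2);
}

/* Read from FD with one readv call, filling HEADER first and then
   BODY.  Return the number of bytes read, 0 at end of file, or -1 on
   error.  */
ssize_t
read_record (int fd, void *header, size_t header_len,
             void *body, size_t body_len)
{
  struct iovec vec[2] = {
    { .iov_base = header, .iov_len = header_len },
    { .iov_base = body, .iov_len = body_len }
  };
  return readv (fd, vec, 2);
}
```

The return value counts bytes across all buffers, so the caller has to compare it with the sum of the buffer lengths to find out which buffers were filled completely.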
1115 | ||
1116 | @node Memory-mapped I/O | |
1117 | @section Memory-mapped I/O | |
1118 | ||
1119 | On modern operating systems, it is possible to @dfn{mmap} (pronounced | |
1120 | ``em-map'') a file to a region of memory. When this is done, the file can | |
1121 | be accessed just like an array in the program. | |
1122 | ||
04b9968b UD |
1123 | This is more efficient than @code{read} or @code{write}, as only the regions | |
1124 | of the file that a program actually accesses are loaded. Accesses to | |
07435eb4 UD |
1125 | not-yet-loaded parts of the mmapped region are handled in the same way as |
1126 | swapped out pages. | |
1127 | ||
b642f101 UD |
1128 | Since mmapped pages can be stored back to their file when physical |
1129 | memory is low, it is possible to mmap files orders of magnitude larger | |
1130 | than both the physical memory @emph{and} swap space. The only limit is | |
1131 | address space. The theoretical limit is 4GB on a 32-bit machine - | |
1132 | however, the actual limit will be smaller since some areas will be | |
1133 | reserved for other purposes. If the LFS interface is used the file size | |
1134 | on 32-bit systems is not limited to 2GB (offsets are signed which | |
1135 | reduces the addressable area of 4GB by half); the full 64 bits are | |
1136 | available. | |
07435eb4 UD |
1137 | |
1138 | Memory mapping only works on entire pages of memory. Thus, addresses | |
1139 | for mapping must be page-aligned, and length values will be rounded up. | |
1140 | To determine the size of a page the machine uses, one should use | |
1141 | ||
b642f101 | 1142 | @vindex _SC_PAGESIZE |
07435eb4 UD |
1143 | @smallexample |
1144 | size_t page_size = (size_t) sysconf (_SC_PAGESIZE); | |
1145 | @end smallexample | |
1146 | ||
b642f101 | 1147 | @noindent |
07435eb4 UD |
1148 | These functions are declared in @file{sys/mman.h}. |
1149 | ||
4c450556 UD |
1150 | @comment sys/mman.h |
1151 | @comment POSIX | |
07435eb4 UD |
1152 | @deftypefun {void *} mmap (void *@var{address}, size_t @var{length},int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset}) |
1153 | ||
1154 | The @code{mmap} function creates a new mapping, connected to bytes | |
1155 | (@var{offset}) to (@var{offset} + @var{length}) in the file open on | |
1156 | @var{filedes}. | |
1157 | ||
1158 | @var{address} gives a preferred starting address for the mapping. | |
1159 | @code{NULL} expresses no preference. Any previous mapping at that | |
1160 | address is automatically removed. The address you give may still be | |
1161 | changed, unless you use the @code{MAP_FIXED} flag. | |
1162 | ||
1163 | @vindex PROT_READ | |
1164 | @vindex PROT_WRITE | |
1165 | @vindex PROT_EXEC | |
1166 | @var{protect} contains flags that control what kind of access is | |
1167 | permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and | |
1168 | @code{PROT_EXEC}, which permit reading, writing, and execution, | |
1169 | respectively. Inappropriate access will cause a segfault (@pxref{Program | |
1170 | Error Signals}). | |
1171 | ||
1172 | Note that most hardware designs cannot support write permission without | |
1173 | read permission, and many do not distinguish read and execute permission. | |
49c091e5 | 1174 | Thus, you may receive wider permissions than you ask for, and mappings of |
07435eb4 UD |
1175 | write-only files may be denied even if you do not use @code{PROT_READ}. |
1176 | ||
1177 | @var{flags} contains flags that control the nature of the map. | |
1178 | One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified. | |
1179 | ||
1180 | They include: | |
1181 | ||
1182 | @vtable @code | |
1183 | @item MAP_PRIVATE | |
1184 | This specifies that writes to the region should never be written back | |
1185 | to the attached file. Instead, a copy is made for the process, and the | |
1186 | region will be swapped normally if memory runs low. No other process will | |
1187 | see the changes. | |
1188 | ||
1189 | Since private mappings effectively revert to ordinary memory | |
1190 | when written to, you must have enough virtual memory for a copy of | |
1191 | the entire mmapped region if you use this mode with @code{PROT_WRITE}. | |
1192 | ||
1193 | @item MAP_SHARED | |
1194 | This specifies that writes to the region will be written back to the | |
1195 | file. Changes made will be shared immediately with other processes | |
1196 | mmapping the same file. | |
1197 | ||
1198 | Note that actual writing may take place at any time. You need to use | |
1199 | @code{msync}, described below, if it is important that other processes | |
1200 | using conventional I/O get a consistent view of the file. | |
1201 | ||
1202 | @item MAP_FIXED | |
1203 | This forces the system to use the exact mapping address specified in | |
1204 | @var{address} and fail if it can't. | |
1205 | ||
1206 | @c One of these is official - the other is obviously an obsolete synonym | |
1207 | @c Which is which? | |
1208 | @item MAP_ANONYMOUS | |
1209 | @itemx MAP_ANON | |
1210 | This flag tells the system to create an anonymous mapping, not connected | |
1211 | to a file. @var{filedes} and @var{offset} are ignored, and the region is | |
1212 | initialized with zeros. | |
1213 | ||
1214 | Anonymous maps are used as the basic primitive to extend the heap on some | |
1215 | systems. They are also useful to share data between multiple tasks | |
1216 | without creating a file. | |
1217 | ||
49c091e5 | 1218 | On some systems using private anonymous mmaps is more efficient than using |
07435eb4 UD |
1219 | @code{malloc} for large blocks. This is not an issue with the GNU C library, |
1220 | as the included @code{malloc} automatically uses @code{mmap} where appropriate. | |
1221 | ||
1222 | @c Linux has some other MAP_ options, which I have not discussed here. | |
1223 | @c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to | |
1224 | @c user programs (and I don't understand the last two). MAP_LOCKED does | |
1225 | @c not appear to be implemented. | |
1226 | ||
1227 | @end vtable | |
1228 | ||
1229 | @code{mmap} returns the address of the new mapping, or @math{-1} for an | |
1230 | error. | |
1231 | ||
1232 | Possible errors include: | |
1233 | ||
1234 | @table @code | |
1235 | ||
1236 | @item EINVAL | |
1237 | ||
1238 | Either @var{address} was unusable, or inconsistent @var{flags} were | |
1239 | given. | |
1240 | ||
1241 | @item EACCES | |
1242 | ||
1243 | @var{filedes} was not open for the type of access specified in @var{protect}. | |
1244 | ||
1245 | @item ENOMEM | |
1246 | ||
1247 | Either there is not enough memory for the operation, or the process is | |
1248 | out of address space. | |
1249 | ||
1250 | @item ENODEV | |
1251 | ||
1252 | This file is of a type that doesn't support mapping. | |
1253 | ||
1254 | @item ENOEXEC | |
1255 | ||
1256 | The file is on a filesystem that doesn't support mapping. | |
1257 | ||
1258 | @c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock. | |
1259 | @c However mandatory locks are not discussed in this manual. | |
1260 | @c | |
1261 | @c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented | |
1262 | @c here) is used and the file is already open for writing. | |
1263 | ||
1264 | @end table | |
1265 | ||
1266 | @end deftypefun | |
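
As a minimal sketch of the @code{MAP_ANONYMOUS} case described above, the following maps a zero-filled private region, writes to it, and unmaps it again.  The names @code{anon_alloc} and @code{anon_demo} are hypothetical helpers introduced only for illustration.  The sketch assumes the headers spell the @math{-1} error value as @code{MAP_FAILED}, and that @code{_DEFAULT_SOURCE} is enough to expose @code{MAP_ANONYMOUS}.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumed to expose MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of zero-filled private memory
   that is not connected to any file.  Returns NULL on failure.  */
void *
anon_alloc (size_t len)
{
  assert (len > 0);
  /* filedes and offset are ignored for anonymous mappings; -1 and 0
     are the conventional placeholders.  */
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: returns 0 if the region starts out zeroed,
   accepts writes, and can be unmapped again.  */
int
anon_demo (void)
{
  size_t len = 4096;
  unsigned char *p = anon_alloc (len);
  if (p == NULL)
    return -1;

  int ok = 1;
  for (size_t i = 0; i < len; i++)
    if (p[i] != 0)
      ok = 0;

  memset (p, 0xAB, len);
  if (p[len - 1] != 0xAB)
    ok = 0;

  if (munmap (p, len) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```

A program that must run on systems without @code{MAP_ANONYMOUS} would need a fallback such as @code{malloc}.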
1267 | ||
4c450556 UD |
1268 | @comment sys/mman.h |
1269 | @comment LFS | |
b642f101 UD |
1270 | @deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset}) |
1271 | The @code{mmap64} function is equivalent to the @code{mmap} function but | |
1272 | the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems | |
1273 | this allows the file associated with the @var{filedes} descriptor to be | |
1274 | larger than 2GB. @var{filedes} must be a descriptor returned from a | |
1275 | call to @code{open64} or @code{fopen64} and @code{freopen64} where the | |
1276 | descriptor is retrieved with @code{fileno}. | |
1277 | ||
1278 | When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this | |
1279 | function is actually available under the name @code{mmap}. I.e., the | |
1280 | new, extended API using 64 bit file sizes and offsets transparently | |
1281 | replaces the old API. | |
1282 | @end deftypefun | |
1283 | ||
4c450556 UD |
1284 | @comment sys/mman.h |
1285 | @comment POSIX | |
07435eb4 UD |
1286 | @deftypefun int munmap (void *@var{addr}, size_t @var{length}) |
1287 | ||
1288 | @code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} + | |
1289 | @var{length}). @var{length} should be the length of the mapping. | |
1290 | ||
04b9968b | 1291 | It is safe to unmap multiple mappings in one command, or include unmapped |
07435eb4 | 1292 | space in the range. It is also possible to unmap only part of an existing |
04b9968b | 1293 | mapping. However, only entire pages can be removed. If @var{length} is not |
07435eb4 UD |
1294 | a whole number of pages, it will be rounded up. |
1295 | ||
1296 | It returns @math{0} for success and @math{-1} for an error. | |
1297 | ||
1298 | One error is possible: | |
1299 | ||
1300 | @table @code | |
1301 | ||
1302 | @item EINVAL | |
04b9968b | 1303 | The memory range given was outside the user mmap range or wasn't page |
07435eb4 UD |
1304 | aligned. |
1305 | ||
1306 | @end table | |
1307 | ||
1308 | @end deftypefun | |
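
The rules above (whole pages only, partial unmapping allowed, holes tolerated in the range) can be sketched as follows.  The function name @code{partial_unmap_demo} is hypothetical, and the sketch assumes @code{MAP_ANONYMOUS} is available so that no file is needed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumed to expose MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if every step behaves as described.  */
int
partial_unmap_demo (void)
{
  long page = sysconf (_SC_PAGESIZE);
  assert (page > 0);
  size_t len = 3 * (size_t) page;

  char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  /* A start address that is not page aligned is rejected.  */
  if (munmap (p + 1, (size_t) page) != -1 || errno != EINVAL)
    return -1;

  /* Remove only the middle page; the outer pages stay mapped.  */
  if (munmap (p + page, (size_t) page) != 0)
    return -1;
  p[0] = 'a';
  p[2 * page] = 'z';

  /* One call may span the hole left by the previous munmap.  */
  return munmap (p, len);
}
```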
1309 | ||
4c450556 UD |
1310 | @comment sys/mman.h |
1311 | @comment POSIX | |
07435eb4 UD |
1312 | @deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags}) |
1313 | ||
1314 | When using shared mappings, the kernel can write the file at any time | |
1315 | before the mapping is removed. To be certain data has actually been | |
49c091e5 UD |
1316 | written to the file and will be accessible to non-memory-mapped I/O, it |
1317 | is necessary to use this function. | |
07435eb4 UD |
1318 | |
1319 | It operates on the region @var{address} to (@var{address} + @var{length}). | |
1320 | It may be used on part of a mapping or multiple mappings; however, the | |
1321 | region given should not contain any unmapped space. | |
1322 | ||
1323 | @var{flags} can contain some options: | |
1324 | ||
1325 | @vtable @code | |
1326 | ||
1327 | @item MS_SYNC | |
1328 | ||
1329 | This flag makes sure the data is actually written @emph{to disk}. | |
1330 | Normally @code{msync} only makes sure that accesses to a file with | |
1331 | conventional I/O reflect the recent changes. | |
1332 | ||
1333 | @item MS_ASYNC | |
1334 | ||
1335 | This tells @code{msync} to begin the synchronization, but not to wait for | |
1336 | it to complete. | |
1337 | ||
1338 | @c Linux also has MS_INVALIDATE, which I don't understand. | |
1339 | ||
1340 | @end vtable | |
1341 | ||
1342 | @code{msync} returns @math{0} for success and @math{-1} for | |
1343 | error. Errors include: | |
1344 | ||
1345 | @table @code | |
1346 | ||
1347 | @item EINVAL | |
1348 | An invalid region was given, or the @var{flags} were invalid. | |
1349 | ||
1350 | @item EFAULT | |
1351 | There is no existing mapping in at least part of the given region. | |
1352 | ||
1353 | @end table | |
1354 | ||
1355 | @end deftypefun | |
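
Here is a hedged sketch of the pattern just described: data is stored through a shared mapping, @code{msync} is called with @code{MS_SYNC}, and the same bytes are then read back with conventional I/O.  The function name @code{msync_demo} and the temporary file template under @file{/tmp} are assumptions made for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if data stored through a shared
   mapping is visible to pread after msync.  */
int
msync_demo (void)
{
  char path[] = "/tmp/msync-demo-XXXXXX";   /* assumed writable */
  static const char msg[] = "written through the mapping";
  char back[sizeof msg];
  int result = -1;

  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);            /* the file disappears once fd is closed */

  long pagesize = sysconf (_SC_PAGESIZE);
  assert (pagesize > 0 && (size_t) pagesize >= sizeof msg);

  /* The file must be at least as long as the region we map.  */
  if (ftruncate (fd, pagesize) == 0)
    {
      char *p = mmap (NULL, (size_t) pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
      if (p != MAP_FAILED)
        {
          memcpy (p, msg, sizeof msg);
          /* Wait until the change has been written to the file.  */
          if (msync (p, (size_t) pagesize, MS_SYNC) == 0
              && pread (fd, back, sizeof back, 0) == (ssize_t) sizeof back
              && memcmp (back, msg, sizeof msg) == 0)
            result = 0;
          munmap (p, (size_t) pagesize);
        }
    }
  close (fd);
  return result;
}
```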
1356 | ||
4c450556 UD |
1357 | @comment sys/mman.h |
1358 | @comment GNU | |
07435eb4 UD |
1359 | @deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag}) |
1360 | ||
1361 | This function can be used to change the size of an existing memory | |
1362 | area. @var{address} and @var{length} must cover a region entirely mapped | |
1363 | in the same @code{mmap} call. A new mapping with the same | |
04b9968b | 1364 | characteristics will be returned with the length @var{new_length}. |
07435eb4 UD |
1365 | |
1366 | One option is possible, @code{MREMAP_MAYMOVE}. If it is given in | |
1367 | @var{flag}, the system may remove the existing mapping and create a new | |
1368 | one of the desired length in another location. | |
1369 | ||
1370 | The address of the resulting mapping is returned, or @math{-1}. Possible | |
1371 | error codes include: | |
1372 | ||
07435eb4 UD |
1373 | @table @code |
1374 | ||
1375 | @item EFAULT | |
1376 | There is no existing mapping in at least part of the original region, or | |
1377 | the region covers two or more distinct mappings. | |
1378 | ||
1379 | @item EINVAL | |
1380 | The address given is misaligned or inappropriate. | |
1381 | ||
1382 | @item EAGAIN | |
1383 | The region has pages locked, and if extended it would exceed the | |
1384 | process's resource limit for locked pages. @xref{Limits on Resources}. | |
1385 | ||
1386 | @item ENOMEM | |
04b9968b | 1387 | The region is private writeable, and insufficient virtual memory is |
07435eb4 UD |
1388 | available to extend it. Also, this error will occur if |
1389 | @code{MREMAP_MAYMOVE} is not given and the extension would collide with | |
1390 | another mapped region. | |
1391 | ||
1392 | @end table | |
1393 | @end deftypefun | |
1394 | ||
04b9968b UD |
1395 | This function is only available on a few systems. Except for performing |
1396 | optional optimizations one should not rely on this function. | |
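
As an illustration only, the following sketch grows an anonymous mapping with @code{MREMAP_MAYMOVE}.  It assumes a system where @code{mremap} exists and is exposed by defining @code{_GNU_SOURCE}; the name @code{mremap_demo} is hypothetical.  Portable programs would need a fallback that maps a new region and copies the data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* mremap and MREMAP_MAYMOVE are GNU extensions */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: grow a one-page mapping to four pages.
   Returns 0 if the old contents survive and the new part is zeroed.  */
int
mremap_demo (void)
{
  long page = sysconf (_SC_PAGESIZE);
  assert (page > 0);
  size_t old_len = (size_t) page;
  size_t new_len = 4 * (size_t) page;

  unsigned char *p = mmap (NULL, old_len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  p[0] = 42;

  /* The kernel may move the mapping, so only the returned address
     may be used afterwards.  */
  unsigned char *q = mremap (p, old_len, new_len, MREMAP_MAYMOVE);
  if (q == MAP_FAILED)
    {
      munmap (p, old_len);
      return -1;
    }

  int ok = q[0] == 42 && q[new_len - 1] == 0;
  munmap (q, new_len);
  return ok ? 0 : -1;
}
```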
1397 | ||
07435eb4 UD |
1398 | Not all file descriptors may be mapped. Sockets, pipes, and most devices |
1399 | only allow sequential access and do not fit into the mapping abstraction. | |
1400 | In addition, some regular files may not be mmapable, and older kernels may | |
1401 | not support mapping at all. Thus, programs using @code{mmap} should | |
1402 | have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU | |
1403 | Coding Standards}. | |
1404 | ||
1405 | @c XXX madvice documentation missing | |
1406 | ||
28f540f4 RM |
1407 | @node Waiting for I/O |
1408 | @section Waiting for Input or Output | |
1409 | @cindex waiting for input or output | |
1410 | @cindex multiplexing input | |
1411 | @cindex input from multiple files | |
1412 | ||
1413 | Sometimes a program needs to accept input on multiple input channels | |
1414 | whenever input arrives. For example, some workstations may have devices | |
1415 | such as a digitizing tablet, function button box, or dial box that are | |
1416 | connected via normal asynchronous serial interfaces; good user interface | |
1417 | style requires responding immediately to input on any device. Another | |
1418 | example is a program that acts as a server to several other processes | |
1419 | via pipes or sockets. | |
1420 | ||
1421 | You cannot normally use @code{read} for this purpose, because this | |
1422 | blocks the program until input is available on one particular file | |
1423 | descriptor; input on other channels won't wake it up. You could set | |
1424 | nonblocking mode and poll each file descriptor in turn, but this is very | |
1425 | inefficient. | |
1426 | ||
1427 | A better solution is to use the @code{select} function. This blocks the | |
1428 | program until input or output is ready on a specified set of file | |
1429 | descriptors, or until a timer expires, whichever comes first. This | |
1430 | facility is declared in the header file @file{sys/types.h}. | |
1431 | @pindex sys/types.h | |
1432 | ||
1433 | In the case of a server socket (@pxref{Listening}), we say that | |
1434 | ``input'' is available when there are pending connections that could be | |
1435 | accepted (@pxref{Accepting Connections}). @code{accept} for server | |
1436 | sockets blocks and interacts with @code{select} just as @code{read} does | |
1437 | for normal input. | |
1438 | ||
1439 | @cindex file descriptor sets, for @code{select} | |
1440 | The file descriptor sets for the @code{select} function are specified | |
1441 | as @code{fd_set} objects. Here is the description of the data type | |
1442 | and some macros for manipulating these objects. | |
1443 | ||
1444 | @comment sys/types.h | |
1445 | @comment BSD | |
1446 | @deftp {Data Type} fd_set | |
1447 | The @code{fd_set} data type represents file descriptor sets for the | |
1448 | @code{select} function. It is actually a bit array. | |
1449 | @end deftp | |
1450 | ||
1451 | @comment sys/types.h | |
1452 | @comment BSD | |
1453 | @deftypevr Macro int FD_SETSIZE | |
1454 | The value of this macro is the maximum number of file descriptors that a | |
1455 | @code{fd_set} object can hold information about. On systems with a | |
1456 | fixed maximum number, @code{FD_SETSIZE} is at least that number. On | |
1457 | some systems, including GNU, there is no absolute limit on the number of | |
1458 | descriptors open, but this macro still has a constant value which | |
1459 | controls the number of bits in an @code{fd_set}; if you get a file | |
1460 | descriptor with a value as high as @code{FD_SETSIZE}, you cannot put | |
1461 | that descriptor into an @code{fd_set}. | |
1462 | @end deftypevr | |
1463 | ||
1464 | @comment sys/types.h | |
1465 | @comment BSD | |
1466 | @deftypefn Macro void FD_ZERO (fd_set *@var{set}) | |
1467 | This macro initializes the file descriptor set @var{set} to be the | |
1468 | empty set. | |
1469 | @end deftypefn | |
1470 | ||
1471 | @comment sys/types.h | |
1472 | @comment BSD | |
1473 | @deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set}) | |
1474 | This macro adds @var{filedes} to the file descriptor set @var{set}. | |
1475 | @end deftypefn | |
1476 | ||
1477 | @comment sys/types.h | |
1478 | @comment BSD | |
1479 | @deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set}) | |
1480 | This macro removes @var{filedes} from the file descriptor set @var{set}. | |
1481 | @end deftypefn | |
1482 | ||
1483 | @comment sys/types.h | |
1484 | @comment BSD | |
1485 | @deftypefn Macro int FD_ISSET (int @var{filedes}, fd_set *@var{set}) | |
1486 | This macro returns a nonzero value (true) if @var{filedes} is a member | |
3081378b | 1487 | of the file descriptor set @var{set}, and zero (false) otherwise. |
28f540f4 RM |
1488 | @end deftypefn |
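
The four macros can be exercised without calling @code{select} at all, as in this small sketch.  The helper names @code{count_members} and @code{fdset_demo} are hypothetical, and the sketch assumes the macros are declared by @file{sys/select.h}, the header POSIX specifies for them.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper: count the descriptors below NFDS that are
   members of SET.  */
int
count_members (fd_set *set, int nfds)
{
  assert (nfds >= 0 && nfds <= FD_SETSIZE);
  int n = 0;
  for (int fd = 0; fd < nfds; fd++)
    if (FD_ISSET (fd, set))
      n++;
  return n;
}

/* Hypothetical demo: build a set, remove one member, count the rest.  */
int
fdset_demo (void)
{
  fd_set set;
  FD_ZERO (&set);          /* start from the empty set */
  FD_SET (0, &set);        /* add descriptors 0 and 5 */
  FD_SET (5, &set);
  FD_CLR (0, &set);        /* remove descriptor 0 again */
  return count_members (&set, FD_SETSIZE);
}
```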
1489 | ||
1490 | Next, here is the description of the @code{select} function itself. | |
1491 | ||
1492 | @comment sys/types.h | |
1493 | @comment BSD | |
1494 | @deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout}) | |
1495 | The @code{select} function blocks the calling process until there is | |
1496 | activity on any of the specified sets of file descriptors, or until the | |
1497 | timeout period has expired. | |
1498 | ||
1499 | The file descriptors specified by the @var{read-fds} argument are | |
1500 | checked to see if they are ready for reading; the @var{write-fds} file | |
1501 | descriptors are checked to see if they are ready for writing; and the | |
1502 | @var{except-fds} file descriptors are checked for exceptional | |
1503 | conditions. You can pass a null pointer for any of these arguments if | |
1504 | you are not interested in checking for that kind of condition. | |
1505 | ||
d07e37e2 | 1506 | A file descriptor is considered ready for reading if a @code{read} call would |
28f540f4 RM |
1507 | not block, which includes being at end of file. A server socket is considered ready for reading if there is a |
1508 | pending connection which can be accepted with @code{accept}; | |
1509 | @pxref{Accepting Connections}. A client socket is ready for writing when | |
1510 | its connection is fully established; @pxref{Connecting}. | |
1511 | ||
1512 | ``Exceptional conditions'' does not mean errors---errors are reported | |
1513 | immediately when an erroneous system call is executed, and do not | |
1514 | constitute a state of the descriptor. Rather, they include conditions | |
1515 | such as the presence of an urgent message on a socket. (@xref{Sockets}, | |
1516 | for information on urgent messages.) | |
1517 | ||
1518 | The @code{select} function checks only the first @var{nfds} file | |
1519 | descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value | |
1520 | of this argument. | |
1521 | ||
1522 | The @var{timeout} specifies the maximum time to wait. If you pass a | |
1523 | null pointer for this argument, it means to block indefinitely until one | |
1524 | of the file descriptors is ready. Otherwise, you should provide the | |
1525 | time in @code{struct timeval} format; see @ref{High-Resolution | |
1526 | Calendar}. Specify zero as the time (a @code{struct timeval} containing | |
1527 | all zeros) if you want to find out which descriptors are ready without | |
1528 | waiting if none are ready. | |
1529 | ||
1530 | The normal return value from @code{select} is the total number of ready file | |
1531 | descriptors in all of the sets. Each of the argument sets is overwritten | |
1532 | with information about the descriptors that are ready for the corresponding | |
1533 | operation. Thus, to see if a particular descriptor @var{desc} has input, | |
1534 | use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns. | |
1535 | ||
1536 | If @code{select} returns because the timeout period expires, it returns | |
1537 | a value of zero. | |
1538 | ||
1539 | Any signal will cause @code{select} to return immediately. So if your | |
1540 | program uses signals, you can't rely on @code{select} to keep waiting | |
1541 | for the full time specified. If you want to be sure of waiting for a | |
1542 | particular amount of time, you must check for @code{EINTR} and repeat | |
1543 | the @code{select} with a newly calculated timeout based on the current | |
1544 | time. See the example below. See also @ref{Interrupted Primitives}. | |
1545 | ||
1546 | If an error occurs, @code{select} returns @code{-1} and does not modify | |
2c6fe0bd | 1547 | the argument file descriptor sets. The following @code{errno} error |
28f540f4 RM |
1548 | conditions are defined for this function: |
1549 | ||
1550 | @table @code | |
1551 | @item EBADF | |
1552 | One of the file descriptor sets specified an invalid file descriptor. | |
1553 | ||
1554 | @item EINTR | |
1555 | The operation was interrupted by a signal. @xref{Interrupted Primitives}. | |
1556 | ||
1557 | @item EINVAL | |
1558 | The @var{timeout} argument is invalid; one of the components is negative | |
1559 | or too large. | |
1560 | @end table | |
1561 | @end deftypefun | |
1562 | ||
1563 | @strong{Portability Note:} The @code{select} function is a BSD Unix | |
1564 | feature. | |
1565 | ||
1566 | Here is an example showing how you can use @code{select} to establish a | |
1567 | timeout period for reading from a file descriptor. The @code{input_timeout} | |
1568 | function blocks the calling process until input is available on the | |
1569 | file descriptor, or until the timeout period expires. | |
1570 | ||
1571 | @smallexample | |
1572 | @include select.c.texi | |
1573 | @end smallexample | |
1574 | ||
1575 | There is another example showing the use of @code{select} to multiplex | |
1576 | input from multiple sockets in @ref{Server Example}. | |
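
A further sketch, using pipes rather than sockets, shows how one @code{select} call can watch several descriptors at once.  The names @code{wait_readable} and @code{select_demo} are hypothetical.  Instead of @code{FD_SETSIZE}, the sketch passes the highest descriptor plus one as @var{nfds}, which is equally valid because only the first @var{nfds} descriptors are checked.  For brevity it treats @code{EINTR} like any other failure rather than retrying.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to SECONDS for any of FDS[0..N-1] to
   become readable.  Returns the first ready descriptor, or -1 on
   timeout or error.  */
int
wait_readable (const int *fds, int n, int seconds)
{
  fd_set set;
  struct timeval timeout = { .tv_sec = seconds, .tv_usec = 0 };
  int maxfd = -1;

  FD_ZERO (&set);
  for (int i = 0; i < n; i++)
    {
      assert (fds[i] >= 0 && fds[i] < FD_SETSIZE);
      FD_SET (fds[i], &set);
      if (fds[i] > maxfd)
        maxfd = fds[i];
    }

  if (select (maxfd + 1, &set, NULL, NULL, &timeout) <= 0)
    return -1;

  /* select overwrote SET with the descriptors that are ready.  */
  for (int i = 0; i < n; i++)
    if (FD_ISSET (fds[i], &set))
      return fds[i];
  return -1;
}

/* Hypothetical demo: two pipes, data arrives only on the second.  */
int
select_demo (void)
{
  int a[2], b[2];
  if (pipe (a) != 0)
    return -1;
  if (pipe (b) != 0)
    {
      close (a[0]);
      close (a[1]);
      return -1;
    }

  int fds[2] = { a[0], b[0] };
  int idle = wait_readable (fds, 2, 0);     /* nothing written yet */
  int ready = -1;
  if (write (b[1], "x", 1) == 1)
    ready = wait_readable (fds, 2, 1);

  int ok = idle == -1 && ready == b[0];
  close (a[0]);
  close (a[1]);
  close (b[0]);
  close (b[1]);
  return ok ? 0 : -1;
}
```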
1577 | ||
1578 | ||
dfd2257a UD |
1579 | @node Synchronizing I/O |
1580 | @section Synchronizing I/O operations | |
1581 | ||
1582 | @cindex synchronizing | |
04b9968b | 1583 | In most modern operating systems the normal I/O operations are not |
dfd2257a UD |
1584 | executed synchronously. I.e., even if a @code{write} system call |
1585 | returns this does not mean the data is actually written to the media, | |
1586 | e.g., the disk. | |
1587 | ||
04b9968b UD |
1588 | In situations where synchronization points are necessary, you can use |
1589 | special functions which ensure that all operations finish before | |
dfd2257a UD |
1590 | they return. |
1591 | ||
1592 | @comment unistd.h | |
1593 | @comment X/Open | |
1594 | @deftypefun int sync (void) | |
1595 | A call to this function will not return as long as there is data which | |
04b9968b | 1596 | has not been written to the device. All dirty buffers in the kernel will |
dfd2257a UD |
1597 | be written and so an overall consistent system can be achieved (if no |
1598 | other process in parallel writes data). | |
1599 | ||
1600 | A prototype for @code{sync} can be found in @file{unistd.h}. | |
1601 | ||
1602 | The return value is zero to indicate no error. | |
1603 | @end deftypefun | |
1604 | ||
04b9968b UD |
1605 | Programs more often want to ensure that data written to a given file is |
1606 | committed, rather than all data in the system. For this, @code{sync} is overkill. | |
1607 | ||
dfd2257a UD |
1608 | |
1609 | @comment unistd.h | |
1610 | @comment POSIX | |
1611 | @deftypefun int fsync (int @var{fildes}) | |
1612 | The @code{fsync} function can be used to make sure all data associated with the | |
1613 | open file @var{fildes} is written to the device associated with the | |
1614 | descriptor. The function call does not return unless all actions have | |
1615 | finished. | |
1616 | ||
1617 | A prototype for @code{fsync} can be found in @file{unistd.h}. | |
1618 | ||
04b9968b | 1619 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
1620 | is a problem if the thread allocates some resources (like memory, file |
1621 | descriptors, semaphores or whatever) at the time @code{fsync} is | |
04b9968b UD |
1622 | called. If the thread gets cancelled these resources stay allocated |
1623 | until the program ends. To avoid this, calls to @code{fsync} should be | |
1624 | protected using cancellation handlers. | |
dfd2257a UD |
1625 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
1626 | ||
49c091e5 | 1627 | The return value of the function is zero if no error occurred. Otherwise |
dfd2257a UD |
1628 | it is @math{-1} and the global variable @var{errno} is set to the |
1629 | following values: | |
1630 | @table @code | |
1631 | @item EBADF | |
1632 | The descriptor @var{fildes} is not valid. | |
1633 | ||
1634 | @item EINVAL | |
1635 | No synchronization is possible since the system does not implement this. | |
1636 | @end table | |
1637 | @end deftypefun | |
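
A typical use, sketched here under stated assumptions, is to write a complete buffer and call @code{fsync} before reporting success.  The names @code{write_durably} and @code{fsync_demo}, and the temporary file template under @file{/tmp}, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write all LEN bytes of BUF to FD, then ask the
   system to commit them to the device.  Returns 0 on success and -1
   on error, with errno set by write or fsync.  */
int
write_durably (int fd, const void *buf, size_t len)
{
  const char *p = buf;
  assert (fd >= 0);
  while (len > 0)
    {
      ssize_t n = write (fd, p, len);
      if (n < 0)
        return -1;
      p += n;
      len -= (size_t) n;
    }
  return fsync (fd);
}

/* Hypothetical demo using a temporary file.  */
int
fsync_demo (void)
{
  char path[] = "/tmp/fsync-demo-XXXXXX";   /* assumed writable */
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);
  int result = write_durably (fd, "important data\n", 15);
  close (fd);
  return result;
}
```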
1638 | ||
1639 | Sometimes it is not even necessary to write all data associated with a | |
1640 | file descriptor. E.g., in database files which do not change in size it | |
1641 | is enough to write all the file content data to the device. | |
f2ea0f5b | 1642 | Meta-information like the modification time etc. are not that important |
dfd2257a UD |
1643 | and leaving such information uncommitted does not prevent a successful |
1644 | recovery of the file in case of a problem. | |
1645 | ||
1646 | @comment unistd.h | |
1647 | @comment POSIX | |
1648 | @deftypefun int fdatasync (int @var{fildes}) | |
04b9968b | 1649 | When a call to the @code{fdatasync} function returns, it is ensured |
dfd2257a | 1650 | that all of the file data is written to the device. For all pending I/O |
04b9968b | 1651 | operations, the parts guaranteeing data integrity have finished. |
dfd2257a UD |
1652 | |
1653 | Not all systems implement the @code{fdatasync} operation. On systems | |
1654 | missing this functionality @code{fdatasync} is emulated by a call to | |
1655 | @code{fsync} since the performed actions are a superset of those | |
1656 | required by @code{fdatasync}. | |
1657 | ||
1658 | The prototype for @code{fdatasync} is in @file{unistd.h}. | |
1659 | ||
49c091e5 | 1660 | The return value of the function is zero if no error occurred. Otherwise |
dfd2257a UD |
1661 | it is @math{-1} and the global variable @var{errno} is set to the |
1662 | following values: | |
1663 | @table @code | |
1664 | @item EBADF | |
1665 | The descriptor @var{fildes} is not valid. | |
1666 | ||
1667 | @item EINVAL | |
1668 | No synchronization is possible since the system does not implement this. | |
1669 | @end table | |
1670 | @end deftypefun | |
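
The database scenario mentioned above might look like the following sketch: the file size is fixed once and committed with @code{fsync}, after which individual records are overwritten in place and committed with @code{fdatasync}.  The names @code{update_record} and @code{fdatasync_demo}, the record layout, and the file template are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite LEN bytes at OFFSET and commit the
   file data.  Returns 0 on success, -1 on error.  */
int
update_record (int fd, off_t offset, const void *rec, size_t len)
{
  assert (fd >= 0 && offset >= 0);
  if (pwrite (fd, rec, len, offset) != (ssize_t) len)
    return -1;
  return fdatasync (fd);
}

/* Hypothetical demo using a temporary file of fixed size.  */
int
fdatasync_demo (void)
{
  char path[] = "/tmp/fdatasync-demo-XXXXXX";   /* assumed writable */
  char back[8];
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  /* Fix the file size once and commit that metadata with fsync;
     later updates do not change the size.  */
  int ok = ftruncate (fd, 4096) == 0
           && fsync (fd) == 0
           && update_record (fd, 128, "record-1", 8) == 0
           && pread (fd, back, sizeof back, 128) == (ssize_t) sizeof back
           && memcmp (back, "record-1", sizeof back) == 0;
  close (fd);
  return ok ? 0 : -1;
}
```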
1671 | ||
1672 | ||
b07d03e0 UD |
1673 | @node Asynchronous I/O |
1674 | @section Perform I/O Operations in Parallel | |
1675 | ||
1676 | The POSIX.1b standard defines a new set of I/O operations which can | |
04b9968b | 1677 | significantly reduce the time an application spends waiting at I/O. The |
b07d03e0 | 1678 | new functions allow a program to initiate one or more I/O operations and |
04b9968b UD |
1679 | then immediately resume normal work while the I/O operations are |
1680 | executed in parallel. This functionality is available if the | |
a3a4a74e | 1681 | @file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}. |
b07d03e0 UD |
1682 | |
1683 | These functions are part of the library with realtime functions named | |
1684 | @file{librt}. They are not actually part of the @file{libc} binary. | |
1685 | The implementation of these functions can be done using support in the | |
c756c71c UD |
1686 | kernel (if available) or using an implementation based on threads at |
1687 | userlevel. In the latter case it might be necessary to link applications | |
fed8f7f7 | 1688 | with the thread library @file{libpthread} in addition to @file{librt}. |
b07d03e0 | 1689 | |
c756c71c | 1690 | All AIO operations operate on files which were opened previously. There |
04b9968b | 1691 | might be arbitrarily many operations running for one file. The |
b07d03e0 UD |
1692 | asynchronous I/O operations are controlled using a data structure named |
1693 | @code{struct aiocb} (@dfn{AIO control block}). It is defined in | |
1694 | @file{aio.h} as follows. | |
1695 | ||
1696 | @comment aio.h | |
1697 | @comment POSIX.1b | |
1698 | @deftp {Data Type} {struct aiocb} | |
1699 | The POSIX.1b standard mandates that the @code{struct aiocb} structure | |
1700 | contains at least the members described in the following table. There | |
04b9968b | 1701 | might be more elements which are used by the implementation, but |
b07d03e0 UD |
1702 | depending on these elements is not portable and is highly deprecated. |
1703 | ||
1704 | @table @code | |
1705 | @item int aio_fildes | |
1706 | This element specifies the file descriptor which is used for the | |
1707 | operation. It must be a legal descriptor since otherwise the operation | |
04b9968b | 1708 | fails. |
b07d03e0 UD |
1709 | |
1710 | The device on which the file is opened must allow the seek operation. | |
1711 | I.e., it is not possible to use any of the AIO operations on devices | |
1712 | like terminals where an @code{lseek} call would lead to an error. | |
1713 | ||
1714 | @item off_t aio_offset | |
fed8f7f7 UD |
1715 | This element specifies at which offset in the file the operation (input |
1716 | or output) is performed. Since the operations are carried out in arbitrary | |
b07d03e0 UD |
1717 | order and more than one operation for one file descriptor can be |
1718 | started, one cannot expect a current read/write position of the file | |
1719 | descriptor. | |
1720 | ||
1721 | @item volatile void *aio_buf | |
1722 | This is a pointer to the buffer with the data to be written or the place | |
c756c71c | 1723 | where the read data is stored. |
b07d03e0 UD |
1724 | |
1725 | @item size_t aio_nbytes | |
1726 | This element specifies the length of the buffer pointed to by @code{aio_buf}. | |
1727 | ||
1728 | @item int aio_reqprio | |
c756c71c UD |
1729 | If the platform has defined @code{_POSIX_PRIORITIZED_IO} and |
1730 | @code{_POSIX_PRIORITY_SCHEDULING} the AIO requests are | |
b07d03e0 UD |
1731 | processed based on the current scheduling priority. The |
1732 | @code{aio_reqprio} element can then be used to lower the priority of the | |
1733 | AIO operation. | |
1734 | ||
1735 | @item struct sigevent aio_sigevent | |
1736 | This element specifies how the calling process is notified once the | |
fed8f7f7 | 1737 | operation terminates. If the @code{sigev_notify} element is |
b07d03e0 UD |
1738 | @code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL} |
1739 | the signal determined by @code{sigev_signo} is sent. Otherwise | |
fed8f7f7 | 1740 | @code{sigev_notify} must be @code{SIGEV_THREAD}. In this case a thread |
c756c71c | 1741 | is created which starts executing the function pointed to by |
b07d03e0 UD |
1742 | @code{sigev_notify_function}. |
1743 | ||
1744 | @item int aio_lio_opcode | |
1745 | This element is only used by the @code{lio_listio} and | |
04b9968b UD |
1746 | @code{lio_listio64} functions. Since these functions allow an |
1747 | arbitrary number of operations to start at once, and each operation can be | |
1748 | input or output (or nothing), the information must be stored in the | |
b07d03e0 UD |
1749 | control block. The possible values are: |
1750 | ||
1751 | @vtable @code | |
1752 | @item LIO_READ | |
1753 | Start a read operation. Read from the file at position | |
1754 | @code{aio_offset} and store the next @code{aio_nbytes} bytes in the | |
1755 | buffer pointed to by @code{aio_buf}. | |
1756 | ||
1757 | @item LIO_WRITE | |
1758 | Start a write operation. Write @code{aio_nbytes} bytes starting at | |
1759 | @code{aio_buf} into the file starting at position @code{aio_offset}. | |
1760 | ||
1761 | @item LIO_NOP | |
1762 | Do nothing for this control block. This value is useful sometimes when | |
1763 | an array of @code{struct aiocb} values contains holes, i.e., some of the | |
fed8f7f7 | 1764 | values must not be handled although the whole array is presented to the |
b07d03e0 UD |
1765 | @code{lio_listio} function. |
1766 | @end vtable | |
1767 | @end table | |
a3a4a74e | 1768 | |
fed8f7f7 | 1769 | When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a |
04b9968b | 1770 | 32 bit machine this type is in fact @code{struct aiocb64} since the LFS |
a3a4a74e UD |
1771 | interface transparently replaces the @code{struct aiocb} definition. |
1772 | @end deftp | |
1773 | ||
1774 | For use with the AIO functions defined in the LFS there is a similar type | |
1775 | defined which replaces the types of the appropriate members with larger | |
04b9968b | 1776 | types but otherwise is equivalent to @code{struct aiocb}. Particularly, |
a3a4a74e UD |
1777 | all member names are the same. |
1778 | ||
1779 | @comment aio.h | |
1780 | @comment POSIX.1b | |
1781 | @deftp {Data Type} {struct aiocb64} | |
1782 | @table @code | |
1783 | @item int aio_fildes | |
1784 | This element specifies the file descriptor which is used for the | |
1785 | operation. It must be a legal descriptor since otherwise the operation | |
1786 | fails for obvious reasons. | |
1787 | ||
1788 | The device on which the file is opened must allow the seek operation. | |
1789 | I.e., it is not possible to use any of the AIO operations on devices | |
1790 | like terminals where an @code{lseek} call would lead to an error. | |
1791 | ||
1792 | @item off64_t aio_offset | |
04b9968b | 1793 | This element specifies at which offset in the file the operation (input |
a3a4a74e UD |
1794 | or output) is performed. Since the operations are carried out in arbitrary |
1795 | order and more than one operation for one file descriptor can be | |
1796 | started, one cannot expect a current read/write position of the file | |
1797 | descriptor. | |
1798 | ||
1799 | @item volatile void *aio_buf | |
1800 | This is a pointer to the buffer with the data to be written or the place | |
1801 | where the read data is stored. | |
1802 | ||
1803 | @item size_t aio_nbytes | |
1804 | This element specifies the length of the buffer pointed to by @code{aio_buf}. | |
1805 | ||
1806 | @item int aio_reqprio | |
1807 | If for the platform @code{_POSIX_PRIORITIZED_IO} and | |
04b9968b | 1808 | @code{_POSIX_PRIORITY_SCHEDULING} are defined the AIO requests are |
a3a4a74e UD |
1809 | processed based on the current scheduling priority. The |
1810 | @code{aio_reqprio} element can then be used to lower the priority of the | |
1811 | AIO operation. | |
1812 | ||
1813 | @item struct sigevent aio_sigevent | |
1814 | This element specifies how the calling process is notified once the | |
fed8f7f7 | 1815 | operation terminates. If the @code{sigev_notify} element is |
04b9968b UD |
1816 | @code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL} |
1817 | the signal determined by @code{sigev_signo} is sent. Otherwise | |
a3a4a74e | 1818 | @code{sigev_notify} must be @code{SIGEV_THREAD} in which case a thread |
04b9968b | 1819 | is created which starts executing the function pointed to by |
a3a4a74e UD |
1820 | @code{sigev_notify_function}. |
1821 | ||
1822 | @item int aio_lio_opcode | |
1823 | This element is only used by the @code{lio_listio} and | |
04b9968b UD |
1824 | @code{lio_listio64} functions. Since these functions allow an |
1825 | arbitrary number of operations to start at once, and since each operation can be | |
1826 | input or output (or nothing), the information must be stored in the | |
a3a4a74e UD |
1827 | control block. See the description of @code{struct aiocb} for a description |
1828 | of the possible values. | |
1829 | @end table | |
1830 | ||
1831 | When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a | |
04b9968b | 1832 | 32 bit machine this type is available under the name @code{struct |
a3a4a74e | 1833 | aiocb} since the LFS transparently replaces the old interface. |
b07d03e0 UD |
1834 | @end deftp |
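
To make the roles of the control block members concrete, here is a minimal sketch that fills in a @code{struct aiocb}, starts a read, and waits for it to finish.  It relies on @code{aio_read}, @code{aio_error}, @code{aio_return} and the POSIX function @code{aio_suspend}.  The names @code{aio_read_wait} and @code{aio_demo} are hypothetical, and depending on the system the program may have to be linked with @file{librt}.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to LEN bytes at OFFSET asynchronously
   and wait for completion.  Returns the byte count, or -1 with errno
   set.  */
ssize_t
aio_read_wait (int fd, void *buf, size_t len, off_t offset)
{
  struct aiocb cb;
  const struct aiocb *const list[1] = { &cb };

  assert (fd >= 0);
  memset (&cb, 0, sizeof cb);   /* clear implementation-private members */
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;       /* the descriptor's file position is not used */
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) != 0)
    return -1;

  /* A real program would do other work here instead of blocking.  */
  while (aio_error (&cb) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  int err = aio_error (&cb);
  ssize_t n = aio_return (&cb);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}

/* Hypothetical demo: read three bytes from the middle of a file.  */
int
aio_demo (void)
{
  char path[] = "/tmp/aio-demo-XXXXXX";   /* assumed writable */
  char buf[3];
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);

  int ok = write (fd, "hello, aio", 10) == 10
           && aio_read_wait (fd, buf, sizeof buf, 7) == 3
           && memcmp (buf, "aio", 3) == 0;
  close (fd);
  return ok ? 0 : -1;
}
```

Waiting immediately after starting the request defeats the purpose of asynchronous I/O; the sketch does so only to keep the example short.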
1835 | ||
1836 | @menu | |
a3a4a74e UD |
1837 | * Asynchronous Reads/Writes:: Asynchronous Read and Write Operations. |
1838 | * Status of AIO Operations:: Getting the Status of AIO Operations. | |
1839 | * Synchronizing AIO Operations:: Getting into a consistent state. | |
04b9968b | 1840 | * Cancel AIO Operations:: Cancellation of AIO Operations. |
a3a4a74e | 1841 | * Configuration of AIO:: How to optimize the AIO implementation. |
b07d03e0 UD |
1842 | @end menu |
1843 | ||
a3a4a74e UD |
1844 | @node Asynchronous Reads/Writes |
1845 | @subsection Asynchronous Read and Write Operations | |
b07d03e0 UD |
1846 | |
1847 | @comment aio.h | |
1848 | @comment POSIX.1b | |
1849 | @deftypefun int aio_read (struct aiocb *@var{aiocbp}) | |
04b9968b UD |
1850 | This function initiates an asynchronous read operation. It |
1851 | immediately returns after the operation was enqueued or when an | |
fed8f7f7 | 1852 | error was encountered. |
b07d03e0 | 1853 | |
a3a4a74e | 1854 | The first @code{aiocbp->aio_nbytes} bytes of the file for which |
c756c71c UD |
1855 | @code{aiocbp->aio_fildes} is a descriptor are written to the buffer |
1856 | starting at @code{aiocbp->aio_buf}. Reading starts at the absolute | |
1857 | position @code{aiocbp->aio_offset} in the file. | |
b07d03e0 UD |
1858 | |
1859 | If prioritized I/O is supported by the platform the | |
1860 | @code{aiocbp->aio_reqprio} value is used to adjust the priority before | |
1861 | the request is actually enqueued. | |
1862 | ||
1863 | The calling process is notified about the termination of the read | |
1864 | request according to the @code{aiocbp->aio_sigevent} value. | |
1865 | ||
04b9968b | 1866 | When @code{aio_read} returns, the return value is zero if no error |
b07d03e0 | 1867 | occurred that can be found before the request is enqueued. If such an |
04b9968b UD |
1868 | early error is found, the function returns @math{-1} and sets |
1869 | @code{errno} to one of the following values: | |
b07d03e0 UD |
1870 | |
1871 | @table @code | |
1872 | @item EAGAIN | |
1873 | The request was not enqueued due to (temporarily) exceeded resource | |
1874 | limitations. | |
1875 | @item ENOSYS | |
1876 | The @code{aio_read} function is not implemented. | |
1877 | @item EBADF | |
1878 | The @code{aiocbp->aio_fildes} descriptor is not valid. This condition | |
04b9968b | 1879 | need not be recognized before enqueueing the request and so this error |
fed8f7f7 | 1880 | might also be signaled asynchronously. |
b07d03e0 UD |
1881 | @item EINVAL |
1882 | The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is | |
1883 | invalid. This condition need not be recognized before enqueueing the | |
49c091e5 | 1884 | request and so this error might also be signaled asynchronously. |
b07d03e0 UD |
1885 | @end table |
1886 | ||
04b9968b UD |
1887 | If @code{aio_read} returns zero, the current status of the request |
1888 | can be queried using the @code{aio_error} and @code{aio_return} functions. | |
1889 | As long as the value returned by @code{aio_error} is @code{EINPROGRESS} | |
1890 | the operation has not yet completed. If @code{aio_error} returns zero, | |
78759725 UD |
1891 | the operation successfully terminated, otherwise the value is to be |
1892 | interpreted as an error code. If the function terminated, the result of | |
1893 | the operation can be obtained using a call to @code{aio_return}. The | |
1894 | returned value is the same as an equivalent call to @code{read} would | |
04b9968b | 1895 | have returned. Possible error codes returned by @code{aio_error} are: |
b07d03e0 UD |
1896 | |
1897 | @table @code | |
1898 | @item EBADF | |
1899 | The @code{aiocbp->aio_fildes} descriptor is not valid. | |
1900 | @item ECANCELED | |
04b9968b | 1901 | The operation was cancelled before it finished |
b07d03e0 UD |
1902 | (@pxref{Cancel AIO Operations}). |
1903 | @item EINVAL | |
1904 | The @code{aiocbp->aio_offset} value is invalid. | |
1905 | @end table | |
a3a4a74e UD |
1906 | |
1907 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
1908 | function is in fact @code{aio_read64} since the LFS interface transparently | |
1909 | replaces the normal implementation. | |
b07d03e0 UD |
1910 | @end deftypefun |
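
The following is a minimal sketch of the sequence described above, not a
definitive implementation.  The helper name @code{read_at_async} and the
one-millisecond polling interval are assumptions made for illustration.
The helper enqueues one request with @code{aio_read}, polls
@code{aio_error} until the status is no longer @code{EINPROGRESS}, and
then calls @code{aio_return} exactly once.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read up to LEN bytes of PATH, starting at OFFSET,
   with a single aio_read request.  Returns the number of bytes read,
   or -1 with errno set.  */
ssize_t
read_at_async (const char *path, void *buf, size_t len, off_t offset)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);           /* Clear members we do not use.  */
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* No notification.  */

  if (aio_read (&cb) != 0)
    {
      /* Early error: the request was never enqueued.  */
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  /* Poll the error status; a real program would do useful work here.  */
  struct timespec delay = { 0, 1000000 };      /* 1 ms, an arbitrary choice.  */
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&delay, NULL);

  /* Retrieve the result exactly once, as for a call to read.  */
  ssize_t result = aio_return (&cb);
  close (fd);
  if (status != 0)
    {
      errno = status;
      return -1;
    }
  return result;
}
```

Polling is used here only to make the role of @code{EINPROGRESS}
visible; waiting with @code{aio_suspend} or requesting a notification
through @code{aio_sigevent} avoids the repeated wakeups.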
1911 | ||
1912 | @comment aio.h | |
a3a4a74e | 1913 | @comment Unix98 |
b07d03e0 UD |
1914 | @deftypefun int aio_read64 (struct aiocb *@var{aiocbp}) |
1915 | This function is similar to the @code{aio_read} function. The only | |
04b9968b | 1916 | difference is that on @w{32 bit} machines the file descriptor should |
b07d03e0 | 1917 | be opened in the large file mode. Internally @code{aio_read64} uses |
a3a4a74e UD |
1918 | functionality equivalent to @code{lseek64} (@pxref{File Position |
1919 | Primitive}) to position the file descriptor correctly for the reading, | |
fed8f7f7 | 1920 | as opposed to @code{lseek} functionality used in @code{aio_read}. |
a3a4a74e UD |
1921 | |
1922 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
1923 | function is available under the name @code{aio_read} and so transparently | |
04b9968b | 1924 | replaces the interface for small files on 32 bit machines. |
b07d03e0 UD |
1925 | @end deftypefun |
1926 | ||
a3a4a74e UD |
1927 | To write data asynchronously to a file there exists an equivalent pair |
1928 | of functions with a very similar interface. | |
1929 | ||
1930 | @comment aio.h | |
1931 | @comment POSIX.1b | |
1932 | @deftypefun int aio_write (struct aiocb *@var{aiocbp}) | |
1933 | This function initiates an asynchronous write operation. The function | |
1934 | returns immediately after the operation has been enqueued, or earlier | |
fed8f7f7 | 1935 | if an error is encountered before that happens. |
a3a4a74e UD |
1936 | |
1937 | The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at | |
1938 | @code{aiocbp->aio_buf} are written to the file for which | |
1939 | @code{aiocbp->aio_fildes} is a descriptor, starting at the absolute | |
1940 | position @code{aiocbp->aio_offset} in the file. | |
1941 | ||
1942 | If prioritized I/O is supported by the platform the | |
1943 | @code{aiocbp->aio_reqprio} value is used to adjust the priority before | |
1944 | the request is actually enqueued. | |
1945 | ||
1946 | The calling process is notified about the termination of the write | |
1947 | request according to the @code{aiocbp->aio_sigevent} value. | |
1948 | ||
1949 | When @code{aio_write} returns, the return value is zero if no error | |
1950 | occurred that can be found before the request is enqueued. If such an | |
1951 | early error is found, the function returns @math{-1} and sets | |
1952 | @code{errno} to one of the following values: | |
1953 | ||
1954 | @table @code | |
1955 | @item EAGAIN | |
1956 | The request was not enqueued due to (temporarily) exceeded resource | |
1957 | limitations. | |
1958 | @item ENOSYS | |
1959 | The @code{aio_write} function is not implemented. | |
1960 | @item EBADF | |
1961 | The @code{aiocbp->aio_fildes} descriptor is not valid. This condition | |
fed8f7f7 UD |
1962 | need not be recognized before enqueueing the request and so this error |
1963 | might also be signaled asynchronously. | |
a3a4a74e UD |
1964 | @item EINVAL |
1965 | The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is | |
fed8f7f7 UD |
1966 | invalid. This condition need not be recognized before enqueueing the |
1967 | request and so this error might also be signaled asynchronously. | |
a3a4a74e UD |
1968 | @end table |
1969 | ||
1970 | If @code{aio_write} returns zero, the current status of the | |
1971 | request can be queried using the @code{aio_error} and @code{aio_return} | |
c756c71c | 1972 | functions. As long as the value returned by @code{aio_error} is |
a3a4a74e UD |
1973 | @code{EINPROGRESS} the operation has not yet completed. If |
1974 | @code{aio_error} returns zero, the operation terminated successfully; | |
1975 | otherwise the value is to be interpreted as an error code. Once the | |
1976 | operation has terminated, its result can be obtained using a call | |
1977 | to @code{aio_return}. The returned value is the same as an equivalent | |
1978 | call to @code{write} would have returned. Possible error codes returned | |
1979 | by @code{aio_error} are: | |
1980 | ||
1981 | @table @code | |
1982 | @item EBADF | |
1983 | The @code{aiocbp->aio_fildes} descriptor is not valid. | |
1984 | @item ECANCELED | |
04b9968b | 1985 | The operation was cancelled before it finished |
a3a4a74e UD |
1986 | (@pxref{Cancel AIO Operations}). |
1987 | @item EINVAL | |
1988 | The @code{aiocbp->aio_offset} value is invalid. | |
1989 | @end table | |
1990 | ||
1991 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
1992 | function is in fact @code{aio_write64} since the LFS interface transparently | |
1993 | replaces the normal implementation. | |
1994 | @end deftypefun | |
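
As a rough sketch of the write side, assuming a hypothetical helper
named @code{write_file_async}, the following code enqueues one
@code{aio_write} request at offset zero and blocks in
@code{aio_suspend} until the request has terminated.  The file mode
@code{0600} and the choice to truncate the file are illustrative
assumptions only.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate PATH and write LEN bytes from
   DATA at offset 0 with one aio_write request.  Returns the number of
   bytes written, or -1 with errno set.  */
ssize_t
write_file_async (const char *path, const void *data, size_t len)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = (void *) data;   /* aio_write only reads from the buffer.  */
  cb.aio_nbytes = len;
  cb.aio_offset = 0;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_write (&cb) != 0)
    {
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  /* Sleep until the request terminates; retry if a signal interrupts.  */
  const struct aiocb *const list[1] = { &cb };
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  /* The value is what an equivalent call to write would have returned.  */
  ssize_t result = aio_return (&cb);
  close (fd);
  if (status != 0)
    {
      errno = status;
      return -1;
    }
  return result;
}
```

The buffer and the control block must stay valid until the request has
terminated, which is why the helper does not return before then.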
1995 | ||
1996 | @comment aio.h | |
1997 | @comment Unix98 | |
1998 | @deftypefun int aio_write64 (struct aiocb *@var{aiocbp}) | |
1999 | This function is similar to the @code{aio_write} function. The only | |
04b9968b | 2000 | difference is that on @w{32 bit} machines the file descriptor should |
a3a4a74e UD |
2001 | be opened in the large file mode. Internally @code{aio_write64} uses |
2002 | functionality equivalent to @code{lseek64} (@pxref{File Position | |
2003 | Primitive}) to position the file descriptor correctly for the writing, | |
fed8f7f7 | 2004 | as opposed to @code{lseek} functionality used in @code{aio_write}. |
a3a4a74e UD |
2005 | |
2006 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2007 | function is available under the name @code{aio_write} and so transparently | |
04b9968b | 2008 | replaces the interface for small files on 32 bit machines. |
a3a4a74e UD |
2009 | @end deftypefun |
2010 | ||
2011 | Besides these functions with the more or less traditional interface, | |
2012 | POSIX.1b also defines a function which can initiate more than one | |
2013 | operation at once and which can handle freely mixed read and write | |
2014 | operations. It is therefore similar to a combination of @code{readv} and | |
2015 | @code{writev}. | |
2016 | ||
2017 | @comment aio.h | |
2018 | @comment POSIX.1b | |
2019 | @deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) | |
2020 | The @code{lio_listio} function can be used to enqueue an arbitrary | |
2021 | number of read and write requests at one time. The requests can all be | |
2022 | meant for the same file, all for different files or every solution in | |
2023 | between. | |
2024 | ||
2025 | @code{lio_listio} gets the @var{nent} requests from the array pointed to | |
2026 | by @var{list}. What operation has to be performed is determined by the | |
2027 | @code{aio_lio_opcode} member in each element of @var{list}. If this | |
2028 | field is @code{LIO_READ}, a read operation is enqueued, similar to a call | |
2029 | of @code{aio_read} for this element of the array (except that the way | |
2030 | the termination is signalled is different, as we will see below). If | |
2031 | the @code{aio_lio_opcode} member is @code{LIO_WRITE}, a write operation | |
2032 | is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP} | |
2033 | in which case this element of @var{list} is simply ignored. This | |
2034 | ``operation'' is useful in situations where one has a fixed array of | |
2035 | @code{struct aiocb} elements from which only a few need to be handled at | |
2036 | a time. Another situation is where the @code{lio_listio} call was | |
2037 | cancelled before all requests are processed (@pxref{Cancel AIO | |
2038 | Operations}) and the remaining requests have to be reissued. | |
2039 | ||
fed8f7f7 | 2040 | The other members of each element of the array pointed to by |
a3a4a74e UD |
2041 | @code{list} must have values suitable for the operation as described in |
2042 | the documentation for @code{aio_read} and @code{aio_write} above. | |
2043 | ||
2044 | The @var{mode} argument determines how @code{lio_listio} behaves after | |
2045 | having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it | |
2046 | waits until all requests terminated. Otherwise @var{mode} must be | |
fed8f7f7 | 2047 | @code{LIO_NOWAIT} and in this case the function returns immediately after |
a3a4a74e UD |
2048 | having enqueued all the requests. In this case the caller gets a |
2049 | notification of the termination of all requests according to the | |
2050 | @var{sig} parameter. If @var{sig} is @code{NULL} no notification is | |
2051 | sent. Otherwise a signal is sent or a thread is started, just as | |
2052 | described in the description for @code{aio_read} or @code{aio_write}. | |
2053 | ||
2054 | If @var{mode} is @code{LIO_WAIT} the return value of @code{lio_listio} | |
2055 | is @math{0} when all requests completed successfully. Otherwise the | |
2056 | function returns @math{-1} and @code{errno} is set accordingly. To find | |
2057 | out which request or requests failed one has to use the @code{aio_error} | |
2058 | function on all the elements of the array @var{list}. | |
2059 | ||
2060 | In case @var{mode} is @code{LIO_NOWAIT} the function returns @math{0} if | |
2061 | all requests were enqueued correctly. The current state of the requests | |
2062 | can be found using @code{aio_error} and @code{aio_return} as described | |
2063 | above. In case @code{lio_listio} returns @math{-1} in this mode the | |
2064 | global variable @code{errno} is set accordingly. If a request did not | |
2065 | yet terminate a call to @code{aio_error} returns @code{EINPROGRESS}. If | |
2066 | the value is different the request is finished and the error value (or | |
2067 | @math{0}) is returned and the result of the operation can be retrieved | |
2068 | using @code{aio_return}. | |
2069 | ||
2070 | Possible values for @code{errno} are: | |
2071 | ||
2072 | @table @code | |
2073 | @item EAGAIN | |
2074 | The resources necessary to queue all the requests are not available at | |
2075 | the moment. The error status of each element of @var{list} must be | |
2076 | checked to determine which request failed. | |
2077 | ||
fed8f7f7 | 2078 | Another reason could be that the system wide limit of AIO requests is |
a3a4a74e UD |
2079 | exceeded. This cannot be the case for the implementation on GNU systems |
2080 | since no arbitrary limits exist. | |
2081 | @item EINVAL | |
2082 | The @var{mode} parameter is invalid or @var{nent} is larger than | |
2083 | @code{AIO_LISTIO_MAX}. | |
2084 | @item EIO | |
2085 | One or more of the request's I/O operations failed. The error status of | |
fed8f7f7 | 2086 | each request should be checked for which one failed. |
a3a4a74e UD |
2087 | @item ENOSYS |
2088 | The @code{lio_listio} function is not supported. | |
2089 | @end table | |
2090 | ||
2091 | If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels | |
2092 | a request, the error status for this request returned by | |
2093 | @code{aio_error} is @code{ECANCELED}. | |
2094 | ||
2095 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2096 | function is in fact @code{lio_listio64} since the LFS interface | |
2097 | transparently replaces the normal implementation. | |
2098 | @end deftypefun | |
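
A minimal sketch of a batched submission follows, assuming a
hypothetical helper @code{read_two_chunks} that reads two consecutive
chunks of one file.  It fills in @code{aio_lio_opcode} for each element,
submits both requests with @code{LIO_WAIT}, and then inspects every
element with @code{aio_error} and @code{aio_return} as described above.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read two consecutive chunks of CHUNK bytes from
   PATH into FIRST and SECOND with one lio_listio call.  Returns 0 if
   both reads delivered CHUNK bytes, otherwise -1.  */
int
read_two_chunks (const char *path, char *first, char *second, size_t chunk)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct aiocb cbs[2];
  char *bufs[2] = { first, second };
  memset (cbs, 0, sizeof cbs);
  for (int i = 0; i < 2; i++)
    {
      cbs[i].aio_fildes = fd;
      cbs[i].aio_buf = bufs[i];
      cbs[i].aio_nbytes = chunk;
      cbs[i].aio_offset = (off_t) (i * chunk);
      cbs[i].aio_lio_opcode = LIO_READ;        /* Selects the operation.  */
      cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
    }
  struct aiocb *const list[2] = { &cbs[0], &cbs[1] };

  /* LIO_WAIT: return only after all requests in the list terminated.  */
  int ok = lio_listio (LIO_WAIT, list, 2, NULL) == 0;

  for (int i = 0; i < 2; i++)
    {
      /* If the wait was interrupted, a request may still be running.  */
      const struct aiocb *const one[1] = { &cbs[i] };
      int status;
      while ((status = aio_error (&cbs[i])) == EINPROGRESS)
        aio_suspend (one, 1, NULL);
      if (status != 0 || aio_return (&cbs[i]) != (ssize_t) chunk)
        ok = 0;
    }

  close (fd);
  return ok ? 0 : -1;
}
```

With @code{LIO_NOWAIT} the call would return right after enqueueing, and
the per-element checks would have to be deferred until the notification
described by @var{sig} arrives.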
2099 | ||
2100 | @comment aio.h | |
2101 | @comment Unix98 | |
2102 | @deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig}) | |
2103 | This function is similar to the @code{lio_listio} function. The only | |
04b9968b | 2104 | difference is that on @w{32 bit} machines the file descriptor should |
a3a4a74e UD |
2105 | be opened in the large file mode. Internally @code{lio_listio64} uses |
2106 | functionality equivalent to @code{lseek64} (@pxref{File Position | |
2107 | Primitive}) to position the file descriptor correctly for the reading or | |
fed8f7f7 | 2108 | writing, as opposed to @code{lseek} functionality used in |
a3a4a74e UD |
2109 | @code{lio_listio}. |
2110 | ||
2111 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2112 | function is available under the name @code{lio_listio} and so | |
04b9968b | 2113 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2114 | machines. |
2115 | @end deftypefun | |
2116 | ||
2117 | @node Status of AIO Operations | |
2118 | @subsection Getting the Status of AIO Operations | |
2119 | ||
fed8f7f7 | 2120 | As already described in the documentation of the functions in the last |
04b9968b UD |
2121 | section, it must be possible to get information about the status of an I/O |
2122 | request. When the operation is performed truly asynchronously (as with | |
a3a4a74e UD |
2123 | @code{aio_read} and @code{aio_write} and with @code{lio_listio} when the |
2124 | mode is @code{LIO_NOWAIT}) one sometimes needs to know whether a | |
04b9968b UD |
2125 | specific request has already terminated and, if so, what the result was. |
2126 | The following two functions allow you to get this kind of information. | |
a3a4a74e UD |
2127 | |
2128 | @comment aio.h | |
2129 | @comment POSIX.1b | |
2130 | @deftypefun int aio_error (const struct aiocb *@var{aiocbp}) | |
2131 | This function determines the error state of the request described by the | |
fed8f7f7 | 2132 | @code{struct aiocb} variable pointed to by @var{aiocbp}. If the |
a3a4a74e UD |
2133 | request has not yet terminated the value returned is always |
2134 | @code{EINPROGRESS}. Once the request has terminated the value | |
2135 | @code{aio_error} returns is either @math{0} if the request completed | |
fed8f7f7 | 2136 | successfully or it returns the value which would be stored in the |
a3a4a74e UD |
2137 | @code{errno} variable if the request would have been done using |
2138 | @code{read}, @code{write}, or @code{fsync}. | |
2139 | ||
2140 | The function can return @code{ENOSYS} if it is not implemented. It | |
2141 | could also return @code{EINVAL} if the @var{aiocbp} parameter does not | |
2142 | refer to an asynchronous operation whose return status is not yet known. | |
2143 | ||
2144 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2145 | function is in fact @code{aio_error64} since the LFS interface | |
2146 | transparently replaces the normal implementation. | |
2147 | @end deftypefun | |
2148 | ||
2149 | @comment aio.h | |
2150 | @comment Unix98 | |
2151 | @deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp}) | |
2152 | This function is similar to @code{aio_error} with the only difference | |
2153 | that the argument is a reference to a variable of type @code{struct | |
2154 | aiocb64}. | |
2155 | ||
2156 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2157 | function is available under the name @code{aio_error} and so | |
04b9968b | 2158 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2159 | machines. |
2160 | @end deftypefun | |
2161 | ||
2162 | @comment aio.h | |
2163 | @comment POSIX.1b | |
2164 | @deftypefun ssize_t aio_return (const struct aiocb *@var{aiocbp}) | |
2165 | This function can be used to retrieve the return status of the operation | |
2166 | carried out by the request described in the variable pointed to by | |
2167 | @var{aiocbp}. As long as the error status of this request as returned | |
2168 | by @code{aio_error} is @code{EINPROGRESS} the return of this function is | |
2169 | undefined. | |
2170 | ||
fed8f7f7 UD |
2171 | Once the request is finished this function can be used exactly once to |
2172 | retrieve the return value. Following calls might lead to undefined | |
a3a4a74e UD |
2173 | behaviour. The return value itself is the value which would have been |
2174 | returned by the @code{read}, @code{write}, or @code{fsync} call. | |
2175 | ||
2176 | The function can return @code{ENOSYS} if it is not implemented. It | |
2177 | could also return @code{EINVAL} if the @var{aiocbp} parameter does not | |
2178 | refer to an asynchronous operation whose return status is not yet known. | |
2179 | ||
2180 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2181 | function is in fact @code{aio_return64} since the LFS interface | |
2182 | transparently replaces the normal implementation. | |
2183 | @end deftypefun | |
2184 | ||
2185 | @comment aio.h | |
2186 | @comment Unix98 | |
2187 | @deftypefun ssize_t aio_return64 (const struct aiocb64 *@var{aiocbp}) | |
2188 | This function is similar to @code{aio_return} with the only difference | |
2189 | that the argument is a reference to a variable of type @code{struct | |
2190 | aiocb64}. | |
2191 | ||
2192 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2193 | function is available under the name @code{aio_return} and so | |
04b9968b | 2194 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2195 | machines. |
2196 | @end deftypefun | |
2197 | ||
2198 | @node Synchronizing AIO Operations | |
2199 | @subsection Getting into a Consistent State | |
2200 | ||
2201 | When dealing with asynchronous operations it is sometimes necessary to | |
fed8f7f7 | 2202 | get into a consistent state. This would mean for AIO that one wants to |
a3a4a74e UD |
2203 | know whether a certain request or a group of requests were processed. |
2204 | This could be done by waiting for the notification sent by the system | |
04b9968b | 2205 | after the operation terminated, but this sometimes would mean wasting |
a3a4a74e UD |
2206 | resources (mainly computation time). Instead POSIX.1b defines two |
2207 | functions which will help with most kinds of consistency. | |
2208 | ||
2209 | The @code{aio_fsync} and @code{aio_fsync64} functions are only available | |
2210 | if in @file{unistd.h} the symbol @code{_POSIX_SYNCHRONIZED_IO} is | |
2211 | defined. | |
2212 | ||
2213 | @cindex synchronizing | |
2214 | @comment aio.h | |
2215 | @comment POSIX.1b | |
2216 | @deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp}) | |
2217 | Calling this function forces all I/O operations queued at the | |
fed8f7f7 | 2218 | time of the function call operating on the file descriptor |
a3a4a74e | 2219 | @code{aiocbp->aio_fildes} into the synchronized I/O completion state |
04b9968b | 2220 | (@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns |
a3a4a74e UD |
2221 | immediately but the notification through the method described in |
2222 | @code{aiocbp->aio_sigevent} will happen only after all requests for this | |
04b9968b | 2223 | file descriptor have terminated and the file is synchronized. This also |
a3a4a74e | 2224 | means that requests for this very same file descriptor which are queued |
04b9968b | 2225 | after the synchronization request are not affected. |
a3a4a74e UD |
2226 | |
2227 | If @var{op} is @code{O_DSYNC} the synchronization happens as with a call | |
2228 | to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and | |
fed8f7f7 | 2229 | the synchronization happens as with @code{fsync}. |
a3a4a74e | 2230 | |
fed8f7f7 | 2231 | As long as the synchronization has not happened a call to |
a3a4a74e | 2232 | @code{aio_error} with the reference to the object pointed to by |
fed8f7f7 UD |
2233 | @var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is |
2234 | done, @code{aio_error} returns @math{0} if the synchronization was |
a3a4a74e UD |
2235 | successful. Otherwise the value returned is the value to which the |
2236 | @code{fsync} or @code{fdatasync} function would have set the | |
2237 | @code{errno} variable. In this case nothing can be assumed about the | |
2238 | consistency for the data written to this file descriptor. | |
2239 | ||
2240 | The return value of this function is @math{0} if the request was | |
2241 | successfully filed. Otherwise the return value is @math{-1} and | |
2242 | @code{errno} is set to one of the following values: | |
2243 | ||
2244 | @table @code | |
2245 | @item EAGAIN | |
fed8f7f7 | 2246 | The request could not be enqueued due to temporary lack of resources. |
a3a4a74e UD |
2247 | @item EBADF |
2248 | The file descriptor @code{aiocbp->aio_fildes} is not valid or not open | |
2249 | for writing. | |
2250 | @item EINVAL | |
2251 | The implementation does not support I/O synchronization or the @var{op} | |
2252 | parameter is other than @code{O_DSYNC} and @code{O_SYNC}. | |
2253 | @item ENOSYS | |
2254 | This function is not implemented. | |
2255 | @end table | |
2256 | ||
2257 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2258 | function is in fact @code{aio_fsync64} since the LFS interface | |
2259 | transparently replaces the normal implementation. | |
2260 | @end deftypefun | |
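
The interplay between a queued write and @code{aio_fsync} can be
sketched as follows.  The helper names @code{wait_done} and
@code{write_and_sync} are hypothetical, and the use of @code{O_SYNC}
rather than @code{O_DSYNC} is an arbitrary choice for the example.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: block until CB has terminated and return its
   error status.  */
static int
wait_done (struct aiocb *cb)
{
  const struct aiocb *const list[1] = { cb };
  int status;
  while ((status = aio_error (cb)) == EINPROGRESS)
    aio_suspend (list, 1, NULL);
  return status;
}

/* Hypothetical helper: write LEN bytes from DATA to PATH asynchronously
   and queue a synchronization request behind the write.  Returns the
   number of bytes written if both steps succeed, otherwise -1.  */
ssize_t
write_and_sync (const char *path, const void *data, size_t len)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;

  struct aiocb wr_cb, sync_cb;
  memset (&wr_cb, 0, sizeof wr_cb);
  memset (&sync_cb, 0, sizeof sync_cb);

  wr_cb.aio_fildes = fd;
  wr_cb.aio_buf = (void *) data;
  wr_cb.aio_nbytes = len;
  wr_cb.aio_sigevent.sigev_notify = SIGEV_NONE;
  if (aio_write (&wr_cb) != 0)
    {
      close (fd);
      return -1;
    }

  /* The sync request covers operations already queued for FD.  */
  sync_cb.aio_fildes = fd;
  sync_cb.aio_sigevent.sigev_notify = SIGEV_NONE;
  int sync_err = 0;
  if (aio_fsync (O_SYNC, &sync_cb) != 0)
    sync_err = errno;           /* The sync request was not enqueued.  */

  int wr_err = wait_done (&wr_cb);
  ssize_t written = aio_return (&wr_cb);
  if (sync_err == 0)
    {
      sync_err = wait_done (&sync_cb);
      aio_return (&sync_cb);    /* Retrieve the status exactly once.  */
    }

  close (fd);
  return (wr_err == 0 && sync_err == 0) ? written : -1;
}
```

A request queued for the same descriptor after the @code{aio_fsync} call
would not be covered by this synchronization.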
2261 | ||
2262 | @comment aio.h | |
2263 | @comment Unix98 | |
2264 | @deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp}) | |
2265 | This function is similar to @code{aio_fsync} with the only difference | |
2266 | that the argument is a reference to a variable of type @code{struct | |
2267 | aiocb64}. | |
2268 | ||
2269 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2270 | function is available under the name @code{aio_fsync} and so | |
04b9968b | 2271 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2272 | machines. |
2273 | @end deftypefun | |
2274 | ||
fed8f7f7 | 2275 | Another method of synchronization is to wait until one or more requests of a |
a3a4a74e UD |
2276 | specific set terminated. This could be achieved by the @code{aio_*} |
2277 | functions to notify the initiating process about the termination but in | |
2278 | some situations this is not the ideal solution. In a program which | |
2279 | constantly updates clients somehow connected to the server it is not | |
2280 | always the best solution to go round robin since some connections might | |
2281 | be slow. On the other hand letting the @code{aio_*} function notify the | |
2282 | caller might also be not the best solution since whenever the process | |
2283 | works on preparing data for one client it makes no sense to be | |
2284 | interrupted by a notification since the new client will not be handled | |
2285 | before the current client is served. For situations like this | |
2286 | @code{aio_suspend} should be used. | |
2287 | ||
2288 | @comment aio.h | |
2289 | @comment POSIX.1b | |
2290 | @deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) | |
2291 | When calling this function the calling thread is suspended until at | |
2292 | least one of the requests pointed to by the @var{nent} elements of the | |
2293 | array @var{list} has completed. If any of the requests already has | |
2294 | completed at the time @code{aio_suspend} is called the function returns | |
2295 | immediately. Whether a request has terminated or not is determined by | |
2296 | comparing the error status of the request with @code{EINPROGRESS}. If | |
2297 | an element of @var{list} is @code{NULL} the entry is simply ignored. | |
2298 | ||
2299 | If no request has finished the calling process is suspended. If | |
2300 | @var{timeout} is @code{NULL} the process is not woken until a request | |
2301 | has finished. If @var{timeout} is not @code{NULL} the process remains | |
2302 | suspended at least as long as specified in @var{timeout}. In this case | |
2303 | @code{aio_suspend} returns with an error. | |
2304 | ||
fed8f7f7 | 2305 | The return value of the function is @math{0} if one or more requests |
a3a4a74e UD |
2306 | from the @var{list} have terminated. Otherwise the function returns |
2307 | @math{-1} and @code{errno} is set to one of the following values: | |
2308 | ||
2309 | @table @code | |
2310 | @item EAGAIN | |
2311 | None of the requests from the @var{list} completed in the time specified | |
2312 | by @var{timeout}. | |
2313 | @item EINTR | |
2314 | A signal interrupted the @code{aio_suspend} function. This signal might | |
2315 | also be sent by the AIO implementation while signalling the termination | |
2316 | of one of the requests. | |
2317 | @item ENOSYS | |
2318 | The @code{aio_suspend} function is not implemented. | |
2319 | @end table | |
2320 | ||
2321 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2322 | function is in fact @code{aio_suspend64} since the LFS interface | |
2323 | transparently replaces the normal implementation. | |
2324 | @end deftypefun | |
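
The timeout handling can be illustrated with the following sketch.  The
helper @code{read_with_deadline} and its @var{timed_out} output
parameter are assumptions made for this example.  The helper waits in
@code{aio_suspend} with a timeout, records an @code{EAGAIN} result as a
missed deadline, and then keeps waiting without a limit because the
buffer must not be reused while the request is still in progress.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read up to LEN bytes from the start of PATH.
   *TIMED_OUT is set to 1 if the request needed longer than TIMEOUT_MS
   milliseconds.  Returns the number of bytes read, or -1 with errno
   set.  */
ssize_t
read_with_deadline (const char *path, void *buf, size_t len,
                    long timeout_ms, int *timed_out)
{
  *timed_out = 0;
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;
  if (aio_read (&cb) != 0)
    {
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }

  const struct aiocb *const list[1] = { &cb };
  struct timespec timeout
    = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

  while (aio_error (&cb) == EINPROGRESS)
    {
      /* After a missed deadline, wait without a limit (NULL timeout).  */
      if (aio_suspend (list, 1, *timed_out ? NULL : &timeout) != 0
          && errno == EAGAIN)
        *timed_out = 1;
      /* On EINTR the loop simply retries; the timeout then restarts.  */
    }

  int status = aio_error (&cb);
  ssize_t result = aio_return (&cb);
  close (fd);
  if (status != 0)
    {
      errno = status;
      return -1;
    }
  return result;
}
```

A program serving several clients would instead pass all outstanding
control blocks in @var{list} and check each one with @code{aio_error}
after @code{aio_suspend} returns.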
2325 | ||
2326 | @comment aio.h | |
2327 | @comment Unix98 | |
2328 | @deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout}) | |
2329 | This function is similar to @code{aio_suspend} with the only difference | |
2330 | that the argument is a reference to a variable of type @code{struct | |
2331 | aiocb64}. | |
2332 | ||
2333 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2334 | function is available under the name @code{aio_suspend} and so | |
04b9968b | 2335 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2336 | machines. |
2337 | @end deftypefun | |
b07d03e0 UD |
2338 | |
2339 | @node Cancel AIO Operations | |
04b9968b | 2340 | @subsection Cancellation of AIO Operations |
b07d03e0 | 2341 | |
a3a4a74e UD |
2342 | When one or more requests are asynchronously processed it might be |
2343 | useful in some situations to cancel a selected operation, e.g., if it | |
2344 | becomes obvious that the written data is no longer accurate and would |
2345 | have to be overwritten soon. As an example assume an application, which | |
2346 | writes data in files in a situation where new incoming data would have | |
2347 | to be written in a file which will be updated by an enqueued request. | |
2348 | The POSIX AIO implementation provides such a function but this function | |
04b9968b | 2349 | is not capable of forcing the cancellation of the request. It is up to the |
a3a4a74e UD |
2350 | implementation to decide whether it is possible to cancel the operation |
2351 | or not. Therefore using this function is merely a hint. | |
2352 | ||
2353 | @comment aio.h | |
2354 | @comment POSIX.1b | |
2355 | @deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp}) | |
2356 | The @code{aio_cancel} function can be used to cancel one or more | |
2357 | outstanding requests. If the @var{aiocbp} parameter is @code{NULL} the | |
2358 | function tries to cancel all outstanding requests which would process | |
2359 | the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member | |
2360 | is @var{fildes}). If @var{aiocbp} is not @code{NULL}, the function tries | |
04b9968b | 2361 | to cancel only the request pointed to by @var{aiocbp}. |
a3a4a74e | 2362 | |
04b9968b | 2363 | For requests which were successfully cancelled the normal notification |
a3a4a74e UD |
2364 | about the termination of the request should take place. I.e., depending |
2365 | on the @code{struct sigevent} object which controls this, nothing | |
2366 | happens, a signal is sent or a thread is started. If the request cannot | |
04b9968b | 2367 | be cancelled it terminates the usual way after performing the operation. |
a3a4a74e | 2368 | |
04b9968b | 2369 | After a request is successfully cancelled a call to @code{aio_error} with |
a3a4a74e UD |
2370 | a reference to this request as the parameter will return |
2371 | @code{ECANCELED} and a call to @code{aio_return} will return @math{-1}. | |
04b9968b | 2372 | If the request wasn't cancelled and is still running the error status is |
a3a4a74e UD |
2373 | still @code{EINPROGRESS}. |
2374 | ||
2375 | The return value of the function is @code{AIO_CANCELED} if there were | |
04b9968b UD |
2376 | requests which haven't terminated and which successfully were cancelled. |
2377 | If there is one or more request left which couldn't be cancelled the | |
a3a4a74e UD |
2378 | return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error} |
2379 | must be used to find out which of the perhaps multiple requests (if | |
04b9968b | 2380 | @var{aiocbp} is @code{NULL}) wasn't successfully cancelled. If all |
a3a4a74e UD |
2381 | requests already terminated at the time @code{aio_cancel} is called the |
2382 | return value is @code{AIO_ALLDONE}. | |
2383 | ||
2384 | If an error occurred during the execution of @code{aio_cancel} the | |
2385 | function returns @math{-1} and sets @code{errno} to one of the following | |
2386 | values: | |
2387 | ||
2388 | @table @code | |
2389 | @item EBADF | |
2390 | The file descriptor @var{fildes} is not valid. | |
2391 | @item ENOSYS | |
2392 | @code{aio_cancel} is not implemented. | |
2393 | @end table | |
2394 | ||
2395 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2396 | function is in fact @code{aio_cancel64} since the LFS interface | |
2397 | transparently replaces the normal implementation. | |
2398 | @end deftypefun | |
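
Because cancellation is only a hint, code that calls @code{aio_cancel}
still has to cope with a request that ran to completion.  The following
sketch, built around a hypothetical helper @code{start_then_cancel},
shows one way to handle all three return values and to collect the
final status afterwards.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: start reading LEN bytes from PATH, immediately
   try to cancel the request, and return its final error status:
   ECANCELED if the cancellation succeeded, 0 if the read completed
   anyway, another error code if the read failed, or -1 if the request
   could not be started.  */
int
start_then_cancel (const char *path, void *buf, size_t len)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct aiocb cb;
  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;
  if (aio_read (&cb) != 0)
    {
      close (fd);
      return -1;
    }

  switch (aio_cancel (fd, &cb))
    {
    case AIO_CANCELED:          /* The request was cancelled.  */
    case AIO_ALLDONE:           /* It had already terminated.  */
      break;
    default:                    /* AIO_NOTCANCELED or an error.  */
      break;                    /* The request keeps running.  */
    }

  /* In every case, wait until the request is no longer in progress.  */
  const struct aiocb *const list[1] = { &cb };
  int status;
  while ((status = aio_error (&cb)) == EINPROGRESS)
    aio_suspend (list, 1, NULL);

  aio_return (&cb);             /* -1 for a cancelled request.  */
  close (fd);
  return status;
}
```

Which of the outcomes occurs depends on timing and on the
implementation, so callers should not rely on any particular one.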
2399 | ||
2400 | @comment aio.h | |
2401 | @comment Unix98 | |
2402 | @deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp}) | |
2403 | This function is similar to @code{aio_cancel} with the only difference | |
2404 | that the argument is a reference to a variable of type @code{struct | |
2405 | aiocb64}. | |
2406 | ||
2407 | When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this | |
2408 | function is available under the name @code{aio_cancel} and so | |
04b9968b | 2409 | transparently replaces the interface for small files on 32 bit |
a3a4a74e UD |
2410 | machines. |
2411 | @end deftypefun | |
2412 | ||
2413 | @node Configuration of AIO | |
2414 | @subsection How to optimize the AIO implementation | |
2415 | ||
2416 | The POSIX standard does not specify how the AIO functions are | |
2417 | implemented. They could be system calls but it is also possible to | |
2418 | emulate them at userlevel. | |
2419 | ||
fed8f7f7 | 2420 | At least the available implementation at the point of this writing is a |
a3a4a74e UD |
2421 | userlevel implementation which uses threads for handling the enqueued |
2422 | requests. This implementation requires making some decisions about | |
2423 | limitations. Since hard limitations are something which is better | |
2424 | avoided, the GNU C library implementation provides a means to tune the AIO | |
2425 | implementation individually for each use. | |
2426 | ||
2427 | @comment aio.h | |
2428 | @comment GNU | |
2429 | @deftp {Data Type} {struct aioinit} | |
2430 | This data type is used to pass the configuration or tunable parameters | |
2431 | to the implementation. The program has to initialize the members of | |
2432 | this struct and pass it to the implementation using the @code{aio_init} | |
2433 | function. | |
2434 | ||
2435 | @table @code | |
2436 | @item int aio_threads | |
2437 | This member specifies the maximal number of threads which may be used | |
2438 | at any one time. | |
2439 | @item int aio_num | |
c756c71c | 2440 | This number provides an estimate on the maximal number of simultaneously |
a3a4a74e UD |
2441 | enqueued requests. |
2442 | @item int aio_locks | |
2443 | @c What? | |
2444 | @item int aio_usedba | |
2445 | @c What? | |
2446 | @item int aio_debug | |
2447 | @c What? | |
2448 | @item int aio_numusers | |
2449 | @c What? | |
2450 | @item int aio_reserved[2] | |
2451 | @c What? | |
2452 | @end table | |
2453 | @end deftp | |
2454 | ||
2455 | @comment aio.h | |
2456 | @comment GNU | |
2457 | @deftypefun void aio_init (const struct aioinit *@var{init}) | |
2458 | This function must be called before any other AIO function. Calling it | |
2459 | is completely voluntary, as it is only meant to help the AIO | |
2460 | implementation perform better. | |
2461 | ||
2462 | Before calling the @code{aio_init} function the members of a variable of | |
2463 | type @code{struct aioinit} must be initialized. Then a reference to | |
2464 | this variable is passed as the parameter to @code{aio_init} which itself | |
2465 | may or may not pay attention to the hints. | |
2466 | ||
c756c71c UD |
2467 | The function has no return value and no error cases are defined. It is |
2468 | an extension which follows a proposal from the SGI implementation in | |
2469 | @w{Irix 6}. It is not covered by POSIX.1b or Unix98. | |
a3a4a74e | 2470 | @end deftypefun |
b07d03e0 | 2471 | |
28f540f4 RM |
2472 | @node Control Operations |
2473 | @section Control Operations on Files | |
2474 | ||
2475 | @cindex control operations on files | |
2476 | @cindex @code{fcntl} function | |
2477 | This section describes how you can perform various other operations on | |
2478 | file descriptors, such as inquiring about or setting flags describing | |
2479 | the status of the file descriptor, manipulating record locks, and the | |
2480 | like. All of these operations are performed by the function @code{fcntl}. | |
2481 | ||
2482 | The second argument to the @code{fcntl} function is a command that | |
2483 | specifies which operation to perform. The function and macros that name | |
2484 | various flags that are used with it are declared in the header file | |
2485 | @file{fcntl.h}. Many of these flags are also used by the @code{open} | |
2486 | function; see @ref{Opening and Closing Files}. | |
2487 | @pindex fcntl.h | |
2488 | ||
2489 | @comment fcntl.h | |
2490 | @comment POSIX.1 | |
2491 | @deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{}) | |
2492 | The @code{fcntl} function performs the operation specified by | |
2493 | @var{command} on the file descriptor @var{filedes}. Some commands | |
2494 | require additional arguments to be supplied. These additional arguments | |
2495 | and the return value and error conditions are given in the detailed | |
2496 | descriptions of the individual commands. | |
2497 | ||
2498 | Briefly, here is a list of what the various commands are. | |
2499 | ||
2500 | @table @code | |
2501 | @item F_DUPFD | |
2502 | Duplicate the file descriptor (return another file descriptor pointing | |
2503 | to the same open file). @xref{Duplicating Descriptors}. | |
2504 | ||
2505 | @item F_GETFD | |
2506 | Get flags associated with the file descriptor. @xref{Descriptor Flags}. | |
2507 | ||
2508 | @item F_SETFD | |
2509 | Set flags associated with the file descriptor. @xref{Descriptor Flags}. | |
2510 | ||
2511 | @item F_GETFL | |
2512 | Get flags associated with the open file. @xref{File Status Flags}. | |
2513 | ||
2514 | @item F_SETFL | |
2515 | Set flags associated with the open file. @xref{File Status Flags}. | |
2516 | ||
2517 | @item F_GETLK | |
2518 | Get information about a file lock. @xref{File Locks}. | |
2519 | ||
2520 | @item F_SETLK | |
2521 | Set or clear a file lock. @xref{File Locks}. | |
2522 | ||
2523 | @item F_SETLKW | |
2524 | Like @code{F_SETLK}, but wait for completion. @xref{File Locks}. | |
2525 | ||
2526 | @item F_GETOWN | |
2527 | Get process or process group ID to receive @code{SIGIO} signals. | |
2528 | @xref{Interrupt Input}. | |
2529 | ||
2530 | @item F_SETOWN | |
2531 | Set process or process group ID to receive @code{SIGIO} signals. | |
2532 | @xref{Interrupt Input}. | |
2533 | @end table | |
dfd2257a | 2534 | |
04b9968b | 2535 | This function is a cancellation point in multi-threaded programs. This |
dfd2257a UD |
2536 | is a problem if the thread allocates some resources (like memory, file |
2537 | descriptors, semaphores or whatever) at the time @code{fcntl} is | |
04b9968b | 2538 | called. If the thread gets cancelled these resources stay allocated |
dfd2257a | 2539 | until the program ends. To avoid this, calls to @code{fcntl} should be |
04b9968b | 2540 | protected using cancellation handlers. |
dfd2257a | 2541 | @c ref pthread_cleanup_push / pthread_cleanup_pop |
28f540f4 RM |
2542 | @end deftypefun |
2543 | ||
2544 | ||
2545 | @node Duplicating Descriptors | |
2546 | @section Duplicating Descriptors | |
2547 | ||
2548 | @cindex duplicating file descriptors | |
2549 | @cindex redirecting input and output | |
2550 | ||
2551 | You can @dfn{duplicate} a file descriptor, or allocate another file | |
2552 | descriptor that refers to the same open file as the original. Duplicate | |
2553 | descriptors share one file position and one set of file status flags | |
2554 | (@pxref{File Status Flags}), but each has its own set of file descriptor | |
2555 | flags (@pxref{Descriptor Flags}). | |
2556 | ||
2557 | The major use of duplicating a file descriptor is to implement | |
2558 | @dfn{redirection} of input or output: that is, to change the | |
2559 | file or pipe that a particular file descriptor corresponds to. | |
2560 | ||
2561 | You can perform this operation using the @code{fcntl} function with the | |
2562 | @code{F_DUPFD} command, but there are also convenient functions | |
2563 | @code{dup} and @code{dup2} for duplicating descriptors. | |
2564 | ||
2565 | @pindex unistd.h | |
2566 | @pindex fcntl.h | |
2567 | The @code{fcntl} function and flags are declared in @file{fcntl.h}, | |
2568 | while prototypes for @code{dup} and @code{dup2} are in the header file | |
2569 | @file{unistd.h}. | |
2570 | ||
2571 | @comment unistd.h | |
2572 | @comment POSIX.1 | |
2573 | @deftypefun int dup (int @var{old}) | |
2574 | This function copies descriptor @var{old} to the first available | |
2575 | descriptor number (the first number not currently open). It is | |
2576 | equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}. | |
2577 | @end deftypefun | |
2578 | ||
2579 | @comment unistd.h | |
2580 | @comment POSIX.1 | |
2581 | @deftypefun int dup2 (int @var{old}, int @var{new}) | |
2582 | This function copies the descriptor @var{old} to descriptor number | |
2583 | @var{new}. | |
2584 | ||
2585 | If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it | |
2586 | does not close @var{new}. Otherwise, the new duplicate of @var{old} | |
2587 | replaces any previous meaning of descriptor @var{new}, as if @var{new} | |
2588 | were closed first. | |
2589 | ||
2590 | If @var{old} and @var{new} are different numbers, and @var{old} is a | |
2591 | valid descriptor number, then @code{dup2} is equivalent to: | |
2592 | ||
2593 | @smallexample | |
2594 | close (@var{new}); | |
2595 | fcntl (@var{old}, F_DUPFD, @var{new}) | |
2596 | @end smallexample | |
2597 | ||
2598 | However, @code{dup2} does this atomically; there is no instant in the | |
2599 | middle of calling @code{dup2} at which @var{new} is closed and not yet a | |
2600 | duplicate of @var{old}. | |
2601 | @end deftypefun | |
2602 | ||
2603 | @comment fcntl.h | |
2604 | @comment POSIX.1 | |
2605 | @deftypevr Macro int F_DUPFD | |
2606 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
2607 | copy the file descriptor given as the first argument. | |
2608 | ||
2609 | The form of the call in this case is: | |
2610 | ||
2611 | @smallexample | |
2612 | fcntl (@var{old}, F_DUPFD, @var{next-filedes}) | |
2613 | @end smallexample | |
2614 | ||
2615 | The @var{next-filedes} argument is of type @code{int} and specifies that | |
2616 | the file descriptor returned should be the next available one greater | |
2617 | than or equal to this value. | |
2618 | ||
2619 | The return value from @code{fcntl} with this command is normally the value | |
07435eb4 | 2620 | of the new file descriptor. A return value of @math{-1} indicates an |
28f540f4 RM |
2621 | error. The following @code{errno} error conditions are defined for |
2622 | this command: | |
2623 | ||
2624 | @table @code | |
2625 | @item EBADF | |
2626 | The @var{old} argument is invalid. | |
2627 | ||
2628 | @item EINVAL | |
2629 | The @var{next-filedes} argument is invalid. | |
2630 | ||
2631 | @item EMFILE | |
2632 | There are no more file descriptors available---your program is already | |
2633 | using the maximum. In BSD and GNU, the maximum is controlled by a | |
2634 | resource limit that can be changed; @pxref{Limits on Resources}, for | |
2635 | more information about the @code{RLIMIT_NOFILE} limit. | |
2636 | @end table | |
2637 | ||
2638 | @code{ENFILE} is not a possible error code for @code{dup2} because | |
2639 | @code{dup2} does not create a new opening of a file; duplicate | |
2640 | descriptors do not count toward the limit which @code{ENFILE} | |
2641 | indicates. @code{EMFILE} is possible because it refers to the limit on | |
2642 | distinct descriptor numbers in use in one process. | |
2643 | @end deftypevr | |
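
The call form described above can be sketched as follows. This is only a minimal illustration: the helper name @code{dup_at_or_above} is hypothetical and not part of the library, and the caller is assumed to pick a minimum that is below the process's descriptor limit.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: duplicate OLD onto the lowest free descriptor
   number that is greater than or equal to MINIMUM.
   Returns the new descriptor, or -1 with errno set.  */
int
dup_at_or_above (int old, int minimum)
{
  return fcntl (old, F_DUPFD, minimum);
}
```

Passing a minimum of zero makes the helper behave like @code{dup}. A larger minimum is a common way to keep the low descriptor numbers free for the standard streams.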
2644 | ||
2645 | Here is an example showing how to use @code{dup2} to do redirection. | |
2646 | Typically, redirection of the standard streams (like @code{stdin}) is | |
2647 | done by a shell or shell-like program before calling one of the | |
2648 | @code{exec} functions (@pxref{Executing a File}) to execute a new | |
2649 | program in a child process. When the new program is executed, it | |
2650 | creates and initializes the standard streams to point to the | |
2651 | corresponding file descriptors, before its @code{main} function is | |
2652 | invoked. | |
2653 | ||
2654 | So, to redirect standard input to a file, the shell could do something | |
2655 | like: | |
2656 | ||
2657 | @smallexample | |
2658 | pid = fork (); | |
2659 | if (pid == 0) | |
2660 | @{ | |
2661 | char *filename; | |
2662 | char *program; | |
2663 | int file; | |
2664 | @dots{} | |
2665 | file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY)); | |
2666 | dup2 (file, STDIN_FILENO); | |
2667 | TEMP_FAILURE_RETRY (close (file)); | |
2668 | execv (program, NULL); | |
2669 | @} | |
2670 | @end smallexample | |
2671 | ||
2672 | There is also a more detailed example showing how to implement redirection | |
2673 | in the context of a pipeline of processes in @ref{Launching Jobs}. | |
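
A program sometimes wants to redirect one of its own descriptors temporarily rather than in a child. The following sketch, under the assumption that no other thread uses the descriptor at the same time, keeps a duplicate of the original open file with @code{dup} so that it can be put back with @code{dup2} later. The names @code{redirect_descriptor} and @code{restore_descriptor} are hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: make TARGET refer to the same open file as SOURCE.
   A duplicate of the previous TARGET is stored in *SAVED so that it can
   be restored later.  Returns 0 on success, -1 on error.  */
int
redirect_descriptor (int target, int source, int *saved)
{
  *saved = dup (target);
  if (*saved < 0)
    return -1;
  if (dup2 (source, target) < 0)
    {
      close (*saved);
      return -1;
    }
  return 0;
}

/* Hypothetical helper: undo redirect_descriptor by putting the saved
   open file back on TARGET and releasing the temporary duplicate.
   Returns 0 on success, -1 on error.  */
int
restore_descriptor (int target, int saved)
{
  int result = dup2 (saved, target) < 0 ? -1 : 0;
  close (saved);
  return result;
}
```

If @var{target} is a descriptor that a stream also uses, flush the stream before each of the two calls; otherwise buffered output may end up in the wrong file.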
2674 | ||
2675 | ||
2676 | @node Descriptor Flags | |
2677 | @section File Descriptor Flags | |
2678 | @cindex file descriptor flags | |
2679 | ||
2680 | @dfn{File descriptor flags} are miscellaneous attributes of a file | |
2681 | descriptor. These flags are associated with particular file | |
2682 | descriptors, so that if you have created duplicate file descriptors | |
2683 | from a single opening of a file, each descriptor has its own set of flags. | |
2684 | ||
2685 | Currently there is just one file descriptor flag: @code{FD_CLOEXEC}, | |
2686 | which causes the descriptor to be closed if you use any of the | |
2687 | @code{exec@dots{}} functions (@pxref{Executing a File}). | |
2688 | ||
2689 | The symbols in this section are defined in the header file | |
2690 | @file{fcntl.h}. | |
2691 | @pindex fcntl.h | |
2692 | ||
2693 | @comment fcntl.h | |
2694 | @comment POSIX.1 | |
2695 | @deftypevr Macro int F_GETFD | |
2696 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
2697 | specify that it should return the file descriptor flags associated | |
2c6fe0bd | 2698 | with the @var{filedes} argument. |
28f540f4 RM |
2699 | |
2700 | The normal return value from @code{fcntl} with this command is a | |
2701 | nonnegative number which can be interpreted as the bitwise OR of the | |
2702 | individual flags (except that currently there is only one flag to use). | |
2703 | ||
07435eb4 | 2704 | In case of an error, @code{fcntl} returns @math{-1}. The following |
28f540f4 RM |
2705 | @code{errno} error conditions are defined for this command: |
2706 | ||
2707 | @table @code | |
2708 | @item EBADF | |
2709 | The @var{filedes} argument is invalid. | |
2710 | @end table | |
2711 | @end deftypevr | |
2712 | ||
2713 | ||
2714 | @comment fcntl.h | |
2715 | @comment POSIX.1 | |
2716 | @deftypevr Macro int F_SETFD | |
2717 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
2718 | specify that it should set the file descriptor flags associated with the | |
2719 | @var{filedes} argument. This requires a third @code{int} argument to | |
2720 | specify the new flags, so the form of the call is: | |
2721 | ||
2722 | @smallexample | |
2723 | fcntl (@var{filedes}, F_SETFD, @var{new-flags}) | |
2724 | @end smallexample | |
2725 | ||
2726 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 2727 | unspecified value other than @math{-1}, which indicates an error. |
28f540f4 RM |
2728 | The flags and error conditions are the same as for the @code{F_GETFD} |
2729 | command. | |
2730 | @end deftypevr | |
2731 | ||
2732 | The following macro is defined for use as a file descriptor flag with | |
2733 | the @code{fcntl} function. The value is an integer constant usable | |
2734 | as a bit mask value. | |
2735 | ||
2736 | @comment fcntl.h | |
2737 | @comment POSIX.1 | |
2738 | @deftypevr Macro int FD_CLOEXEC | |
2739 | @cindex close-on-exec (file descriptor flag) | |
2740 | This flag specifies that the file descriptor should be closed when | |
2741 | an @code{exec} function is invoked; see @ref{Executing a File}. When | |
2742 | a file descriptor is allocated (as with @code{open} or @code{dup}), | |
2743 | this bit is initially cleared on the new file descriptor, meaning that | |
2744 | descriptor will survive into the new program after @code{exec}. | |
2745 | @end deftypevr | |
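
Because the flag belongs to the descriptor and not to the open file, a duplicate made with @code{dup} starts with the bit clear even when the original has it set. A minimal sketch of a query function that can be used to observe this follows; the name @code{has_cloexec} is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if FD_CLOEXEC is set on DESC,
   0 if it is clear, or -1 on error with errno set.  */
int
has_cloexec (int desc)
{
  int flags = fcntl (desc, F_GETFD);
  if (flags < 0)
    return -1;
  return (flags & FD_CLOEXEC) != 0;
}
```

For example, after setting the flag on a descriptor @code{d}, @code{has_cloexec (d)} reports 1 while @code{has_cloexec (dup (d))} reports 0.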
2746 | ||
2747 | If you want to modify the file descriptor flags, you should get the | |
2748 | current flags with @code{F_GETFD} and modify the value. Don't assume | |
2749 | that the flags listed here are the only ones that are implemented; your | |
2750 | program may be run years from now and more flags may exist then. For | |
2751 | example, here is a function to set or clear the flag @code{FD_CLOEXEC} | |
2752 | without altering any other flags: | |
2753 | ||
2754 | @smallexample | |
2755 | /* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,} | |
2756 | @r{or clear the flag if @var{value} is 0.} | |
2c6fe0bd | 2757 | @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
28f540f4 RM |
2758 | |
2759 | int | |
2760 | set_cloexec_flag (int desc, int value) | |
2761 | @{ | |
2762 | int oldflags = fcntl (desc, F_GETFD, 0); | |
2763 | /* @r{If reading the flags failed, return error indication now.} */ | |
2764 | if (oldflags < 0) | |
2765 | return oldflags; | |
2766 | /* @r{Set just the flag we want to set.} */ | |
2767 | if (value != 0) | |
2768 | oldflags |= FD_CLOEXEC; | |
2769 | else | |
2770 | oldflags &= ~FD_CLOEXEC; | |
2771 | /* @r{Store modified flag word in the descriptor.} */ | |
2772 | return fcntl (desc, F_SETFD, oldflags); | |
2773 | @} | |
2774 | @end smallexample | |
2775 | ||
2776 | @node File Status Flags | |
2777 | @section File Status Flags | |
2778 | @cindex file status flags | |
2779 | ||
2780 | @dfn{File status flags} are used to specify attributes of the opening of a | |
2781 | file. Unlike the file descriptor flags discussed in @ref{Descriptor | |
2782 | Flags}, the file status flags are shared by duplicated file descriptors | |
2783 | resulting from a single opening of the file. The file status flags are | |
2784 | specified with the @var{flags} argument to @code{open}; | |
2785 | @pxref{Opening and Closing Files}. | |
2786 | ||
2787 | File status flags fall into three categories, which are described in the | |
2788 | following sections. | |
2789 | ||
2790 | @itemize @bullet | |
2791 | @item | |
2792 | @ref{Access Modes}, specify what type of access is allowed to the | |
2793 | file: reading, writing, or both. They are set by @code{open} and are | |
2794 | returned by @code{fcntl}, but cannot be changed. | |
2795 | ||
2796 | @item | |
2797 | @ref{Open-time Flags}, control details of what @code{open} will do. | |
2798 | These flags are not preserved after the @code{open} call. | |
2799 | ||
2800 | @item | |
2801 | @ref{Operating Modes}, affect how operations such as @code{read} and | |
2802 | @code{write} are done. They are set by @code{open}, and can be fetched or | |
2803 | changed with @code{fcntl}. | |
2804 | @end itemize | |
2805 | ||
2806 | The symbols in this section are defined in the header file | |
2807 | @file{fcntl.h}. | |
2808 | @pindex fcntl.h | |
2809 | ||
2810 | @menu | |
2811 | * Access Modes:: Whether the descriptor can read or write. | |
2812 | * Open-time Flags:: Details of @code{open}. | |
2813 | * Operating Modes:: Special modes to control I/O operations. | |
2814 | * Getting File Status Flags:: Fetching and changing these flags. | |
2815 | @end menu | |
2816 | ||
2817 | @node Access Modes | |
2818 | @subsection File Access Modes | |
2819 | ||
2820 | The file access modes allow a file descriptor to be used for reading, | |
2821 | writing, or both. (In the GNU system, they can also allow none of these, | |
2822 | and allow execution of the file as a program.) The access modes are chosen | |
2823 | when the file is opened, and never change. | |
2824 | ||
2825 | @comment fcntl.h | |
2826 | @comment POSIX.1 | |
2827 | @deftypevr Macro int O_RDONLY | |
2828 | Open the file for read access. | |
2829 | @end deftypevr | |
2830 | ||
2831 | @comment fcntl.h | |
2832 | @comment POSIX.1 | |
2833 | @deftypevr Macro int O_WRONLY | |
2834 | Open the file for write access. | |
2835 | @end deftypevr | |
2836 | ||
2837 | @comment fcntl.h | |
2838 | @comment POSIX.1 | |
2839 | @deftypevr Macro int O_RDWR | |
2840 | Open the file for both reading and writing. | |
2841 | @end deftypevr | |
2842 | ||
2843 | In the GNU system (and not in other systems), @code{O_RDONLY} and | |
2844 | @code{O_WRONLY} are independent bits that can be bitwise-ORed together, | |
2845 | and it is valid for either bit to be set or clear. This means that | |
2846 | @code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access | |
2847 | mode of zero is permissible; it allows no operations that do input or | |
2848 | output to the file, but does allow other operations such as | |
2849 | @code{fchmod}. On the GNU system, since ``read-only'' or ``write-only'' | |
2850 | is a misnomer, @file{fcntl.h} defines additional names for the file | |
2851 | access modes. These names are preferred when writing GNU-specific code. | |
2852 | But most programs will want to be portable to other POSIX.1 systems and | |
2853 | should use the POSIX.1 names above instead. | |
2854 | ||
2855 | @comment fcntl.h | |
2856 | @comment GNU | |
2857 | @deftypevr Macro int O_READ | |
2858 | Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU. | |
2859 | @end deftypevr | |
2860 | ||
2861 | @comment fcntl.h | |
2862 | @comment GNU | |
2863 | @deftypevr Macro int O_WRITE | |
2864 | Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU. | |
2865 | @end deftypevr | |
2866 | ||
2867 | @comment fcntl.h | |
2868 | @comment GNU | |
2869 | @deftypevr Macro int O_EXEC | |
2870 | Open the file for executing. Only defined on GNU. | |
2871 | @end deftypevr | |
2872 | ||
2873 | To determine the file access mode with @code{fcntl}, you must extract | |
2874 | the access mode bits from the retrieved file status flags. In the GNU | |
2875 | system, you can just test the @code{O_READ} and @code{O_WRITE} bits in | |
2876 | the flags word. But in other POSIX.1 systems, reading and writing | |
2877 | access modes are not stored as distinct bit flags. The portable way to | |
2878 | extract the file access mode bits is with @code{O_ACCMODE}. | |
2879 | ||
2880 | @comment fcntl.h | |
2881 | @comment POSIX.1 | |
2882 | @deftypevr Macro int O_ACCMODE | |
2883 | This macro stands for a mask that can be bitwise-ANDed with the file | |
2884 | status flag value to produce a value representing the file access mode. | |
2885 | The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}. | |
2886 | (In the GNU system it could also be zero, and it never includes the | |
2887 | @code{O_EXEC} bit.) | |
2888 | @end deftypevr | |
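
The portable test described above might look like the following sketch. The helper names @code{access_mode} and @code{is_writable} are hypothetical; the code relies only on @code{fcntl} with @code{F_GETFL} and on the @code{O_ACCMODE} mask.

```c
#include <fcntl.h>

/* Hypothetical helper: return the access mode of DESC
   (O_RDONLY, O_WRONLY or O_RDWR on POSIX.1 systems),
   or -1 on error with errno set.  */
int
access_mode (int desc)
{
  int flags = fcntl (desc, F_GETFL);
  if (flags < 0)
    return -1;
  return flags & O_ACCMODE;
}

/* Hypothetical helper: nonzero if DESC was opened with write access.  */
int
is_writable (int desc)
{
  int mode = access_mode (desc);
  return mode == O_WRONLY || mode == O_RDWR;
}
```

Note that the result is compared for equality with the mode macros. Testing @code{flags & O_WRONLY} directly is not portable, because on most systems the access modes are small integers rather than independent bits.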
2889 | ||
2890 | @node Open-time Flags | |
2891 | @subsection Open-time Flags | |
2892 | ||
2893 | The open-time flags specify options affecting how @code{open} will behave. | |
2894 | These options are not preserved once the file is open. The exception to | |
2895 | this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it | |
2896 | @emph{is} saved. @xref{Opening and Closing Files}, for how to call | |
2897 | @code{open}. | |
2898 | ||
2899 | There are two sorts of options specified by open-time flags. | |
2900 | ||
2901 | @itemize @bullet | |
2902 | @item | |
2903 | @dfn{File name translation flags} affect how @code{open} looks up the | |
2904 | file name to locate the file, and whether the file can be created. | |
2905 | @cindex file name translation flags | |
2906 | @cindex flags, file name translation | |
2907 | ||
2908 | @item | |
2909 | @dfn{Open-time action flags} specify extra operations that @code{open} will | |
2910 | perform on the file once it is open. | |
2911 | @cindex open-time action flags | |
2912 | @cindex flags, open-time action | |
2913 | @end itemize | |
2914 | ||
2915 | Here are the file name translation flags. | |
2916 | ||
2917 | @comment fcntl.h | |
2918 | @comment POSIX.1 | |
2919 | @deftypevr Macro int O_CREAT | |
2920 | If set, the file will be created if it doesn't already exist. | |
2921 | @c !!! mode arg, umask | |
2922 | @cindex create on open (file status flag) | |
2923 | @end deftypevr | |
2924 | ||
2925 | @comment fcntl.h | |
2926 | @comment POSIX.1 | |
2927 | @deftypevr Macro int O_EXCL | |
2928 | If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails | |
2929 | if the specified file already exists. This is guaranteed to never | |
2930 | clobber an existing file. | |
2931 | @end deftypevr | |
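
Combining the two flags gives an atomic ``create only if absent'' operation, which is what makes the guarantee possible: there is no window between checking for the file and creating it. A minimal sketch, in which the helper name @code{create_exclusively} and the permission bits @code{0600} are illustrative assumptions:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create NAME only if it does not already exist.
   Returns a descriptor open for writing, or -1 with errno set
   (EEXIST if the file was already there).  */
int
create_exclusively (const char *name)
{
  return open (name, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

A second call with the same name fails until the file has been removed, so the function can serve as a simple lock-file primitive on local file systems.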
2932 | ||
2933 | @comment fcntl.h | |
2934 | @comment POSIX.1 | |
2935 | @deftypevr Macro int O_NONBLOCK | |
2936 | @cindex non-blocking open | |
2937 | This prevents @code{open} from blocking for a ``long time'' to open the | |
2938 | file. This is only meaningful for some kinds of files, usually devices | |
2939 | such as serial ports; when it is not meaningful, it is harmless and | |
2940 | ignored. Often opening a port to a modem blocks until the modem reports | |
2941 | carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will | |
2942 | return immediately without a carrier. | |
2943 | ||
2944 | Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating | |
2945 | mode and a file name translation flag. This means that specifying | |
2946 | @code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode; | |
2947 | @pxref{Operating Modes}. To open the file without blocking but do normal | |
2948 | I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and | |
2949 | then call @code{fcntl} to turn the bit off. | |
2950 | @end deftypevr | |
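
The two-step procedure just described can be sketched like this. The helper name @code{open_then_block} is hypothetical, and the sketch assumes the caller passes one of the access mode macros as the second argument.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open NAME without blocking in open itself, then
   switch the descriptor back to ordinary blocking I/O.
   Returns the descriptor, or -1 with errno set.  */
int
open_then_block (const char *name, int access_mode)
{
  int desc = open (name, access_mode | O_NONBLOCK);
  if (desc < 0)
    return -1;

  /* Fetch the current status flags and clear only O_NONBLOCK.  */
  int flags = fcntl (desc, F_GETFL);
  if (flags < 0 || fcntl (desc, F_SETFL, flags & ~O_NONBLOCK) < 0)
    {
      int saved_errno = errno;  /* close may overwrite errno */
      close (desc);
      errno = saved_errno;
      return -1;
    }
  return desc;
}
```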
2951 | ||
2952 | @comment fcntl.h | |
2953 | @comment POSIX.1 | |
2954 | @deftypevr Macro int O_NOCTTY | |
2955 | If the named file is a terminal device, don't make it the controlling | |
2956 | terminal for the process. @xref{Job Control}, for information about | |
2957 | what it means to be the controlling terminal. | |
2958 | ||
2959 | In the GNU system and 4.4 BSD, opening a file never makes it the | |
2960 | controlling terminal and @code{O_NOCTTY} is zero. However, other | |
2961 | systems may use a nonzero value for @code{O_NOCTTY} and set the | |
2962 | controlling terminal when you open a file that is a terminal device; so | |
2963 | to be portable, use @code{O_NOCTTY} when it is important to avoid this. | |
2964 | @cindex controlling terminal, setting | |
2965 | @end deftypevr | |
2966 | ||
2967 | The following three file name translation flags exist only in the GNU system. | |
2968 | ||
2969 | @comment fcntl.h | |
2970 | @comment GNU | |
2971 | @deftypevr Macro int O_IGNORE_CTTY | |
2972 | Do not recognize the named file as the controlling terminal, even if it | |
2973 | refers to the process's existing controlling terminal device. Operations | |
2974 | on the new file descriptor will never induce job control signals. | |
2975 | @xref{Job Control}. | |
2976 | @end deftypevr | |
2977 | ||
2978 | @comment fcntl.h | |
2979 | @comment GNU | |
2980 | @deftypevr Macro int O_NOLINK | |
2981 | If the named file is a symbolic link, open the link itself instead of | |
2982 | the file it refers to. (@code{fstat} on the new file descriptor will | |
2983 | return the information returned by @code{lstat} on the link's name.) | |
2984 | @cindex symbolic link, opening | |
2985 | @end deftypevr | |
2986 | ||
2987 | @comment fcntl.h | |
2988 | @comment GNU | |
2989 | @deftypevr Macro int O_NOTRANS | |
2990 | If the named file is specially translated, do not invoke the translator. | |
2991 | Open the bare file the translator itself sees. | |
2992 | @end deftypevr | |
2993 | ||
2994 | ||
2995 | The open-time action flags tell @code{open} to do additional operations | |
2996 | which are not really related to opening the file. The reason to do them | |
2997 | as part of @code{open} instead of in separate calls is that @code{open} | |
2998 | can do them @i{atomically}. | |
2999 | ||
3000 | @comment fcntl.h | |
3001 | @comment POSIX.1 | |
3002 | @deftypevr Macro int O_TRUNC | |
3003 | Truncate the file to zero length. This option is only useful for | |
3004 | regular files, not special files such as directories or FIFOs. POSIX.1 | |
3005 | requires that you open the file for writing to use @code{O_TRUNC}. In | |
3006 | BSD and GNU you must have permission to write the file to truncate it, | |
3007 | but you need not open for write access. | |
3008 | ||
3009 | This is the only open-time action flag specified by POSIX.1. There is | |
3010 | no good reason for truncation to be done by @code{open}, instead of by | |
3011 | calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in | |
3012 | Unix before @code{ftruncate} was invented, and is retained for backward | |
3013 | compatibility. | |
3014 | @end deftypevr | |
3015 | ||
27e309c1 UD |
3016 | The remaining open-time action flags are BSD extensions. They exist only |
3017 | on some systems. On other systems, these macros are not defined. | |
3018 | ||
28f540f4 RM |
3019 | @comment fcntl.h |
3020 | @comment BSD | |
3021 | @deftypevr Macro int O_SHLOCK | |
3022 | Acquire a shared lock on the file, as with @code{flock}. | |
3023 | @xref{File Locks}. | |
3024 | ||
3025 | If @code{O_CREAT} is specified, the locking is done atomically when | |
3026 | creating the file. You are guaranteed that no other process will get | |
3027 | the lock on the new file first. | |
3028 | @end deftypevr | |
3029 | ||
3030 | @comment fcntl.h | |
3031 | @comment BSD | |
3032 | @deftypevr Macro int O_EXLOCK | |
3033 | Acquire an exclusive lock on the file, as with @code{flock}. | |
3034 | @xref{File Locks}. This is atomic like @code{O_SHLOCK}. | |
3035 | @end deftypevr | |
3036 | ||
3037 | @node Operating Modes | |
3038 | @subsection I/O Operating Modes | |
3039 | ||
3040 | The operating modes affect how input and output operations using a file | |
3041 | descriptor work. These flags are set by @code{open} and can be fetched | |
3042 | and changed with @code{fcntl}. | |
3043 | ||
3044 | @comment fcntl.h | |
3045 | @comment POSIX.1 | |
3046 | @deftypevr Macro int O_APPEND | |
3047 | The bit that enables append mode for the file. If set, then all | |
3048 | @code{write} operations write the data at the end of the file, extending | |
3049 | it, regardless of the current file position. This is the only reliable | |
3050 | way to append to a file. In append mode, you are guaranteed that the | |
3051 | data you write will always go to the current end of the file, regardless | |
3052 | of other processes writing to the file. Conversely, if you simply set | |
3053 | the file position to the end of file and write, then another process can | |
3054 | extend the file after you set the file position but before you write, | |
3055 | resulting in your data appearing someplace before the real end of file. | |
3056 | @end deftypevr | |
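
As a minimal sketch of append mode, the following function adds one string to the end of a file. The name @code{append_line} and the permission bits @code{0644} are assumptions made for the example, and a short write is simply reported as an error rather than retried.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append the string LINE to the file NAME,
   creating the file if necessary.  Returns 0 on success, -1 on error.  */
int
append_line (const char *name, const char *line)
{
  /* O_APPEND makes every write go to the current end of the file.  */
  int desc = open (name, O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (desc < 0)
    return -1;

  size_t length = strlen (line);
  ssize_t written = write (desc, line, length);
  int result = (written == (ssize_t) length) ? 0 : -1;

  if (close (desc) < 0)
    result = -1;
  return result;
}
```

No call to @code{lseek} is needed: the kernel positions each write at the end of the file, which is what protects against other processes extending the file in the meantime.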
3057 | ||
3058 | @comment fcntl.h | |
3059 | @comment POSIX.1 | |
2c6fe0bd | 3060 | @deftypevr Macro int O_NONBLOCK |
28f540f4 RM |
3061 | The bit that enables nonblocking mode for the file. If this bit is set, |
3062 | @code{read} requests on the file can return immediately with a failure | |
3063 | status if there is no input immediately available, instead of blocking. | |
3064 | Likewise, @code{write} requests can also return immediately with a | |
3065 | failure status if the output can't be written immediately. | |
3066 | ||
3067 | Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O | |
3068 | operating mode and a file name translation flag; @pxref{Open-time Flags}. | |
3069 | @end deftypevr | |
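
The following sketch shows one way a program might use nonblocking mode. It assumes the usual POSIX behavior that a @code{read} which would otherwise block fails with @code{errno} set to @code{EAGAIN} (or @code{EWOULDBLOCK}). The names @code{set_nonblocking} and @code{read_if_ready}, and the use of @math{-2} to mean ``no input yet'', are choices made for this example only.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK for DESC, leaving the other
   file status flags alone.  Returns 0 on success, -1 on error.  */
int
set_nonblocking (int desc)
{
  int flags = fcntl (desc, F_GETFL);
  if (flags < 0)
    return -1;
  return fcntl (desc, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: try to read up to SIZE bytes without waiting.
   Returns the byte count (0 at end of file), -1 on a real error, or
   -2 if no input is available right now.  */
ssize_t
read_if_ready (int desc, void *buffer, size_t size)
{
  ssize_t count = read (desc, buffer, size);
  if (count < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return -2;
  return count;
}
```

Distinguishing ``nothing available yet'' from end of file and from genuine errors is the main point; a caller would normally wait with @code{select} before trying again instead of polling in a loop.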
3070 | ||
3071 | @comment fcntl.h | |
3072 | @comment BSD | |
3073 | @deftypevr Macro int O_NDELAY | |
3074 | This is an obsolete name for @code{O_NONBLOCK}, provided for | |
3075 | compatibility with BSD. It is not defined by the POSIX.1 standard. | |
3076 | @end deftypevr | |
3077 | ||
3078 | The remaining operating modes are BSD and GNU extensions. They exist only | |
3079 | on some systems. On other systems, these macros are not defined. | |
3080 | ||
3081 | @comment fcntl.h | |
3082 | @comment BSD | |
3083 | @deftypevr Macro int O_ASYNC | |
3084 | The bit that enables asynchronous input mode. If set, then @code{SIGIO} | |
3085 | signals will be generated when input is available. @xref{Interrupt Input}. | |
3086 | ||
3087 | Asynchronous input mode is a BSD feature. | |
3088 | @end deftypevr | |
3089 | ||
3090 | @comment fcntl.h | |
3091 | @comment BSD | |
3092 | @deftypevr Macro int O_FSYNC | |
3093 | The bit that enables synchronous writing for the file. If set, each | |
3094 | @code{write} call will make sure the data is reliably stored on disk before | |
3095 | returning. @c !!! xref fsync | |
3096 | ||
3097 | Synchronous writing is a BSD feature. | |
3098 | @end deftypevr | |
3099 | ||
3100 | @comment fcntl.h | |
3101 | @comment BSD | |
3102 | @deftypevr Macro int O_SYNC | |
3103 | This is another name for @code{O_FSYNC}. They have the same value. | |
3104 | @end deftypevr | |
3105 | ||
3106 | @comment fcntl.h | |
3107 | @comment GNU | |
3108 | @deftypevr Macro int O_NOATIME | |
3109 | If this bit is set, @code{read} will not update the access time of the | |
3110 | file. @xref{File Times}. This is used by programs that do backups, so | |
3111 | that backing a file up does not count as reading it. | |
3112 | Only the owner of the file or the superuser may use this bit. | |
3113 | ||
3114 | This is a GNU extension. | |
3115 | @end deftypevr | |
3116 | ||
3117 | @node Getting File Status Flags | |
3118 | @subsection Getting and Setting File Status Flags | |
3119 | ||
3120 | The @code{fcntl} function can fetch or change file status flags. | |
3121 | ||
3122 | @comment fcntl.h | |
3123 | @comment POSIX.1 | |
3124 | @deftypevr Macro int F_GETFL | |
3125 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3126 | read the file status flags for the open file with descriptor | |
3127 | @var{filedes}. | |
3128 | ||
3129 | The normal return value from @code{fcntl} with this command is a | |
3130 | nonnegative number which can be interpreted as the bitwise OR of the | |
3131 | individual flags. Since the file access modes are not single-bit values, | |
3132 | you can mask off other bits in the returned flags with @code{O_ACCMODE} | |
3133 | to compare them. | |
3134 | ||
07435eb4 | 3135 | In case of an error, @code{fcntl} returns @math{-1}. The following |
28f540f4 RM |
3136 | @code{errno} error conditions are defined for this command: |
3137 | ||
3138 | @table @code | |
3139 | @item EBADF | |
3140 | The @var{filedes} argument is invalid. | |
3141 | @end table | |
3142 | @end deftypevr | |
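The masking step described above can be sketched as follows. The function name @code{is_writable} is a hypothetical one chosen for illustration.

```c
#include <fcntl.h>

/* Hypothetical helper: return 1 if DESC is open for writing, 0 if it
   is open for reading only, or -1 if the flags cannot be read.  */
int
is_writable (int desc)
{
  int flags = fcntl (desc, F_GETFL, 0);
  if (flags == -1)
    return -1;
  /* The access mode is a multi-bit field, so mask before comparing.  */
  int mode = flags & O_ACCMODE;
  return mode == O_WRONLY || mode == O_RDWR;
}
```

Testing @code{flags & O_WRONLY} directly would not be reliable, because the access modes are not guaranteed to be independent bits.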
3143 | ||
3144 | @comment fcntl.h | |
3145 | @comment POSIX.1 | |
3146 | @deftypevr Macro int F_SETFL | |
3147 | This macro is used as the @var{command} argument to @code{fcntl}, to set | |
3148 | the file status flags for the open file corresponding to the | |
3149 | @var{filedes} argument. This command requires a third @code{int} | |
3150 | argument to specify the new flags, so the call looks like this: | |
3151 | ||
3152 | @smallexample | |
3153 | fcntl (@var{filedes}, F_SETFL, @var{new-flags}) | |
3154 | @end smallexample | |
3155 | ||
3156 | You can't change the access mode for the file in this way; that is, | |
3157 | whether the file descriptor was opened for reading or writing. | |
3158 | ||
3159 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 3160 | unspecified value other than @math{-1}, which indicates an error. The |
28f540f4 RM |
3161 | error conditions are the same as for the @code{F_GETFL} command. |
3162 | @end deftypevr | |
3163 | ||
3164 | If you want to modify the file status flags, you should get the current | |
3165 | flags with @code{F_GETFL} and modify the value. Don't assume that the | |
3166 | flags listed here are the only ones that are implemented; your program | |
3167 | may be run years from now and more flags may exist then. For example, | |
3168 | here is a function to set or clear the flag @code{O_NONBLOCK} without | |
3169 | altering any other flags: | |
3170 | ||
3171 | @smallexample | |
3172 | @group | |
3173 | /* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,} | |
3174 | @r{or clear the flag if @var{value} is 0.} | |
2c6fe0bd | 3175 | @r{Return 0 on success, or -1 on error with @code{errno} set.} */ |
28f540f4 RM |
3176 | |
3177 | int | |
3178 | set_nonblock_flag (int desc, int value) | |
3179 | @{ | |
3180 | int oldflags = fcntl (desc, F_GETFL, 0); | |
3181 | /* @r{If reading the flags failed, return error indication now.} */ | |
3182 | if (oldflags == -1) | |
3183 | return -1; | |
3184 | /* @r{Set or clear just the flag we want to change.} */ | |
3185 | if (value != 0) | |
3186 | oldflags |= O_NONBLOCK; | |
3187 | else | |
3188 | oldflags &= ~O_NONBLOCK; | |
3189 | /* @r{Store modified flag word in the descriptor.} */ | |
3190 | return fcntl (desc, F_SETFL, oldflags); | |
3191 | @} | |
3192 | @end group | |
3193 | @end smallexample | |
3194 | ||
3195 | @node File Locks | |
3196 | @section File Locks | |
3197 | ||
3198 | @cindex file locks | |
3199 | @cindex record locking | |
3200 | The remaining @code{fcntl} commands are used to support @dfn{record | |
3201 | locking}, which permits multiple cooperating programs to prevent each | |
3202 | other from simultaneously accessing parts of a file in error-prone | |
3203 | ways. | |
3204 | ||
3205 | @cindex exclusive lock | |
3206 | @cindex write lock | |
3207 | An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access | |
3208 | for writing to the specified part of the file. While a write lock is in | |
3209 | place, no other process can lock that part of the file. | |
3210 | ||
3211 | @cindex shared lock | |
3212 | @cindex read lock | |
3213 | A @dfn{shared} or @dfn{read} lock prohibits any other process from | |
3214 | requesting a write lock on the specified part of the file. However, | |
3215 | other processes can request read locks. | |
3216 | ||
3217 | The @code{read} and @code{write} functions do not actually check to see | |
3218 | whether there are any locks in place. If you want to implement a | |
3219 | locking protocol for a file shared by multiple processes, your application | |
3220 | must do explicit @code{fcntl} calls to request and clear locks at the | |
3221 | appropriate points. | |
3222 | ||
3223 | Locks are associated with processes. A process can only have one kind | |
3224 | of lock set for each byte of a given file. When any file descriptor for | |
3225 | that file is closed by the process, all of the locks that process holds | |
3226 | on that file are released, even if the locks were made using other | |
3227 | descriptors that remain open. Likewise, locks are released when a | |
3228 | process exits, and are not inherited by child processes created using | |
3229 | @code{fork} (@pxref{Creating a Process}). | |
3230 | ||
3231 | When making a lock, use a @code{struct flock} to specify what kind of | |
3232 | lock and where. This data type and the associated macros for the | |
3233 | @code{fcntl} function are declared in the header file @file{fcntl.h}. | |
3234 | @pindex fcntl.h | |
3235 | ||
3236 | @comment fcntl.h | |
3237 | @comment POSIX.1 | |
3238 | @deftp {Data Type} {struct flock} | |
3239 | This structure is used with the @code{fcntl} function to describe a file | |
3240 | lock. It has these members: | |
3241 | ||
3242 | @table @code | |
3243 | @item short int l_type | |
3244 | Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or | |
3245 | @code{F_UNLCK}. | |
3246 | ||
3247 | @item short int l_whence | |
3248 | This corresponds to the @var{whence} argument to @code{fseek} or | |
3249 | @code{lseek}, and specifies what the offset is relative to. Its value | |
3250 | can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}. | |
3251 | ||
3252 | @item off_t l_start | |
3253 | This specifies the offset of the start of the region to which the lock | |
3254 | applies, and is given in bytes relative to the point specified by | |
3255 | the @code{l_whence} member. | |
3256 | ||
3257 | @item off_t l_len | |
3258 | This specifies the length of the region to be locked. A value of | |
3259 | @code{0} is treated specially; it means the region extends to the end of | |
3260 | the file. | |
3261 | ||
3262 | @item pid_t l_pid | |
3263 | This field is the process ID (@pxref{Process Creation Concepts}) of the | |
3264 | process holding the lock. It is filled in by calling @code{fcntl} with | |
3265 | the @code{F_GETLK} command, but is ignored when making a lock. | |
3266 | @end table | |
3267 | @end deftp | |
3268 | ||
3269 | @comment fcntl.h | |
3270 | @comment POSIX.1 | |
3271 | @deftypevr Macro int F_GETLK | |
3272 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3273 | specify that it should get information about a lock. This command | |
3274 | requires a third argument of type @w{@code{struct flock *}} to be passed | |
3275 | to @code{fcntl}, so that the form of the call is: | |
3276 | ||
3277 | @smallexample | |
3278 | fcntl (@var{filedes}, F_GETLK, @var{lockp}) | |
3279 | @end smallexample | |
3280 | ||
3281 | If there is a lock already in place that would block the lock described | |
3282 | by the @var{lockp} argument, information about that lock overwrites | |
3283 | @code{*@var{lockp}}. Existing locks are not reported if they are | |
3284 | compatible with making a new lock as specified. Thus, you should | |
3285 | specify a lock type of @code{F_WRLCK} if you want to find out about both | |
3286 | read and write locks, or @code{F_RDLCK} if you want to find out about | |
3287 | write locks only. | |
3288 | ||
3289 | There might be more than one lock affecting the region specified by the | |
3290 | @var{lockp} argument, but @code{fcntl} only returns information about | |
3291 | one of them. The @code{l_whence} member of the @var{lockp} structure is | |
3292 | set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields | |
3293 | set to identify the locked region. | |
3294 | ||
3295 | If no lock applies, the only change to the @var{lockp} structure is to | |
3296 | update the @code{l_type} to a value of @code{F_UNLCK}. | |
3297 | ||
3298 | The normal return value from @code{fcntl} with this command is an | |
07435eb4 | 3299 | unspecified value other than @math{-1}, which is reserved to indicate an |
28f540f4 RM |
3300 | error. The following @code{errno} error conditions are defined for |
3301 | this command: | |
3302 | ||
3303 | @table @code | |
3304 | @item EBADF | |
3305 | The @var{filedes} argument is invalid. | |
3306 | ||
3307 | @item EINVAL | |
3308 | Either the @var{lockp} argument doesn't specify valid lock information, | |
3309 | or the file associated with @var{filedes} doesn't support locks. | |
3310 | @end table | |
3311 | @end deftypevr | |
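Here is a minimal sketch of such a query, assuming the caller wants to know about any lock that conflicts with a write lock on the whole file. The helper name @code{find_lock_holder} and its return convention are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the process ID of a process whose lock
   would block a write lock on the whole file open on DESC, 0 if no
   such lock exists, or -1 on error.  */
pid_t
find_lock_holder (int desc)
{
  struct flock lock;
  memset (&lock, 0, sizeof lock);
  lock.l_type = F_WRLCK;     /* Ask about both read and write locks.  */
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;            /* Zero means through the end of the file.  */
  if (fcntl (desc, F_GETLK, &lock) == -1)
    return -1;
  return lock.l_type == F_UNLCK ? 0 : lock.l_pid;
}
```

Locks held by the calling process itself are never reported this way, because they would not block a new lock requested by the same process. The answer can also be out of date by the time the function returns, so it is useful for diagnostics rather than for deciding whether a later lock request will succeed.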
3312 | ||
3313 | @comment fcntl.h | |
3314 | @comment POSIX.1 | |
3315 | @deftypevr Macro int F_SETLK | |
3316 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3317 | specify that it should set or clear a lock. This command requires a | |
3318 | third argument of type @w{@code{struct flock *}} to be passed to | |
3319 | @code{fcntl}, so that the form of the call is: | |
3320 | ||
3321 | @smallexample | |
3322 | fcntl (@var{filedes}, F_SETLK, @var{lockp}) | |
3323 | @end smallexample | |
3324 | ||
3325 | If the process already has a lock on any part of the region, the old lock | |
3326 | on that part is replaced with the new lock. You can remove a lock | |
3327 | by specifying a lock type of @code{F_UNLCK}. | |
3328 | ||
3329 | If the lock cannot be set, @code{fcntl} returns immediately with a value | |
07435eb4 | 3330 | of @math{-1}. This function does not block waiting for other processes |
28f540f4 | 3331 | to release locks. If @code{fcntl} succeeds, it returns a value other | |
07435eb4 | 3332 | than @math{-1}. |
28f540f4 RM |
3333 | |
3334 | The following @code{errno} error conditions are defined for this | |
3335 | function: | |
3336 | ||
3337 | @table @code | |
3338 | @item EAGAIN | |
3339 | @itemx EACCES | |
3340 | The lock cannot be set because it is blocked by an existing lock on the | |
3341 | file. Some systems use @code{EAGAIN} in this case, and other systems | |
3342 | use @code{EACCES}; your program should treat them alike, after | |
3343 | @code{F_SETLK}. (The GNU system always uses @code{EAGAIN}.) | |
3344 | ||
3345 | @item EBADF | |
3346 | Either: the @var{filedes} argument is invalid; you requested a read lock | |
3347 | but the @var{filedes} is not open for read access; or, you requested a | |
3348 | write lock but the @var{filedes} is not open for write access. | |
3349 | ||
3350 | @item EINVAL | |
3351 | Either the @var{lockp} argument doesn't specify valid lock information, | |
3352 | or the file associated with @var{filedes} doesn't support locks. | |
3353 | ||
3354 | @item ENOLCK | |
3355 | The system has run out of file lock resources; there are already too | |
3356 | many file locks in place. | |
3357 | ||
3358 | Well-designed file systems never report this error, because they have no | |
3359 | limitation on the number of locks. However, you must still take account | |
3360 | of the possibility of this error, as it could result from network access | |
3361 | to a file system on another machine. | |
3362 | @end table | |
3363 | @end deftypevr | |
3364 | ||
3365 | @comment fcntl.h | |
3366 | @comment POSIX.1 | |
3367 | @deftypevr Macro int F_SETLKW | |
3368 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3369 | specify that it should set or clear a lock. It is just like the | |
3370 | @code{F_SETLK} command, but causes the process to block (or wait) | |
3371 | until the request can be satisfied. | |
3372 | ||
3373 | This command requires a third argument of type @code{struct flock *}, as | |
3374 | for the @code{F_SETLK} command. | |
3375 | ||
3376 | The @code{fcntl} return values and errors are the same as for the | |
3377 | @code{F_SETLK} command, but these additional @code{errno} error conditions | |
3378 | are defined for this command: | |
3379 | ||
3380 | @table @code | |
3381 | @item EINTR | |
3382 | The function was interrupted by a signal while it was waiting. | |
3383 | @xref{Interrupted Primitives}. | |
3384 | ||
3385 | @item EDEADLK | |
3386 | The specified region is being locked by another process. But that | |
3387 | process is waiting to lock a region which the current process has | |
3388 | locked, so waiting for the lock would result in deadlock. The system | |
3389 | does not guarantee that it will detect all such conditions, but it lets | |
3390 | you know if it notices one. | |
3391 | @end table | |
3392 | @end deftypevr | |
3393 | ||
3394 | ||
3395 | The following macros are defined for use as values for the @code{l_type} | |
3396 | member of the @code{flock} structure. The values are integer constants. | |
3397 | ||
3398 | @table @code | |
3399 | @comment fcntl.h | |
3400 | @comment POSIX.1 | |
3401 | @vindex F_RDLCK | |
3402 | @item F_RDLCK | |
3403 | This macro is used to specify a read (or shared) lock. | |
3404 | ||
3405 | @comment fcntl.h | |
3406 | @comment POSIX.1 | |
3407 | @vindex F_WRLCK | |
3408 | @item F_WRLCK | |
3409 | This macro is used to specify a write (or exclusive) lock. | |
3410 | ||
3411 | @comment fcntl.h | |
3412 | @comment POSIX.1 | |
3413 | @vindex F_UNLCK | |
3414 | @item F_UNLCK | |
3415 | This macro is used to specify that the region is unlocked. | |
3416 | @end table | |
3417 | ||
3418 | As an example of a situation where file locking is useful, consider a | |
3419 | program that can be run simultaneously by several different users, that | |
3420 | logs status information to a common file. One example of such a program | |
3421 | might be a game that uses a file to keep track of high scores. Another | |
3422 | example might be a program that records usage or accounting information | |
3423 | for billing purposes. | |
3424 | ||
3425 | Having multiple copies of the program simultaneously writing to the | |
3426 | file could cause the contents of the file to become mixed up. But | |
3427 | you can prevent this kind of problem by setting a write lock on the | |
2c6fe0bd | 3428 | file before actually writing to the file. |
28f540f4 RM |
3429 | |
3430 | If the program also needs to read the file and wants to make sure that | |
3431 | the contents of the file are in a consistent state, then it can also use | |
3432 | a read lock. While the read lock is set, no other process can lock | |
3433 | that part of the file for writing. | |
3434 | ||
3435 | @c ??? This section could use an example program. | |
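As a sketch of the logging scenario described above, the hypothetical function @code{append_record} below takes a write lock on the whole file, appends one record, and releases the lock. It uses @code{F_SETLKW} so that it waits for other writers, and retries if a signal interrupts the wait. The function names, the whole-file lock, and the decision to treat a short write as an error are all assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Set a lock of type TYPE (F_WRLCK or F_UNLCK) on the whole file,
   waiting if necessary and retrying if a signal interrupts the wait.  */
static int
lock_whole_file (int desc, short type)
{
  struct flock lock;
  int result;
  memset (&lock, 0, sizeof lock);
  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;            /* Zero means through the end of the file.  */
  do
    result = fcntl (desc, F_SETLKW, &lock);
  while (result == -1 && errno == EINTR);
  return result;
}

/* Hypothetical helper: append the SIZE bytes at RECORD to the shared
   file open on DESC.  Return 0 on success or -1 on error.  */
int
append_record (int desc, const void *record, size_t size)
{
  int status = -1;
  if (lock_whole_file (desc, F_WRLCK) == -1)
    return -1;
  /* While the lock is held, no cooperating process can extend the
     file between the seek and the write.  */
  if (lseek (desc, 0, SEEK_END) != (off_t) -1
      && write (desc, record, size) == (ssize_t) size)
    status = 0;
  if (lock_whole_file (desc, F_UNLCK) == -1)
    status = -1;
  return status;
}
```

Every program that writes to the file has to follow the same sequence for this to work.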
3436 | ||
3437 | Remember that file locks are only a @emph{voluntary} protocol for | |
3438 | controlling access to a file. There is still potential for access to | |
3439 | the file by programs that don't use the lock protocol. | |
3440 | ||
3441 | @node Interrupt Input | |
3442 | @section Interrupt-Driven Input | |
3443 | ||
3444 | @cindex interrupt-driven input | |
3445 | If you set the @code{O_ASYNC} status flag on a file descriptor | |
3446 | (@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever | |
3447 | input or output becomes possible on that file descriptor. The process | |
3448 | or process group to receive the signal can be selected by using the | |
3449 | @code{F_SETOWN} command to the @code{fcntl} function. If the file | |
3450 | descriptor is a socket, this also selects the recipient of @code{SIGURG} | |
3451 | signals that are delivered when out-of-band data arrives on that socket; | |
3452 | see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation | |
3453 | where @code{select} would report the socket as having an ``exceptional | |
3454 | condition''. @xref{Waiting for I/O}.) | |
3455 | ||
3456 | If the file descriptor corresponds to a terminal device, then @code{SIGIO} | |
2c6fe0bd | 3457 | signals are sent to the foreground process group of the terminal. |
28f540f4 RM |
3458 | @xref{Job Control}. |
3459 | ||
3460 | @pindex fcntl.h | |
3461 | The symbols in this section are defined in the header file | |
3462 | @file{fcntl.h}. | |
3463 | ||
3464 | @comment fcntl.h | |
3465 | @comment BSD | |
3466 | @deftypevr Macro int F_GETOWN | |
3467 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3468 | specify that it should get information about the process or process | |
3469 | group to which @code{SIGIO} signals are sent. (For a terminal, this is | |
3470 | actually the foreground process group ID, which you can get using | |
3471 | @code{tcgetpgrp}; see @ref{Terminal Access Functions}.) | |
3472 | ||
3473 | The return value is interpreted as a process ID; if negative, its | |
3474 | absolute value is the process group ID. | |
3475 | ||
3476 | The following @code{errno} error condition is defined for this command: | |
3477 | ||
3478 | @table @code | |
3479 | @item EBADF | |
3480 | The @var{filedes} argument is invalid. | |
3481 | @end table | |
3482 | @end deftypevr | |
3483 | ||
3484 | @comment fcntl.h | |
3485 | @comment BSD | |
3486 | @deftypevr Macro int F_SETOWN | |
3487 | This macro is used as the @var{command} argument to @code{fcntl}, to | |
3488 | specify that it should set the process or process group to which | |
3489 | @code{SIGIO} signals are sent. This command requires a third argument | |
3490 | of type @code{pid_t} to be passed to @code{fcntl}, so that the form of | |
3491 | the call is: | |
3492 | ||
3493 | @smallexample | |
3494 | fcntl (@var{filedes}, F_SETOWN, @var{pid}) | |
3495 | @end smallexample | |
3496 | ||
3497 | The @var{pid} argument should be a process ID. You can also pass a | |
3498 | negative number whose absolute value is a process group ID. | |
3499 | ||
07435eb4 | 3500 | The return value from @code{fcntl} with this command is @math{-1} |
28f540f4 RM |
3501 | in case of error and some other value if successful. The following |
3502 | @code{errno} error conditions are defined for this command: | |
3503 | ||
3504 | @table @code | |
3505 | @item EBADF | |
3506 | The @var{filedes} argument is invalid. | |
3507 | ||
3508 | @item ESRCH | |
3509 | There is no process or process group corresponding to @var{pid}. | |
3510 | @end table | |
3511 | @end deftypevr | |
3512 | ||
3513 | @c ??? This section could use an example program. | |
07435eb4 UD |
3514 | |
3515 | @node IOCTLs | |
3516 | @section Generic I/O Control Operations | |
3517 | @cindex generic i/o control operations | |
3518 | @cindex IOCTLs | |
3519 | ||
3520 | The GNU system can handle most input/output operations on many different | |
3521 | devices and objects in terms of a few file primitives: @code{read}, | |
3522 | @code{write} and @code{lseek}. However, most devices also have a few | |
3523 | peculiar operations which do not fit into this model, such as: | |
3524 | ||
3525 | @itemize @bullet | |
3526 | ||
3527 | @item | |
3528 | Changing the character font used on a terminal. | |
3529 | ||
3530 | @item | |
3531 | Telling a magnetic tape system to rewind or fast forward. (Since tapes | |
3532 | cannot move in byte increments, @code{lseek} is inapplicable.) | |
3533 | ||
3534 | @item | |
3535 | Ejecting a disk from a drive. | |
3536 | ||
3537 | @item | |
3538 | Playing an audio track from a CD-ROM drive. | |
3539 | ||
3540 | @item | |
3541 | Maintaining routing tables for a network. | |
3542 | ||
3543 | @end itemize | |
3544 | ||
3545 | Although some of these objects such as sockets and terminals | |
3546 | @footnote{Actually, the terminal-specific functions are implemented with | |
3547 | IOCTLs on many platforms.} have special functions of their own, it would | |
3548 | not be practical to create functions for all these cases. | |
3549 | ||
3550 | Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code | |
3551 | numbers and multiplexed through the @code{ioctl} function, defined in | |
3552 | @file{sys/ioctl.h}. The code numbers themselves are defined in many | |
3553 | different headers. | |
3554 | ||
3555 | @deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{}) | |
3556 | ||
3557 | The @code{ioctl} function performs the generic I/O operation | |
3558 | @var{command} on @var{filedes}. | |
3559 | ||
3560 | A third argument is usually present, either a single number or a pointer | |
3561 | to a structure. The meaning of this argument, the returned value, and | |
3562 | any error codes depend upon the command used. Often @math{-1} is | |
3563 | returned for a failure. | |
3564 | ||
3565 | @end deftypefun | |
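As one small illustration of the calling convention, the sketch below uses the @code{FIONREAD} command, which on systems that provide it stores the number of bytes waiting to be read into an @code{int} supplied by the caller. Both the availability of @code{FIONREAD} through @file{sys/ioctl.h} and the helper name @code{bytes_available} are assumptions; the command is not defined on every system.

```c
#include <sys/ioctl.h>

/* Hypothetical helper: return the number of bytes that can be read
   from DESC without blocking, or -1 on error.  FIONREAD is assumed
   to be available; it is not defined on every system.  */
int
bytes_available (int desc)
{
  int count;
  /* The third argument is a pointer that the command fills in.  */
  if (ioctl (desc, FIONREAD, &count) == -1)
    return -1;
  return count;
}
```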
3566 | ||
3567 | On some systems, IOCTLs used by different devices share the same numbers. | |
3568 | Thus, although use of an inappropriate IOCTL @emph{usually} only produces | |
3569 | an error, you should not attempt to use device-specific IOCTLs on an | |
3570 | unknown device. | |
3571 | ||
3572 | Most IOCTLs are OS-specific and/or only used in special system utilities, | |
3573 | and are thus beyond the scope of this document. For an example of the use | |
8b7fb588 | 3574 | of an IOCTL, see @ref{Out-of-Band Data}. |