Commit | Line | Data |
---|---|---|
28f540f4 RM |
1 | @node Pattern Matching, I/O Overview, Searching and Sorting, Top |
2 | @chapter Pattern Matching | |
3 | ||
4 | The GNU C Library provides pattern matching facilities for two kinds of | |
5 | patterns: regular expressions and file-name wildcards. The library also | |
6 | provides a facility for expanding variable and command references and | |
7 | parsing text into words in the way the shell does. | |
8 | ||
9 | @menu | |
10 | * Wildcard Matching:: Matching a wildcard pattern against a single string. | |
11 | * Globbing:: Finding the files that match a wildcard pattern. | |
12 | * Regular Expressions:: Matching regular expressions against strings. | |
13 | * Word Expansion:: Expanding shell variables, nested commands, | |
14 | arithmetic, and wildcards. | |
15 | This is what the shell does with shell commands. | |
16 | @end menu | |
17 | ||
18 | @node Wildcard Matching | |
19 | @section Wildcard Matching | |
20 | ||
21 | @pindex fnmatch.h | |
22 | This section describes how to match a wildcard pattern against a | |
23 | particular string. The result is a yes or no answer: does the | |
24 | string fit the pattern or not. The symbols described here are all | |
25 | declared in @file{fnmatch.h}. | |
26 | ||
27 | @comment fnmatch.h | |
28 | @comment POSIX.2 | |
29 | @deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags}) | |
30 | This function tests whether the string @var{string} matches the pattern | |
31 | @var{pattern}. It returns @code{0} if they do match; otherwise, it | |
32 | returns the nonzero value @code{FNM_NOMATCH}. The arguments | |
33 | @var{pattern} and @var{string} are both strings. | |
34 | ||
35 | The argument @var{flags} is a combination of flag bits that alter the | |
36 | details of matching. See below for a list of the defined flags. | |
37 | ||
38 | In the GNU C Library, @code{fnmatch} cannot experience an ``error''---it | |
39 | always returns an answer for whether the match succeeds. However, other | |
40 | implementations of @code{fnmatch} might sometimes report ``errors''. | |
41 | They would do so by returning nonzero values that are not equal to | |
42 | @code{FNM_NOMATCH}. | |
43 | @end deftypefun | |
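The three possible outcomes described above can be told apart with a small wrapper. This is only a sketch: the name @code{wildcard_match} and its return convention are illustrative assumptions, not part of the library.

```c
#include <fnmatch.h>

/* Hypothetical helper (not a library function): returns 1 if STRING
   matches PATTERN, 0 if it does not, and -1 if fnmatch returns some
   other nonzero value, which a non-GNU implementation might do to
   report an error.  */
int
wildcard_match (const char *pattern, const char *string)
{
  int result = fnmatch (pattern, string, 0);

  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```

Checking for @code{FNM_NOMATCH} explicitly, rather than treating every nonzero value as ``no match'', keeps the caller portable to implementations that do report errors.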
44 | ||
45 | These are the available flags for the @var{flags} argument: | |
46 | ||
47 | @table @code | |
48 | @comment fnmatch.h | |
49 | @comment GNU | |
50 | @item FNM_FILE_NAME | |
51 | Treat the @samp{/} character specially, for matching file names. If | |
52 | this flag is set, wildcard constructs in @var{pattern} cannot match | |
53 | @samp{/} in @var{string}. Thus, the only way to match @samp{/} is with | |
54 | an explicit @samp{/} in @var{pattern}. | |
55 | ||
56 | @comment fnmatch.h | |
57 | @comment POSIX.2 | |
58 | @item FNM_PATHNAME | |
59 | This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We | |
60 | don't recommend this name because we don't use the term ``pathname'' for | |
61 | file names. | |
62 | ||
63 | @comment fnmatch.h | |
64 | @comment POSIX.2 | |
65 | @item FNM_PERIOD | |
66 | Treat the @samp{.} character specially if it appears at the beginning of | |
67 | @var{string}. If this flag is set, wildcard constructs in @var{pattern} | |
68 | cannot match @samp{.} as the first character of @var{string}. | |
69 | ||
70 | If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the | |
71 | special treatment applies to @samp{.} following @samp{/} as well as to | |
72 | @samp{.} at the beginning of @var{string}. (The shell uses the | |
6952e59e | 73 | @code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching |
28f540f4 RM |
74 | file names.) |
75 | ||
76 | @comment fnmatch.h | |
77 | @comment POSIX.2 | |
78 | @item FNM_NOESCAPE | |
79 | Don't treat the @samp{\} character specially in patterns. Normally, | |
80 | @samp{\} quotes the following character, turning off its special meaning | |
81 | (if any) so that it matches only itself. When quoting is enabled, the | |
82 | pattern @samp{\?} matches only the string @samp{?}, because the question | |
83 | mark in the pattern acts like an ordinary character. | |
84 | ||
85 | If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character. | |
86 | ||
87 | @comment fnmatch.h | |
88 | @comment GNU | |
89 | @item FNM_LEADING_DIR | |
90 | Ignore a trailing sequence of characters starting with a @samp{/} in | |
91 | @var{string}; that is to say, test whether @var{string} starts with a | |
92 | directory name that @var{pattern} matches. | |
93 | ||
94 | If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern | |
95 | would match the string @samp{foobar/frobozz}. | |
96 | ||
97 | @comment fnmatch.h | |
98 | @comment GNU | |
99 | @item FNM_CASEFOLD | |
100 | Ignore case in comparing @var{string} to @var{pattern}. | |
101 | @end table | |
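The effect of the flags is easiest to see by running the same pattern with and without them. The sketch below uses the POSIX.2 names only; the helper @code{match_with} is a hypothetical name introduced for illustration.

```c
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if STRING matches PATTERN under the
   given fnmatch FLAGS, and 0 otherwise.  */
int
match_with (const char *pattern, const char *string, int flags)
{
  return fnmatch (pattern, string, flags) == 0;
}

/* Some calls and what they illustrate:

   match_with ("*", "src/main.c", 0)
     matches: with no flags, `*' may match `/'.
   match_with ("*", "src/main.c", FNM_PATHNAME)
     does not match: `/' needs an explicit `/' in the pattern.
   match_with ("*", ".profile", FNM_PERIOD)
     does not match: a leading `.' must be matched explicitly.
   match_with ("\\?", "?", 0)
     matches: the backslash quotes the question mark.
   match_with ("\\?", "\\a", FNM_NOESCAPE)
     matches: the backslash is an ordinary character here.  */
```

A program that wants shell-like file name matching would pass @code{FNM_PATHNAME | FNM_PERIOD}, as the description of @code{FNM_PERIOD} above suggests.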
102 | ||
103 | @node Globbing | |
104 | @section Globbing | |
105 | ||
106 | @cindex globbing | |
107 | The archetypal use of wildcards is for matching against the files in a | |
108 | directory, and making a list of all the matches. This is called | |
109 | @dfn{globbing}. | |
110 | ||
111 | You could do this using @code{fnmatch}, by reading the directory entries | |
112 | one by one and testing each one with @code{fnmatch}. But that would be | |
113 | slow (and complex, since you would have to handle subdirectories by | |
114 | hand). | |
115 | ||
116 | The library provides a function @code{glob} to make this particular use | |
117 | of wildcards convenient. @code{glob} and the other symbols in this | |
118 | section are declared in @file{glob.h}. | |
119 | ||
120 | @menu | |
714a562f UD |
121 | * Calling Glob:: Basic use of @code{glob}. |
122 | * Flags for Globbing:: Flags that enable various options in @code{glob}. | |
123 | * More Flags for Globbing:: GNU specific extensions to @code{glob}. | |
28f540f4 RM |
124 | @end menu |
125 | ||
126 | @node Calling Glob | |
127 | @subsection Calling @code{glob} | |
128 | ||
129 | The result of globbing is a vector of file names (strings). To return | |
130 | this vector, @code{glob} uses a special data type, @code{glob_t}, which | |
131 | is a structure. You pass @code{glob} the address of the structure, and | |
132 | it fills in the structure's fields to tell you about the results. | |
133 | ||
134 | @comment glob.h | |
135 | @comment POSIX.2 | |
136 | @deftp {Data Type} glob_t | |
137 | This data type holds a pointer to a word vector. More precisely, it | |
714a562f UD |
138 | records both the address of the word vector and its size. The GNU |
139 | implementation contains some more fields which are non-standard | |
140 | extensions. | |
28f540f4 RM |
141 | |
142 | @table @code | |
143 | @item gl_pathc | |
144 | The number of elements in the vector. | |
145 | ||
146 | @item gl_pathv | |
147 | The address of the vector. This field has type @w{@code{char **}}. | |
148 | ||
149 | @item gl_offs | |
150 | The offset of the first real element of the vector, from its nominal | |
151 | address in the @code{gl_pathv} field. Unlike the other fields, this | |
152 | is always an input to @code{glob}, rather than an output from it. | |
153 | ||
154 | If you use a nonzero offset, then that many elements at the beginning of | |
155 | the vector are left empty. (The @code{glob} function fills them with | |
156 | null pointers.) | |
157 | ||
158 | The @code{gl_offs} field is meaningful only if you use the | |
159 | @code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero | |
160 | regardless of what is in this field, and the first real element comes at | |
161 | the beginning of the vector. | |
714a562f UD |
162 | |
163 | @item gl_closedir | |
164 | The address of an alternative implementation of the @code{closedir} | |
165 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in | |
166 | the flag parameter. The type of this field is | |
167 | @w{@code{void (*) (void *)}}. | |
168 | ||
169 | This is a GNU extension. | |
170 | ||
171 | @item gl_readdir | |
172 | The address of an alternative implementation of the @code{readdir} | |
173 | function used to read the contents of a directory. It is used if the | |
174 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of | |
175 | this field is @w{@code{struct dirent *(*) (void *)}}. | |
176 | ||
177 | This is a GNU extension. | |
178 | ||
179 | @item gl_opendir | |
180 | The address of an alternative implementation of the @code{opendir} | |
181 | function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in | |
182 | the flag parameter. The type of this field is | |
183 | @w{@code{void *(*) (const char *)}}. | |
184 | ||
185 | This is a GNU extension. | |
186 | ||
187 | @item gl_stat | |
188 | The address of an alternative implementation of the @code{stat} function | |
189 | to get information about an object in the filesystem. It is used if the | |
190 | @code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of | |
191 | this field is @w{@code{int (*) (const char *, struct stat *)}}. | |
192 | ||
193 | This is a GNU extension. | |
194 | ||
195 | @item gl_lstat | |
196 | The address of an alternative implementation of the @code{lstat} | |
197 | function to get information about an object in the filesystems, not | |
198 | following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit | |
199 | is set in the flag parameter. The type of this field is @w{@code{int | |
200 | (*) (const char *, struct stat *)}}. | |
201 | ||
202 | This is a GNU extension. | |
28f540f4 RM |
203 | @end table |
204 | @end deftp | |
205 | ||
206 | @comment glob.h | |
207 | @comment POSIX.2 | |
208 | @deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr}) | |
209 | The function @code{glob} does globbing using the pattern @var{pattern} | |
210 | in the current directory. It puts the result in a newly allocated | |
211 | vector, and stores the size and address of this vector into | |
212 | @code{*@var{vector-ptr}}. The argument @var{flags} is a combination of | |
213 | bit flags; see @ref{Flags for Globbing}, for details of the flags. | |
214 | ||
215 | The result of globbing is a sequence of file names. The function | |
216 | @code{glob} allocates a string for each resulting word, then | |
217 | allocates a vector of type @code{char **} to store the addresses of | |
218 | these strings. The last element of the vector is a null pointer. | |
219 | This vector is called the @dfn{word vector}. | |
220 | ||
221 | To return this vector, @code{glob} stores both its address and its | |
222 | length (number of elements, not counting the terminating null pointer) | |
223 | into @code{*@var{vector-ptr}}. | |
224 | ||
6d52618b | 225 | Normally, @code{glob} sorts the file names alphabetically before |
28f540f4 RM |
226 | returning them. You can turn this off with the flag @code{GLOB_NOSORT} |
227 | if you want to get the information as fast as possible. Usually it's | |
228 | a good idea to let @code{glob} sort them---if you process the files in | |
229 | alphabetical order, the users will have a feel for the rate of progress | |
230 | that your application is making. | |
231 | ||
232 | If @code{glob} succeeds, it returns 0. Otherwise, it returns one | |
233 | of these error codes: | |
234 | ||
235 | @table @code | |
236 | @comment glob.h | |
237 | @comment POSIX.2 | |
238 | @item GLOB_ABORTED | |
239 | There was an error opening a directory, and you used the flag | |
240 | @code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero | |
241 | value. | |
242 | @iftex | |
243 | See below | |
244 | @end iftex | |
245 | @ifinfo | |
246 | @xref{Flags for Globbing}, | |
247 | @end ifinfo | |
248 | for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}. | |
249 | ||
250 | @comment glob.h | |
251 | @comment POSIX.2 | |
252 | @item GLOB_NOMATCH | |
253 | The pattern didn't match any existing files. If you use the | |
254 | @code{GLOB_NOCHECK} flag, then you never get this error code, because | |
255 | that flag tells @code{glob} to @emph{pretend} that the pattern matched | |
256 | at least one file. | |
257 | ||
258 | @comment glob.h | |
259 | @comment POSIX.2 | |
260 | @item GLOB_NOSPACE | |
261 | It was impossible to allocate memory to hold the result. | |
262 | @end table | |
263 | ||
264 | In the event of an error, @code{glob} stores information in | |
265 | @code{*@var{vector-ptr}} about all the matches it has found so far. | |
266 | @end deftypefun | |
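A basic call to @code{glob} might look like the sketch below. The helper name @code{print_matches} is an assumption made for illustration. The sketch also calls @code{globfree}, the POSIX.2 function that releases the storage @code{glob} allocated; it is not described in this section.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: print every file name matching PATTERN, one
   per line.  Returns the number of matches, 0 if nothing matched, or
   -1 if glob reported any other error.  */
int
print_matches (const char *pattern)
{
  glob_t results;
  int status = glob (pattern, 0, NULL, &results);
  int count = -1;

  if (status == 0)
    {
      size_t i;

      /* gl_pathc counts the names; gl_pathv[gl_pathc] is a null
         pointer.  */
      for (i = 0; i < results.gl_pathc; i++)
        puts (results.gl_pathv[i]);
      count = (int) results.gl_pathc;
    }
  else if (status == GLOB_NOMATCH)
    count = 0;

  /* Release whatever glob stored, including partial results left
     behind after an error.  */
  globfree (&results);
  return count;
}
```

Because no flags are given, the names come back sorted, and a pattern that matches nothing produces @code{GLOB_NOMATCH} rather than the pattern itself.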
267 | ||
268 | @node Flags for Globbing | |
269 | @subsection Flags for Globbing | |
270 | ||
6d52618b | 271 | This section describes the flags that you can specify in the |
28f540f4 RM |
272 | @var{flags} argument to @code{glob}. Choose the flags you want, |
273 | and combine them with the C bitwise OR operator @code{|}. | |
274 | ||
275 | @table @code | |
276 | @comment glob.h | |
277 | @comment POSIX.2 | |
278 | @item GLOB_APPEND | |
279 | Append the words from this expansion to the vector of words produced by | |
280 | previous calls to @code{glob}. This way you can effectively expand | |
281 | several words as if they were concatenated with spaces between them. | |
282 | ||
283 | In order for appending to work, you must not modify the contents of the | |
284 | word vector structure between calls to @code{glob}. And, if you set | |
285 | @code{GLOB_DOOFFS} in the first call to @code{glob}, you must also | |
286 | set it when you append to the results. | |
287 | ||
288 | Note that the pointer stored in @code{gl_pathv} may no longer be valid | |
289 | after you call @code{glob} the second time, because @code{glob} might | |
290 | have relocated the vector. So always fetch @code{gl_pathv} from the | |
291 | @code{glob_t} structure after each @code{glob} call; @strong{never} save | |
292 | the pointer across calls. | |
293 | ||
294 | @comment glob.h | |
295 | @comment POSIX.2 | |
296 | @item GLOB_DOOFFS | |
297 | Leave blank slots at the beginning of the vector of words. | |
298 | The @code{gl_offs} field says how many slots to leave. | |
299 | The blank slots contain null pointers. | |
300 | ||
301 | @comment glob.h | |
302 | @comment POSIX.2 | |
303 | @item GLOB_ERR | |
304 | Give up right away and report an error if there is any difficulty | |
305 | reading the directories that must be read in order to expand @var{pattern} | |
306 | fully. Such difficulties might include a directory in which you don't | |
307 | have the requisite access. Normally, @code{glob} tries its best to keep | |
308 | on going despite any errors, reading whatever directories it can. | |
309 | ||
310 | You can exercise even more control than this by specifying an | |
311 | error-handler function @var{errfunc} when you call @code{glob}. If | |
312 | @var{errfunc} is not a null pointer, then @code{glob} doesn't give up | |
313 | right away when it can't read a directory; instead, it calls | |
314 | @var{errfunc} with two arguments, like this: | |
315 | ||
316 | @smallexample | |
317 | (*@var{errfunc}) (@var{filename}, @var{error-code}) | |
318 | @end smallexample | |
319 | ||
320 | @noindent | |
321 | The argument @var{filename} is the name of the directory that | |
322 | @code{glob} couldn't open or couldn't read, and @var{error-code} is the | |
323 | @code{errno} value that was reported to @code{glob}. | |
324 | ||
325 | If the error handler function returns nonzero, then @code{glob} gives up | |
326 | right away. Otherwise, it continues. | |
327 | ||
328 | @comment glob.h | |
329 | @comment POSIX.2 | |
330 | @item GLOB_MARK | |
331 | If the pattern matches the name of a directory, append @samp{/} to the | |
332 | directory's name when returning it. | |
333 | ||
334 | @comment glob.h | |
335 | @comment POSIX.2 | |
336 | @item GLOB_NOCHECK | |
337 | If the pattern doesn't match any file names, return the pattern itself | |
338 | as if it were a file name that had been matched. (Normally, when the | |
339 | pattern doesn't match anything, @code{glob} reports that there were no | |
340 | matches.) | |
341 | ||
342 | @comment glob.h | |
343 | @comment POSIX.2 | |
344 | @item GLOB_NOSORT | |
345 | Don't sort the file names; return them in no particular order. | |
346 | (In practice, the order will depend on the order of the entries in | |
347 | the directory.) The only reason @emph{not} to sort is to save time. | |
348 | ||
349 | @comment glob.h | |
350 | @comment POSIX.2 | |
351 | @item GLOB_NOESCAPE | |
352 | Don't treat the @samp{\} character specially in patterns. Normally, | |
353 | @samp{\} quotes the following character, turning off its special meaning | |
354 | (if any) so that it matches only itself. When quoting is enabled, the | |
355 | pattern @samp{\?} matches only the string @samp{?}, because the question | |
356 | mark in the pattern acts like an ordinary character. | |
357 | ||
358 | If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character. | |
359 | ||
360 | @code{glob} does its work by calling the function @code{fnmatch} | |
361 | repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the | |
362 | @code{FNM_NOESCAPE} flag in calls to @code{fnmatch}. | |
363 | @end table | |
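As an illustration of how @code{GLOB_DOOFFS}, @code{GLOB_APPEND} and @code{GLOB_NOCHECK} combine, the following sketch expands two patterns into one vector with some empty slots reserved at the front. The helper name @code{expand_two} and its interface are assumptions made for this example. It uses @code{globfree}, the POSIX.2 function for releasing the vector, which is not described in this section.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: expand FIRST and then SECOND into the single
   vector *OUT, leaving RESERVED null slots at the front for the
   caller to fill in (for example, with a command name and options).
   Returns 0 on success or the nonzero code from glob.  On success
   the caller should eventually call globfree on *OUT.  */
int
expand_two (const char *first, const char *second, size_t reserved,
            glob_t *out)
{
  int status;

  /* gl_offs is an input field; it is honored because GLOB_DOOFFS is
     set.  */
  out->gl_offs = reserved;
  status = glob (first, GLOB_DOOFFS | GLOB_NOCHECK, NULL, out);

  /* GLOB_DOOFFS was set in the first call, so it must be set again
     when appending.  */
  if (status == 0)
    status = glob (second, GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND,
                   NULL, out);

  if (status != 0)
    globfree (out);
  return status;
}
```

After a successful call, the real matches start at @code{out->gl_pathv[reserved]}. The caller reads @code{gl_pathv} only after the last @code{glob} call, since appending may relocate the vector.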
364 | ||
714a562f UD |
365 | @node More Flags for Globbing |
366 | @subsection More Flags for Globbing | |
367 | ||
368 | Besides the flags described in the previous section, the GNU implementation of | |
369 | @code{glob} accepts a few more flags, which are also defined in the | |
370 | @file{glob.h} file. Some of these extensions implement functionality | |
371 | that is available in modern shell implementations. | |
372 | ||
373 | @table @code | |
374 | @comment glob.h | |
375 | @comment GNU | |
376 | @item GLOB_PERIOD | |
377 | The @code{.} character (period) is treated specially. It cannot be | |
378 | matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}. | |
379 | ||
380 | @comment glob.h | |
381 | @comment GNU | |
382 | @item GLOB_MAGCHAR | |
383 | The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the | |
384 | @var{flags} parameter. Instead, @code{glob} sets this bit in the | |
385 | @code{gl_flags} element of the @code{glob_t} structure provided as the | |
386 | result if the pattern used for matching contains any wildcard character. | |
387 | ||
388 | @comment glob.h | |
389 | @comment GNU | |
390 | @item GLOB_ALTDIRFUNC | |
391 | Instead of using the normal functions for accessing the | |
392 | filesystem, the @code{glob} implementation uses the user-supplied | |
393 | functions specified in the structure pointed to by the @var{vector-ptr} | |
394 | argument. For more information about these functions, refer to the | |
395 | sections about directory handling, @ref{Accessing Directories} and | |
396 | @ref{Reading Attributes}. | |
397 | ||
398 | @comment glob.h | |
399 | @comment GNU | |
400 | @item GLOB_BRACE | |
401 | If this flag is given, the handling of braces in the pattern is changed. | |
402 | Braces must then appear correctly grouped; that is, for each | |
403 | opening brace there must be a closing one. Braces can be nested, | |
404 | so it is possible to define one brace expression inside | |
405 | another one. Note that each nested brace | |
406 | expression must be completely contained in the outer brace expression (if | |
407 | there is one). | |
408 | ||
409 | The string between the matching braces is separated into single | |
410 | expressions by splitting at @code{,} (comma) characters. The commas | |
411 | themselves are discarded. Because brace expressions can be nested, | |
412 | the commas used to separate the subexpressions must | |
413 | be at the same level. Commas inside nested brace expressions do not split the outer one; | |
414 | they are used during expansion of the brace expression at the deeper | |
415 | level. The example below shows this: | |
416 | ||
417 | @smallexample | |
418 | glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result) | |
419 | @end smallexample | |
420 | ||
421 | @noindent | |
422 | is equivalent to the sequence | |
423 | ||
424 | @smallexample | |
425 | glob ("foo/", GLOB_BRACE, NULL, &result) | |
426 | glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result) | |
427 | glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result) | |
428 | glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result) | |
429 | @end smallexample | |
430 | ||
431 | @noindent | |
432 | if we leave aside error handling. | |
433 | ||
434 | @comment glob.h | |
435 | @comment GNU | |
436 | @item GLOB_NOMAGIC | |
437 | If the pattern contains no wildcard constructs (it is a literal file name), | |
438 | return it as the sole ``matching'' word, even if no file exists by that name. | |
439 | ||
440 | @comment glob.h | |
441 | @comment GNU | |
442 | @item GLOB_TILDE | |
443 | If this flag is used, the character @code{~} (tilde) is handled specially | |
444 | if it appears at the beginning of the pattern. Instead of being taken | |
445 | verbatim, it is used to represent the home directory of a known user. | |
446 | ||
447 | If @code{~} is the only character in the pattern or it is followed by a | |
448 | @code{/} (slash), the home directory of the process owner is | |
449 | substituted. The information is read from the system databases using | |
450 | @code{getlogin} and @code{getpwnam}. As an example, take user @code{bart} | |
451 | with his home directory at @file{/home/bart}. For him a call like | |
452 | ||
453 | @smallexample | |
454 | glob ("~/bin/*", GLOB_TILDE, NULL, &result) | |
455 | @end smallexample | |
456 | ||
457 | @noindent | |
458 | would return the contents of the directory @file{/home/bart/bin}. | |
459 | Instead of referring to one's own home directory, it is also possible to | |
460 | name the home directory of another user. To do so, append the | |
461 | user name after the tilde character. So the contents of user | |
462 | @code{homer}'s @file{bin} directory can be retrieved by | |
463 | ||
464 | @smallexample | |
465 | glob ("~homer/bin/*", GLOB_TILDE, NULL, &result) | |
466 | @end smallexample | |
467 | ||
468 | This functionality is equivalent to what is available in the C shell. | |
469 | @end table | |
470 | ||
471 | ||
28f540f4 RM |
472 | @node Regular Expressions |
473 | @section Regular Expression Matching | |
474 | ||
475 | The GNU C Library supports two interfaces for matching regular | |
476 | expressions. One is the standard POSIX.2 interface, and the other is | |
477 | what the GNU system has had for many years. | |
478 | ||
479 | Both interfaces are declared in the header file @file{regex.h}. | |
480 | If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2 | |
481 | functions, structures, and constants are declared. | |
482 | @c !!! we only document the POSIX.2 interface here!! | |
483 | ||
484 | @menu | |
485 | * POSIX Regexp Compilation:: Using @code{regcomp} to prepare to match. | |
486 | * Flags for POSIX Regexps:: Syntax variations for @code{regcomp}. | |
487 | * Matching POSIX Regexps:: Using @code{regexec} to match the compiled | |
488 | pattern that you get from @code{regcomp}. | |
489 | * Regexp Subexpressions:: Finding which parts of the string were matched. | |
490 | * Subexpression Complications:: Fine points of which parts were matched. | |
491 | * Regexp Cleanup:: Freeing storage; reporting errors. | |
492 | @end menu | |
493 | ||
494 | @node POSIX Regexp Compilation | |
495 | @subsection POSIX Regular Expression Compilation | |
496 | ||
497 | Before you can actually match a regular expression, you must | |
498 | @dfn{compile} it. This is not true compilation---it produces a special | |
499 | data structure, not machine instructions. But it is like ordinary | |
500 | compilation in that its purpose is to enable you to ``execute'' the | |
501 | pattern fast. (@xref{Matching POSIX Regexps}, for how to use the | |
502 | compiled regular expression for matching.) | |
503 | ||
504 | There is a special data type for compiled regular expressions: | |
505 | ||
506 | @comment regex.h | |
507 | @comment POSIX.2 | |
508 | @deftp {Data Type} regex_t | |
509 | This type of object holds a compiled regular expression. | |
510 | It is actually a structure. It has just one field that your programs | |
511 | should look at: | |
512 | ||
513 | @table @code | |
514 | @item re_nsub | |
515 | This field holds the number of parenthetical subexpressions in the | |
516 | regular expression that was compiled. | |
517 | @end table | |
518 | ||
519 | There are several other fields, but we don't describe them here, because | |
520 | only the functions in the library should use them. | |
521 | @end deftp | |
522 | ||
523 | After you create a @code{regex_t} object, you can compile a regular | |
524 | expression into it by calling @code{regcomp}. | |
525 | ||
526 | @comment regex.h | |
527 | @comment POSIX.2 | |
528 | @deftypefun int regcomp (regex_t *@var{compiled}, const char *@var{pattern}, int @var{cflags}) | |
529 | The function @code{regcomp} ``compiles'' a regular expression into a | |
530 | data structure that you can use with @code{regexec} to match against a | |
531 | string. The compiled regular expression format is designed for | |
532 | efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}. | |
533 | ||
534 | It's up to you to allocate an object of type @code{regex_t} and pass its | |
535 | address to @code{regcomp}. | |
536 | ||
537 | The argument @var{cflags} lets you specify various options that control | |
538 | the syntax and semantics of regular expressions. @xref{Flags for POSIX | |
539 | Regexps}. | |
540 | ||
541 | If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from | |
542 | the compiled regular expression the information necessary to record | |
543 | how subexpressions actually match. In this case, you might as well | |
544 | pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when | |
545 | you call @code{regexec}. | |
546 | ||
547 | If you don't use @code{REG_NOSUB}, then the compiled regular expression | |
548 | does have the capacity to record how subexpressions match. Also, | |
549 | @code{regcomp} tells you how many subexpressions @var{pattern} has, by | |
550 | storing the number in @code{@var{compiled}->re_nsub}. You can use that | |
551 | value to decide how long an array to allocate to hold information about | |
552 | subexpression matches. | |
553 | ||
554 | @code{regcomp} returns @code{0} if it succeeds in compiling the regular | |
555 | expression; otherwise, it returns a nonzero error code (see the table | |
556 | below). You can use @code{regerror} to produce an error message string | |
557 | describing the reason for a nonzero value; see @ref{Regexp Cleanup}. | |
558 | ||
559 | @end deftypefun | |
560 | ||
561 | Here are the possible nonzero values that @code{regcomp} can return: | |
562 | ||
563 | @table @code | |
564 | @comment regex.h | |
565 | @comment POSIX.2 | |
566 | @item REG_BADBR | |
567 | There was an invalid @samp{\@{@dots{}\@}} construct in the regular | |
568 | expression. A valid @samp{\@{@dots{}\@}} construct must contain either | |
569 | a single number, or two numbers in increasing order separated by a | |
570 | comma. | |
571 | ||
572 | @comment regex.h | |
573 | @comment POSIX.2 | |
574 | @item REG_BADPAT | |
575 | There was a syntax error in the regular expression. | |
576 | ||
577 | @comment regex.h | |
578 | @comment POSIX.2 | |
579 | @item REG_BADRPT | |
580 | A repetition operator such as @samp{?} or @samp{*} appeared in a bad | |
581 | position (with no preceding subexpression to act on). | |
582 | ||
583 | @comment regex.h | |
584 | @comment POSIX.2 | |
585 | @item REG_ECOLLATE | |
586 | The regular expression referred to an invalid collating element (one not | |
587 | defined in the current locale for string collation). @xref{Locale | |
588 | Categories}. | |
589 | ||
590 | @comment regex.h | |
591 | @comment POSIX.2 | |
592 | @item REG_ECTYPE | |
593 | The regular expression referred to an invalid character class name. | |
594 | ||
595 | @comment regex.h | |
596 | @comment POSIX.2 | |
597 | @item REG_EESCAPE | |
598 | The regular expression ended with @samp{\}. | |
599 | ||
600 | @comment regex.h | |
601 | @comment POSIX.2 | |
602 | @item REG_ESUBREG | |
603 | There was an invalid number in the @samp{\@var{digit}} construct. | |
604 | ||
605 | @comment regex.h | |
606 | @comment POSIX.2 | |
607 | @item REG_EBRACK | |
608 | There were unbalanced square brackets in the regular expression. | |
609 | ||
610 | @comment regex.h | |
611 | @comment POSIX.2 | |
612 | @item REG_EPAREN | |
613 | An extended regular expression had unbalanced parentheses, | |
614 | or a basic regular expression had unbalanced @samp{\(} and @samp{\)}. | |
615 | ||
616 | @comment regex.h | |
617 | @comment POSIX.2 | |
618 | @item REG_EBRACE | |
619 | The regular expression had unbalanced @samp{\@{} and @samp{\@}}. | |
620 | ||
621 | @comment regex.h | |
622 | @comment POSIX.2 | |
623 | @item REG_ERANGE | |
624 | One of the endpoints in a range expression was invalid. | |
625 | ||
626 | @comment regex.h | |
627 | @comment POSIX.2 | |
628 | @item REG_ESPACE | |
629 | @code{regcomp} ran out of memory. | |
630 | @end table | |
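Compiling a pattern and reporting a failure might look like the following sketch. The helper name @code{try_compile}, the buffer size, and the message format are assumptions made for illustration. @code{regerror} is the function mentioned above for turning an error code into a message.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile PATTERN as an extended regular
   expression into *COMPILED.  On failure, print a message built by
   regerror on stderr.  Returns the value regcomp returned.  */
int
try_compile (const char *pattern, regex_t *compiled)
{
  int status = regcomp (compiled, pattern, REG_EXTENDED);

  if (status != 0)
    {
      char message[128];        /* Arbitrary size chosen for the sketch.  */

      regerror (status, compiled, message, sizeof message);
      fprintf (stderr, "cannot compile `%s': %s\n", pattern, message);
    }
  return status;
}
```

After a successful call, @code{compiled->re_nsub} holds the number of parenthetical subexpressions; for the pattern @samp{(a|b)(c)} that number is 2. The caller should release a successfully compiled pattern with @code{regfree} when done.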
631 | ||
632 | @node Flags for POSIX Regexps | |
633 | @subsection Flags for POSIX Regular Expressions | |
634 | ||
635 | These are the bit flags that you can use in the @var{cflags} operand when | |
636 | compiling a regular expression with @code{regcomp}. | |
6d52618b | 637 | |
28f540f4 RM |
638 | @table @code |
639 | @comment regex.h | |
640 | @comment POSIX.2 | |
641 | @item REG_EXTENDED | |
642 | Treat the pattern as an extended regular expression, rather than as a | |
643 | basic regular expression. | |
644 | ||
645 | @comment regex.h | |
646 | @comment POSIX.2 | |
647 | @item REG_ICASE | |
648 | Ignore case when matching letters. | |
649 | ||
650 | @comment regex.h | |
651 | @comment POSIX.2 | |
652 | @item REG_NOSUB | |
653 | Don't bother storing the contents of the @var{matchptr} array. | |
654 | ||
655 | @comment regex.h | |
656 | @comment POSIX.2 | |
657 | @item REG_NEWLINE | |
658 | Treat a newline in @var{string} as dividing @var{string} into multiple | |
659 | lines, so that @samp{$} can match before the newline and @samp{^} can | |
660 | match after. Also, don't permit @samp{.} to match a newline, and don't | |
661 | permit @samp{[^@dots{}]} to match a newline. | |
662 | ||
663 | Otherwise, newline acts like any other ordinary character. | |
664 | @end table | |
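To see what each compilation flag changes, it helps to match one string under different values of @var{cflags}. The helper below is a sketch; its name @code{regex_matches} and return convention are assumptions. It always adds @code{REG_NOSUB}, since it only needs a yes or no answer.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if PATTERN, compiled with CFLAGS,
   matches somewhere in STRING; 0 if it does not; -1 if PATTERN
   fails to compile.  */
int
regex_matches (const char *pattern, const char *string, int cflags)
{
  regex_t compiled;
  int status;

  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  /* With REG_NOSUB there is nothing to record, so pass 0 and NULL.  */
  status = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);
  return status == 0;
}

/* Some calls and what they illustrate:

   regex_matches ("a+b", "aaab", REG_EXTENDED)
     matches: `+' is a repetition operator in extended syntax.
   regex_matches ("hello", "HELLO", REG_ICASE)
     matches: case is ignored.
   regex_matches ("^b$", "a\nb\nc", REG_NEWLINE)
     matches: `^' and `$' anchor at the embedded newlines.
   regex_matches ("^b$", "a\nb\nc", 0)
     does not match: newline is an ordinary character.  */
```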
665 | ||
666 | @node Matching POSIX Regexps | |
667 | @subsection Matching a Compiled POSIX Regular Expression | |
668 | ||
669 | Once you have compiled a regular expression, as described in @ref{POSIX | |
670 | Regexp Compilation}, you can match it against strings using | |
671 | @code{regexec}. A match anywhere inside the string counts as success, | |
672 | unless the regular expression contains anchor characters (@samp{^} or | |
673 | @samp{$}). | |
674 | ||
675 | @comment regex.h | |
676 | @comment POSIX.2 | |
677 | @deftypefun int regexec (const regex_t *@var{compiled}, const char *@var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr} @t{[]}, int @var{eflags}) | |
678 | This function tries to match the compiled regular expression | |
679 | @code{*@var{compiled}} against @var{string}. | |
680 | ||
681 | @code{regexec} returns @code{0} if the regular expression matches; | |
682 | otherwise, it returns a nonzero value. See the table below for | |
683 | what nonzero values mean. You can use @code{regerror} to produce an | |
6d52618b | 684 | error message string describing the reason for a nonzero value; |
28f540f4 RM |
685 | see @ref{Regexp Cleanup}. |
686 | ||
687 | The argument @var{eflags} is a word of bit flags that enable various | |
688 | options. | |
689 | ||
690 | If you want to get information about what part of @var{string} actually | |
691 | matched the regular expression or its subexpressions, use the arguments | |
6d52618b | 692 | @var{matchptr} and @var{nmatch}. Otherwise, pass @code{0} for |
28f540f4 RM |
693 | @var{nmatch}, and @code{NULL} for @var{matchptr}. @xref{Regexp |
694 | Subexpressions}. | |
695 | @end deftypefun | |
696 | ||
697 | You must match the regular expression with the same set of current | |
698 | locales that were in effect when you compiled the regular expression. | |
699 | ||
700 | The function @code{regexec} accepts the following flags in the | |
701 | @var{eflags} argument: | |
702 | ||
6d52618b | 703 | @table @code |
28f540f4 RM |
704 | @comment regex.h |
705 | @comment POSIX.2 | |
706 | @item REG_NOTBOL | |
707 | Do not regard the beginning of the specified string as the beginning of | |
708 | a line; more generally, don't make any assumptions about what text might | |
709 | precede it. | |
710 | ||
711 | @comment regex.h | |
712 | @comment POSIX.2 | |
713 | @item REG_NOTEOL | |
714 | Do not regard the end of the specified string as the end of a line; more | |
715 | generally, don't make any assumptions about what text might follow it. | |
716 | @end table | |
717 | ||
718 | Here are the possible nonzero values that @code{regexec} can return: | |
719 | ||
720 | @table @code | |
721 | @comment regex.h | |
722 | @comment POSIX.2 | |
723 | @item REG_NOMATCH | |
724 | The pattern didn't match the string. This isn't really an error. | |
725 | ||
726 | @comment regex.h | |
727 | @comment POSIX.2 | |
728 | @item REG_ESPACE | |
729 | @code{regexec} ran out of memory. | |
730 | @end table | |
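
The following is a minimal sketch of the compile, match and free sequence, with the @var{eflags} value passed through so that @code{REG_NOTBOL} and @code{REG_NOTEOL} can be tried. The function name @code{string_matches} is made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (a POSIX basic regular
   expression) matches somewhere in STRING, 0 if it does not, and -1
   on any error.  EFLAGS is handed to regexec unchanged.  */
int
string_matches (const char *pattern, const char *string, int eflags)
{
  regex_t compiled;
  int status;

  if (regcomp (&compiled, pattern, REG_NOSUB) != 0)
    return -1;

  /* No match offsets are wanted, so pass 0 and NULL.  */
  status = regexec (&compiled, string, 0, NULL, eflags);
  regfree (&compiled);

  if (status == 0)
    return 1;
  return status == REG_NOMATCH ? 0 : -1;
}
```

For instance, @code{string_matches ("^abc", "abc", 0)} reports a match, whereas the same call with @code{REG_NOTBOL} does not, because the start of the string no longer counts as the start of a line.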
731 | ||
732 | @node Regexp Subexpressions | |
733 | @subsection Match Results with Subexpressions | |
734 | ||
735 | When @code{regexec} matches parenthetical subexpressions of | |
736 | @var{pattern}, it records which parts of @var{string} they match. It | |
737 | returns that information by storing the offsets into an array whose | |
738 | elements are structures of type @code{regmatch_t}. The first element of | |
739 | the array (index @code{0}) records the part of the string that matched | |
740 | the entire regular expression. Each other element of the array records | |
741 | the beginning and end of the part that matched a single parenthetical | |
742 | subexpression. | |
743 | ||
744 | @comment regex.h | |
745 | @comment POSIX.2 | |
746 | @deftp {Data Type} regmatch_t | |
747 | This is the data type of the @var{matchptr} array that you pass to | |
6d52618b | 748 | @code{regexec}. It contains two structure fields, as follows: |
28f540f4 RM |
749 | |
750 | @table @code | |
751 | @item rm_so | |
752 | The offset in @var{string} of the beginning of a substring. Add this | |
753 | value to @var{string} to get the address of that part. | |
754 | ||
755 | @item rm_eo | |
756 | The offset in @var{string} of the end of the substring. | |
757 | @end table | |
758 | @end deftp | |
759 | ||
760 | @comment regex.h | |
761 | @comment POSIX.2 | |
762 | @deftp {Data Type} regoff_t | |
763 | @code{regoff_t} is an alias for another signed integer type. | |
764 | The fields of @code{regmatch_t} have type @code{regoff_t}. | |
765 | @end deftp | |
766 | ||
767 | The @code{regmatch_t} elements after index @code{0} correspond to | |
768 | subexpressions positionally; the element at index @code{1} records where | |
769 | the first subexpression matched, the element at index @code{2} records | |
770 | the second subexpression, and so on. The order of the subexpressions is | |
771 | the order in which they begin. | |
772 | ||
773 | When you call @code{regexec}, you specify how long the @var{matchptr} | |
774 | array is, with the @var{nmatch} argument. This tells @code{regexec} how | |
775 | many elements to store. If the actual regular expression has more than | |
776 | @var{nmatch} subexpressions, then you won't get offset information about | |
777 | the rest of them. But this doesn't alter whether the pattern matches a | |
778 | particular string or not. | |
779 | ||
780 | If you don't want @code{regexec} to return any information about where | |
781 | the subexpressions matched, you can either supply @code{0} for | |
782 | @var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the | |
783 | pattern with @code{regcomp}. | |
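
To make the use of @code{rm_so} and @code{rm_eo} concrete, here is a small sketch that copies the text matched by one subexpression into a caller-supplied buffer. The name @code{copy_group} and the fixed limit of ten array elements are assumptions made for the example.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the part of STRING matched by
   subexpression GROUP of PATTERN (a POSIX basic regular expression)
   into OUT, truncating to OUTSIZE - 1 bytes.  GROUP 0 means the whole
   match.  Returns 0 on success, 1 if nothing matched or the group was
   not used, and -1 on bad arguments or a pattern that does not compile.  */
int
copy_group (const char *pattern, const char *string, size_t group,
            char *out, size_t outsize)
{
  regex_t compiled;
  regmatch_t matches[10];
  int result = 1;

  if (group >= 10 || outsize == 0 || regcomp (&compiled, pattern, 0) != 0)
    return -1;

  if (regexec (&compiled, string, 10, matches, 0) == 0
      && matches[group].rm_so != -1)
    {
      size_t len = (size_t) (matches[group].rm_eo - matches[group].rm_so);
      if (len >= outsize)
        len = outsize - 1;
      /* Adding rm_so to STRING gives the address of the matched part.  */
      memcpy (out, string + matches[group].rm_so, len);
      out[len] = '\0';
      result = 0;
    }

  regfree (&compiled);
  return result;
}
```

With the pattern @samp{\([a-z]*\)=\([0-9]*\)} and the string @samp{key=42}, group 1 is @samp{key} and group 2 is @samp{42}.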
784 | ||
785 | @node Subexpression Complications | |
786 | @subsection Complications in Subexpression Matching | |
787 | ||
788 | Sometimes a subexpression matches a substring of no characters. This | |
789 | happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really | |
790 | matches just the @samp{f}.) In this case, both of the offsets identify | |
791 | the point in the string where the null substring was found. In this | |
792 | example, the offsets are both @code{1}. | |
793 | ||
794 | Sometimes the entire regular expression can match without using some of | |
795 | its subexpressions at all---for example, when @samp{ba\(na\)*} matches the | |
796 | string @samp{ba}, the parenthetical subexpression is not used. When | |
797 | this happens, @code{regexec} stores @code{-1} in both fields of the | |
798 | element for that subexpression. | |
799 | ||
800 | Sometimes matching the entire regular expression can match a particular | |
801 | subexpression more than once---for example, when @samp{ba\(na\)*} | |
802 | matches the string @samp{bananana}, the parenthetical subexpression | |
803 | matches three times. When this happens, @code{regexec} usually stores | |
804 | the offsets of the last part of the string that matched the | |
805 | subexpression. In the case of @samp{bananana}, these offsets are | |
806 | @code{6} and @code{8}. | |
807 | ||
808 | But the last match is not always the one that is chosen. It's more | |
809 | accurate to say that the last @emph{opportunity} to match is the one | |
810 | that takes precedence. What this means is that when one subexpression | |
811 | appears within another, then the results reported for the inner | |
812 | subexpression reflect whatever happened on the last match of the outer | |
813 | subexpression. For an example, consider @samp{\(ba\(na\)*s \)*} matching | |
814 | the string @samp{bananas bas }. The last time the inner expression | |
6d52618b | 815 | actually matches is near the end of the first word. But it is |
28f540f4 RM |
816 | @emph{considered} again in the second word, and fails to match there. |
817 | @code{regexec} reports nonuse of the ``na'' subexpression. | |
818 | ||
819 | Another place where this rule applies is when the regular expression | |
820 | @w{@samp{\(ba\(na\)*s \|nefer\(ti\)* \)*}} matches @samp{bananas nefertiti}. | |
821 | The ``na'' subexpression does match in the first word, but it doesn't | |
822 | match in the second word because the other alternative is used there. | |
823 | Once again, the second repetition of the outer subexpression overrides | |
824 | the first, and within that second repetition, the ``na'' subexpression | |
825 | is not used. So @code{regexec} reports nonuse of the ``na'' | |
826 | subexpression. | |
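
The simpler cases above (an empty match, an unused subexpression, and a repeated subexpression) can be checked with a sketch like the following. The helper @code{first_subexpression} is hypothetical; it only reports the offsets stored for the element at index @code{1}.

```c
#include <regex.h>

/* Hypothetical helper: match PATTERN (a POSIX basic regular
   expression) against STRING and store the offsets recorded for the
   first parenthetical subexpression in *START and *END.
   Returns 0 on a match; otherwise returns the nonzero code from
   regcomp or regexec and leaves *START and *END untouched.  */
int
first_subexpression (const char *pattern, const char *string,
                     regoff_t *start, regoff_t *end)
{
  regex_t compiled;
  regmatch_t matches[2];
  int status = regcomp (&compiled, pattern, 0);

  if (status != 0)
    return status;

  status = regexec (&compiled, string, 2, matches, 0);
  if (status == 0)
    {
      *start = matches[1].rm_so;
      *end = matches[1].rm_eo;
    }

  regfree (&compiled);
  return status;
}
```

Following the text above, @samp{f\(o*\)} against @samp{fum} should yield offsets 1 and 1, @samp{ba\(na\)*} against @samp{ba} should yield -1 and -1, and @samp{ba\(na\)*} against @samp{bananana} should yield 6 and 8.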
827 | ||
828 | @node Regexp Cleanup | |
829 | @subsection POSIX Regexp Matching Cleanup | |
830 | ||
831 | When you are finished using a compiled regular expression, you can | |
832 | free the storage it uses by calling @code{regfree}. | |
833 | ||
834 | @comment regex.h | |
835 | @comment POSIX.2 | |
836 | @deftypefun void regfree (regex_t *@var{compiled}) | |
837 | Calling @code{regfree} frees all the storage that @code{*@var{compiled}} | |
838 | points to. This includes various internal fields of the @code{regex_t} | |
839 | structure that aren't documented in this manual. | |
840 | ||
841 | @code{regfree} does not free the object @code{*@var{compiled}} itself. | |
842 | @end deftypefun | |
843 | ||
844 | You should always free the space in a @code{regex_t} structure with | |
845 | @code{regfree} before using the structure to compile another regular | |
846 | expression. | |
847 | ||
848 | When @code{regcomp} or @code{regexec} reports an error, you can use | |
849 | the function @code{regerror} to turn it into an error message string. | |
850 | ||
851 | @comment regex.h | |
852 | @comment POSIX.2 | |
853 | @deftypefun size_t regerror (int @var{errcode}, regex_t *@var{compiled}, char *@var{buffer}, size_t @var{length}) | |
854 | This function produces an error message string for the error code | |
855 | @var{errcode}, and stores the string in @var{length} bytes of memory | |
856 | starting at @var{buffer}. For the @var{compiled} argument, supply the | |
857 | same compiled regular expression structure that @code{regcomp} or | |
858 | @code{regexec} was working with when it got the error. Alternatively, | |
859 | you can supply @code{NULL} for @var{compiled}; you will still get a | |
860 | meaningful error message, but it might not be as detailed. | |
861 | ||
862 | If the error message can't fit in @var{length} bytes (including a | |
863 | terminating null character), then @code{regerror} truncates it. | |
864 | The string that @code{regerror} stores is always null-terminated | |
865 | even if it has been truncated. | |
866 | ||
867 | The return value of @code{regerror} is the minimum length needed to store | |
868 | the entire error message, including its terminating null character. If | |
869 | this is no greater than @var{length}, then the error message was not truncated, and you can use it. | |
870 | Otherwise, you should call @code{regerror} again with a larger buffer. | |
871 | ||
872 | Here is a function which uses @code{regerror}, but always dynamically | |
873 | allocates a buffer for the error message: | |
874 | ||
875 | @smallexample | |
876 | char *get_regerror (int errcode, regex_t *compiled) | |
877 | @{ | |
878 |   size_t length = regerror (errcode, compiled, NULL, 0); | |
879 |   char *buffer = xmalloc (length); | |
880 |   (void) regerror (errcode, compiled, buffer, length); | |
881 |   return buffer; | |
882 | @} | |
883 | @end smallexample | |
884 | @end deftypefun | |
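
When a fixed-size buffer is preferred over dynamic allocation, the return value of @code{regerror} can be used to detect truncation. The sketch below assumes a hypothetical helper named @code{compile_error_message}; the wording of the message itself depends on the library and is not assumed here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: try to compile PATTERN as a POSIX basic
   regular expression.  If that succeeds, return 0.  If it fails,
   store a (possibly truncated) message in BUFFER and return the
   buffer size regerror reports as necessary for the whole message.  */
size_t
compile_error_message (const char *pattern, char *buffer, size_t length)
{
  regex_t compiled;
  int errcode = regcomp (&compiled, pattern, 0);

  if (errcode == 0)
    {
      regfree (&compiled);
      return 0;
    }

  /* A return value greater than LENGTH means the text was truncated.  */
  return regerror (errcode, &compiled, buffer, length);
}
```

A caller can compare the result with the size of its buffer and retry with a larger one when the result is greater.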
885 | ||
886 | @c !!!! this is not actually in the library.... | |
887 | @node Word Expansion | |
888 | @section Shell-Style Word Expansion | |
889 | @cindex word expansion | |
890 | @cindex expansion of shell words | |
891 | ||
6d52618b | 892 | @dfn{Word expansion} means the process of splitting a string into |
28f540f4 RM |
893 | @dfn{words} and substituting for variables, commands, and wildcards |
894 | just as the shell does. | |
895 | ||
896 | For example, when you write @samp{ls -l foo.c}, this string is split | |
897 | into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}. | |
898 | This is the most basic function of word expansion. | |
899 | ||
900 | When you write @samp{ls *.c}, this can become many words, because | |
901 | the word @samp{*.c} can be replaced with any number of file names. | |
902 | This is called @dfn{wildcard expansion}, and it is also a part of | |
903 | word expansion. | |
904 | ||
905 | When you use @samp{echo $PATH} to print your path, you are taking | |
906 | advantage of @dfn{variable substitution}, which is also part of word | |
907 | expansion. | |
908 | ||
909 | Ordinary programs can perform word expansion just like the shell by | |
910 | calling the library function @code{wordexp}. | |
911 | ||
912 | @menu | |
913 | * Expansion Stages:: What word expansion does to a string. | |
914 | * Calling Wordexp:: How to call @code{wordexp}. | |
915 | * Flags for Wordexp:: Options you can enable in @code{wordexp}. | |
916 | * Wordexp Example:: A sample program that does word expansion. | |
917 | @end menu | |
918 | ||
919 | @node Expansion Stages | |
920 | @subsection The Stages of Word Expansion | |
921 | ||
922 | When word expansion is applied to a sequence of words, it performs the | |
923 | following transformations in the order shown here: | |
924 | ||
925 | @enumerate | |
926 | @item | |
927 | @cindex tilde expansion | |
928 | @dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of | |
929 | the home directory of @samp{foo}. | |
930 | ||
931 | @item | |
932 | Next, three different transformations are applied in the same step, | |
933 | from left to right: | |
934 | ||
935 | @itemize @bullet | |
936 | @item | |
937 | @cindex variable substitution | |
938 | @cindex substitution of variables and commands | |
939 | @dfn{Variable substitution}: Environment variables are substituted for | |
940 | references such as @samp{$foo}. | |
941 | ||
942 | @item | |
943 | @cindex command substitution | |
944 | @dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and | |
945 | the equivalent @w{@samp{$(cat foo)}} are replaced with the output from | |
946 | the inner command. | |
947 | ||
948 | @item | |
949 | @cindex arithmetic expansion | |
950 | @dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are | |
951 | replaced with the result of the arithmetic computation. | |
952 | @end itemize | |
953 | ||
954 | @item | |
955 | @cindex field splitting | |
956 | @dfn{Field splitting}: subdivision of the text into @dfn{words}. | |
957 | ||
958 | @item | |
959 | @cindex wildcard expansion | |
960 | @dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c} | |
961 | with a list of @samp{.c} file names. Wildcard expansion applies to an | |
962 | entire word at a time, and replaces that word with 0 or more file names | |
963 | that are themselves words. | |
964 | ||
965 | @item | |
966 | @cindex quote removal | |
967 | @cindex removal of quotes | |
968 | @dfn{Quote removal}: The deletion of string-quotes, now that they have | |
969 | done their job by inhibiting the above transformations when appropriate. | |
970 | @end enumerate | |
971 | ||
972 | For the details of these transformations, and how to write the constructs | |
973 | that use them, see @w{@cite{The BASH Manual}} (to appear). | |
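
The interaction between variable substitution, field splitting and quote removal can be observed with a small experiment. The sketch below is only illustrative: the helper @code{count_words_with_demo_var} and the environment variable @code{GR_DEMO} are invented for this example.

```c
#include <stdlib.h>
#include <wordexp.h>

/* Hypothetical helper: set the environment variable GR_DEMO to VALUE,
   expand WORDS, and return the number of words produced, or -1 on
   failure.  Command substitution is refused with WRDE_NOCMD.  */
int
count_words_with_demo_var (const char *value, const char *words)
{
  wordexp_t result;
  int status, count;

  if (setenv ("GR_DEMO", value, 1) != 0)
    return -1;

  status = wordexp (words, &result, WRDE_NOCMD);
  if (status != 0)
    {
      /* Only WRDE_NOSPACE may leave partial results behind.  */
      if (status == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}
```

With @code{GR_DEMO} set to @samp{a b}, the input @samp{$GR_DEMO c} should produce three words, because field splitting happens after substitution; the input @samp{"$GR_DEMO" c} should produce two, because the quotes inhibit splitting before they are removed.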
974 | ||
975 | @node Calling Wordexp | |
976 | @subsection Calling @code{wordexp} | |
977 | ||
978 | All the functions, constants and data types for word expansion are | |
979 | declared in the header file @file{wordexp.h}. | |
980 | ||
981 | Word expansion produces a vector of words (strings). To return this | |
982 | vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which | |
983 | is a structure. You pass @code{wordexp} the address of the structure, | |
984 | and it fills in the structure's fields to tell you about the results. | |
985 | ||
986 | @comment wordexp.h | |
987 | @comment POSIX.2 | |
988 | @deftp {Data Type} {wordexp_t} | |
989 | This data type holds a pointer to a word vector. More precisely, it | |
990 | records both the address of the word vector and its size. | |
991 | ||
992 | @table @code | |
993 | @item we_wordc | |
994 | The number of elements in the vector. | |
995 | ||
996 | @item we_wordv | |
997 | The address of the vector. This field has type @w{@code{char **}}. | |
998 | ||
999 | @item we_offs | |
1000 | The offset of the first real element of the vector, from its nominal | |
1001 | address in the @code{we_wordv} field. Unlike the other fields, this | |
1002 | is always an input to @code{wordexp}, rather than an output from it. | |
1003 | ||
1004 | If you use a nonzero offset, then that many elements at the beginning of | |
1005 | the vector are left empty. (The @code{wordexp} function fills them with | |
1006 | null pointers.) | |
1007 | ||
1008 | The @code{we_offs} field is meaningful only if you use the | |
1009 | @code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero | |
1010 | regardless of what is in this field, and the first real element comes at | |
1011 | the beginning of the vector. | |
1012 | @end table | |
1013 | @end deftp | |
1014 | ||
1015 | @comment wordexp.h | |
1016 | @comment POSIX.2 | |
1017 | @deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags}) | |
1018 | Perform word expansion on the string @var{words}, putting the result in | |
1019 | a newly allocated vector, and store the size and address of this vector | |
1020 | into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a | |
1021 | combination of bit flags; see @ref{Flags for Wordexp}, for details of | |
1022 | the flags. | |
1023 | ||
1024 | You shouldn't use any of the characters @samp{|&;<>} in the string | |
1025 | @var{words} unless they are quoted; likewise for newline. If you use | |
1026 | these characters unquoted, you will get the @code{WRDE_BADCHAR} error | |
1027 | code. Don't use parentheses or braces unless they are quoted or part of | |
1028 | a word expansion construct. If you use quotation characters @samp{'"`}, | |
1029 | they should come in pairs that balance. | |
1030 | ||
1031 | The results of word expansion are a sequence of words. The function | |
1032 | @code{wordexp} allocates a string for each resulting word, then | |
1033 | allocates a vector of type @code{char **} to store the addresses of | |
1034 | these strings. The last element of the vector is a null pointer. | |
1035 | This vector is called the @dfn{word vector}. | |
1036 | ||
1037 | To return this vector, @code{wordexp} stores both its address and its | |
1038 | length (number of elements, not counting the terminating null pointer) | |
1039 | into @code{*@var{word-vector-ptr}}. | |
1040 | ||
1041 | If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one | |
1042 | of these error codes: | |
1043 | ||
1044 | @table @code | |
1045 | @comment wordexp.h | |
1046 | @comment POSIX.2 | |
1047 | @item WRDE_BADCHAR | |
1048 | The input string @var{words} contains an unquoted invalid character such | |
1049 | as @samp{|}. | |
1050 | ||
1051 | @comment wordexp.h | |
1052 | @comment POSIX.2 | |
1053 | @item WRDE_BADVAL | |
1054 | The input string refers to an undefined shell variable, and you used the flag | |
1055 | @code{WRDE_UNDEF} to forbid such references. | |
1056 | ||
1057 | @comment wordexp.h | |
1058 | @comment POSIX.2 | |
1059 | @item WRDE_CMDSUB | |
1060 | The input string uses command substitution, and you used the flag | |
1061 | @code{WRDE_NOCMD} to forbid command substitution. | |
1062 | ||
1063 | @comment wordexp.h | |
1064 | @comment POSIX.2 | |
1065 | @item WRDE_NOSPACE | |
1066 | It was impossible to allocate memory to hold the result. In this case, | |
1067 | @code{wordexp} can store part of the results---as much as it could | |
1068 | allocate room for. | |
1069 | ||
1070 | @comment wordexp.h | |
1071 | @comment POSIX.2 | |
1072 | @item WRDE_SYNTAX | |
1073 | There was a syntax error in the input string. For example, an unmatched | |
1074 | quoting character is a syntax error. | |
1075 | @end table | |
1076 | @end deftypefun | |
1077 | ||
1078 | @comment wordexp.h | |
1079 | @comment POSIX.2 | |
1080 | @deftypefun void wordfree (wordexp_t *@var{word-vector-ptr}) | |
1081 | Free the storage used for the word-strings and vector that | |
1082 | @code{*@var{word-vector-ptr}} points to. This does not free the | |
1083 | structure @code{*@var{word-vector-ptr}} itself---only the other | |
1084 | data it points to. | |
1085 | @end deftypefun | |
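
A minimal sketch of a complete call sequence follows: expand a string, walk the null-terminated word vector, and release it with @code{wordfree}. The function name @code{join_expanded_words} and the comma separator are choices made for this example only.

```c
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS and write the resulting words
   into OUT, separated by commas.  Returns the number of words, or -1
   if wordexp reports an error.  Command substitution is refused.  */
int
join_expanded_words (const char *words, char *out, size_t outsize)
{
  wordexp_t result;
  size_t used = 0;
  size_t i;
  int count;
  int status = wordexp (words, &result, WRDE_NOCMD);

  if (status != 0)
    {
      /* Only WRDE_NOSPACE may leave partial results behind.  */
      if (status == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  if (outsize > 0)
    out[0] = '\0';

  /* The vector ends with a null pointer.  */
  for (i = 0; result.we_wordv[i] != NULL; i++)
    {
      int n = snprintf (out + used, outsize - used, "%s%s",
                        i > 0 ? "," : "", result.we_wordv[i]);
      /* Stop if the buffer is full; OUT then holds a truncated list.  */
      if (n < 0 || (size_t) n >= outsize - used)
        break;
      used += (size_t) n;
    }

  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}
```

Given the input @samp{ls -l 'foo bar.c'}, this should report three words, the last being @samp{foo bar.c} with its quotes removed. An input containing an unquoted @samp{|} should fail, since @code{wordexp} rejects it with @code{WRDE_BADCHAR}.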
1086 | ||
1087 | @node Flags for Wordexp | |
1088 | @subsection Flags for Word Expansion | |
1089 | ||
6d52618b | 1090 | This section describes the flags that you can specify in the |
28f540f4 RM |
1091 | @var{flags} argument to @code{wordexp}. Choose the flags you want, |
1092 | and combine them with the C operator @code{|}. | |
1093 | ||
1094 | @table @code | |
1095 | @comment wordexp.h | |
1096 | @comment POSIX.2 | |
1097 | @item WRDE_APPEND | |
1098 | Append the words from this expansion to the vector of words produced by | |
1099 | previous calls to @code{wordexp}. This way you can effectively expand | |
1100 | several words as if they were concatenated with spaces between them. | |
1101 | ||
1102 | In order for appending to work, you must not modify the contents of the | |
1103 | word vector structure between calls to @code{wordexp}. And, if you set | |
1104 | @code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also | |
1105 | set it when you append to the results. | |
1106 | ||
1107 | @comment wordexp.h | |
1108 | @comment POSIX.2 | |
1109 | @item WRDE_DOOFFS | |
1110 | Leave blank slots at the beginning of the vector of words. | |
1111 | The @code{we_offs} field says how many slots to leave. | |
1112 | The blank slots contain null pointers. | |
1113 | ||
1114 | @comment wordexp.h | |
1115 | @comment POSIX.2 | |
1116 | @item WRDE_NOCMD | |
1117 | Don't do command substitution; if the input requests command substitution, | |
1118 | report an error. | |
1119 | ||
1120 | @comment wordexp.h | |
1121 | @comment POSIX.2 | |
1122 | @item WRDE_REUSE | |
1123 | Reuse a word vector made by a previous call to @code{wordexp}. | |
1124 | Instead of allocating a new vector of words, this call to @code{wordexp} | |
1125 | will use the vector that already exists (making it larger if necessary). | |
1126 | ||
1127 | Note that the vector may move, so it is not safe to save an old pointer | |
1128 | and use it again after calling @code{wordexp}. You must fetch | |
1129 | @code{we_wordv} anew after each call. | |
1130 | ||
1131 | @comment wordexp.h | |
1132 | @comment POSIX.2 | |
1133 | @item WRDE_SHOWERR | |
1134 | Do show any error messages printed by commands run by command substitution. | |
1135 | More precisely, allow these commands to inherit the standard error output | |
1136 | stream of the current process. By default, @code{wordexp} gives these | |
1137 | commands a standard error stream that discards all output. | |
1138 | ||
1139 | @comment wordexp.h | |
1140 | @comment POSIX.2 | |
1141 | @item WRDE_UNDEF | |
1142 | If the input refers to a shell variable that is not defined, report an | |
1143 | error. | |
1144 | @end table | |
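
As an illustration of combining flags, the sketch below reserves leading slots with @code{WRDE_DOOFFS} and then adds a second expansion with @code{WRDE_APPEND}. The helper name @code{expand_pair} is hypothetical, and the error handling mirrors the pattern used elsewhere in this chapter rather than any documented requirement.

```c
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: expand FIRST and then SECOND into *RESULT,
   leaving SLOTS null pointers at the start of the word vector.
   Returns 0 on success, otherwise the wordexp error code.  */
int
expand_pair (const char *first, const char *second, size_t slots,
             wordexp_t *result)
{
  int status;

  /* we_offs is an input; it is consulted because of WRDE_DOOFFS.  */
  result->we_offs = slots;
  status = wordexp (first, result, WRDE_DOOFFS | WRDE_NOCMD);
  if (status != 0)
    {
      if (status == WRDE_NOSPACE)
        wordfree (result);
      return status;
    }

  /* WRDE_DOOFFS must be given again when appending.  */
  status = wordexp (second, result,
                    WRDE_DOOFFS | WRDE_APPEND | WRDE_NOCMD);
  if (status != 0)
    wordfree (result);
  return status;
}
```

After a successful call with @var{slots} equal to 2, the first two elements of @code{we_wordv} are null pointers, the expanded words follow them, and @code{we_wordc} counts only the expanded words. The caller remains responsible for calling @code{wordfree}.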
1145 | ||
1146 | @node Wordexp Example | |
1147 | @subsection @code{wordexp} Example | |
1148 | ||
1149 | Here is an example of using @code{wordexp} to expand several strings | |
1150 | and use the results to run a shell command. It also shows the use of | |
1151 | @code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree} | |
1152 | to free the space allocated by @code{wordexp}. | |
1153 | ||
1154 | @smallexample | |
1155 | int | |
1156 | expand_and_execute (const char *program, const char **options) | |
1157 | @{ | |
1158 |   wordexp_t result; | |
1159 |   pid_t pid; | |
1160 |   int status, i; | |
1161 | ||
1162 |   /* @r{Expand the string for the program to run.} */ | |
1163 |   switch (wordexp (program, &result, 0)) | |
1164 |     @{ | |
1165 |     case 0: /* @r{Successful}. */ | |
1166 |       break; | |
1167 |     case WRDE_NOSPACE: | |
1168 |       /* @r{If the error was @code{WRDE_NOSPACE},} | |
1169 |          @r{then perhaps part of the result was allocated.  Free it and fall through.} */ | |
1170 |       wordfree (&result); | |
1171 |     default: /* @r{Some other error.} */ | |
1172 |       return -1; | |
1173 |     @} | |
1174 | ||
1175 |   /* @r{Expand the strings specified for the arguments.} */ | |
1176 |   for (i = 0; options[i] != NULL; i++) | |
1177 |     @{ | |
1178 |       if (wordexp (options[i], &result, WRDE_APPEND)) | |
1179 |         @{ | |
1180 |           wordfree (&result); | |
1181 |           return -1; | |
1182 |         @} | |
1183 |     @} | |
1184 | ||
1185 |   pid = fork (); | |
1186 |   if (pid == 0) | |
1187 |     @{ | |
1188 |       /* @r{This is the child process. Execute the command.} */ | |
1189 |       execv (result.we_wordv[0], result.we_wordv); | |
1190 |       exit (EXIT_FAILURE); | |
1191 |     @} | |
1192 |   else if (pid < 0) | |
1193 |     /* @r{The fork failed. Report failure.} */ | |
1194 |     status = -1; | |
1195 |   else | |
1196 |     /* @r{This is the parent process. Wait for the child to complete.} */ | |
1197 |     if (waitpid (pid, &status, 0) != pid) | |
1198 |       status = -1; | |
1199 | ||
1200 |   wordfree (&result); | |
1201 |   return status; | |
1202 | @} | |
1203 | @end smallexample | |
1204 | ||
28f540f4 RM |
1205 | |
1206 | @c No sense finishing this for here. | |
1207 | @ignore | |
1208 | @node Tilde Expansion | |
1209 | @subsection Details of Tilde Expansion | |
1210 | ||
1211 | It's a standard part of shell syntax that you can use @samp{~} at the | |
1212 | beginning of a file name to stand for your own home directory. You | |
1213 | can use @samp{~@var{user}} to stand for @var{user}'s home directory. | |
1214 | ||
1215 | @dfn{Tilde expansion} is the process of converting these abbreviations | |
1216 | to the directory names that they stand for. | |
1217 | ||
1218 | Tilde expansion applies to the @samp{~} plus all following characters up | |
1219 | to whitespace or a slash. It takes place only at the beginning of a | |
1220 | word, and only if none of the characters to be transformed is quoted in | |
1221 | any way. | |
1222 | ||
1223 | Plain @samp{~} uses the value of the environment variable @code{HOME} | |
1224 | as the proper home directory name. @samp{~} followed by a user name | |
1225 | uses @code{getpwnam} to look up that user in the user database, and | |
1226 | uses whatever directory is recorded there. Thus, @samp{~} followed | |
1227 | by your own name can give different results from plain @samp{~}, if | |
1228 | the value of @code{HOME} is not really your home directory. | |
1229 | ||
1230 | @node Variable Substitution | |
1231 | @subsection Details of Variable Substitution | |
1232 | ||
1233 | Part of ordinary shell syntax is the use of @samp{$@var{variable}} to | |
1234 | substitute the value of a shell variable into a command. This is called | |
1235 | @dfn{variable substitution}, and it is one part of doing word expansion. | |
1236 | ||
1237 | There are two basic ways you can write a variable reference for | |
1238 | substitution: | |
1239 | ||
1240 | @table @code | |
1241 | @item $@{@var{variable}@} | |
1242 | If you write braces around the variable name, then it is completely | |
1243 | unambiguous where the variable name ends. You can concatenate | |
1244 | additional letters onto the end of the variable value by writing them | |
1245 | immediately after the close brace. For example, if @code{foo} has the | |
1246 | value @samp{tractor}, then @samp{$@{foo@}s} expands into @samp{tractors}. | |
1247 | ||
1248 | @item $@var{variable} | |
1249 | If you do not put braces around the variable name, then the variable | |
1250 | name consists of all the alphanumeric characters and underscores that | |
1251 | follow the @samp{$}. The next punctuation character ends the variable | |
1252 | name. Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands | |
1253 | into @samp{tractor-bar}. | |
1254 | @end table | |
1255 | ||
1256 | When you use braces, you can also use various constructs to modify the | |
1257 | value that is substituted, or test it in various ways. | |
1258 | ||
1259 | @table @code | |
1260 | @item $@{@var{variable}:-@var{default}@} | |
1261 | Substitute the value of @var{variable}, but if that is empty or | |
1262 | undefined, use @var{default} instead. | |
1263 | ||
1264 | @item $@{@var{variable}:=@var{default}@} | |
1265 | Substitute the value of @var{variable}, but if that is empty or | |
1266 | undefined, use @var{default} instead and set the variable to | |
1267 | @var{default}. | |
1268 | ||
1269 | @item $@{@var{variable}:?@var{message}@} | |
1270 | If @var{variable} is defined and not empty, substitute its value. | |
1271 | ||
1272 | Otherwise, print @var{message} as an error message on the standard error | |
1273 | stream, and consider word expansion a failure. | |
1274 | ||
1275 | @c ??? How does wordexp report such an error? | |
1276 | ||
1277 | @item $@{@var{variable}:+@var{replacement}@} | |
1278 | Substitute @var{replacement}, but only if @var{variable} is defined and | |
1279 | nonempty. Otherwise, substitute nothing for this construct. | |
1280 | @end table | |
1281 | ||
1282 | @table @code | |
1283 | @item $@{#@var{variable}@} | |
1284 | Substitute a numeral which expresses in base ten the number of | |
1285 | characters in the value of @var{variable}. @samp{$@{#foo@}} stands for | |
1286 | @samp{7}, because @samp{tractor} is seven characters. | |
1287 | @end table | |
1288 | ||
1289 | These variants of variable substitution let you remove part of the | |
6d52618b | 1290 | variable's value before substituting it. The @var{prefix} and |
28f540f4 RM |
1291 | @var{suffix} are not mere strings; they are wildcard patterns, just |
1292 | like the patterns that you use to match multiple file names. But | |
1293 | in this context, they match against parts of the variable value | |
1294 | rather than against file names. | |
1295 | ||
1296 | @table @code | |
1297 | @item $@{@var{variable}%%@var{suffix}@} | |
1298 | Substitute the value of @var{variable}, but first discard from that | |
1299 | variable any portion at the end that matches the pattern @var{suffix}. | |
1300 | ||
1301 | If there is more than one alternative for how to match against | |
1302 | @var{suffix}, this construct uses the longest possible match. | |
1303 | ||
1304 | Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest | |
1305 | match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}. | |
1306 | ||
1307 | @item $@{@var{variable}%@var{suffix}@} | |
1308 | Substitute the value of @var{variable}, but first discard from that | |
1309 | variable any portion at the end that matches the pattern @var{suffix}. | |
1310 | ||
1311 | If there is more than one alternative for how to match against | |
1312 | @var{suffix}, this construct uses the shortest possible alternative. | |
1313 | ||
1314 | Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest | |
1315 | match for @samp{r*} at the end of @samp{tractor} is just @samp{r}. | |
1316 | ||
1317 | @item $@{@var{variable}##@var{prefix}@} | |
1318 | Substitute the value of @var{variable}, but first discard from that | |
1319 | variable any portion at the beginning that matches the pattern @var{prefix}. | |
1320 | ||
1321 | If there is more than one alternative for how to match against | |
1322 | @var{prefix}, this construct uses the longest possible match. | |
1323 | ||
1324 | Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest | |
1325 | match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}. | |
1326 | ||
1327 | @item $@{@var{variable}#@var{prefix}@} | |
1328 | Substitute the value of @var{variable}, but first discard from that | |
1329 | variable any portion at the beginning that matches the pattern @var{prefix}. | |
1330 | ||
1331 | If there is more than one alternative for how to match against | |
1332 | @var{prefix}, this construct uses the shortest possible alternative. | |
1333 | ||
1334 | Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest | |
1335 | match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}. | |
1336 | ||
1337 | @end ignore |