@node Pattern Matching, I/O Overview, Searching and Sorting, Top
@c %MENU% Matching shell ``globs'' and regular expressions
@chapter Pattern Matching

@Theglibc{} provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards. The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.

@menu
* Wildcard Matching::    Matching a wildcard pattern against a single string.
* Globbing::             Finding the files that match a wildcard pattern.
* Regular Expressions::  Matching regular expressions against strings.
* Word Expansion::       Expanding shell variables, nested commands,
                          arithmetic, and wildcards.
                          This is what the shell does with shell commands.
@end menu

@node Wildcard Matching
@section Wildcard Matching

@pindex fnmatch.h
This section describes how to match a wildcard pattern against a
particular string. The result is a yes or no answer: does the
string fit the pattern or not. The symbols described here are all
declared in @file{fnmatch.h}.

@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags})
@standards{POSIX.2, fnmatch.h}
@safety{@prelim{}@mtsafe{@mtsenv{} @mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c fnmatch @mtsenv @mtslocale @ascuheap @acsmem
@c strnlen dup ok
@c mbsrtowcs
@c memset dup ok
@c malloc dup @ascuheap @acsmem
@c mbsinit dup ok
@c free dup @ascuheap @acsmem
@c FCT = internal_fnwmatch @mtsenv @mtslocale @ascuheap @acsmem
@c FOLD @mtslocale
@c towlower @mtslocale
@c EXT @mtsenv @mtslocale @ascuheap @acsmem
@c STRLEN = wcslen dup ok
@c getenv @mtsenv
@c malloc dup @ascuheap @acsmem
@c MEMPCPY = wmempcpy dup ok
@c FCT dup @mtsenv @mtslocale @ascuheap @acsmem
@c STRCAT = wcscat dup ok
@c free dup @ascuheap @acsmem
@c END @mtsenv
@c getenv @mtsenv
@c MEMCHR = wmemchr dup ok
@c getenv @mtsenv
@c IS_CHAR_CLASS = is_char_class @mtslocale
@c wctype @mtslocale
@c BTOWC ok
@c ISWCTYPE ok
@c auto findidx dup ok
@c elem_hash dup ok
@c memcmp dup ok
@c collseq_table_lookup dup ok
@c NO_LEADING_PERIOD ok
This function tests whether the string @var{string} matches the pattern
@var{pattern}. It returns @code{0} if they do match; otherwise, it
returns the nonzero value @code{FNM_NOMATCH}. The arguments
@var{pattern} and @var{string} are both strings.

The argument @var{flags} is a combination of flag bits that alter the
details of matching. See below for a list of the defined flags.

In @theglibc{}, @code{fnmatch} might sometimes report ``errors'' by
returning nonzero values that are not equal to @code{FNM_NOMATCH}.
@end deftypefun

These are the available flags for the @var{flags} argument:

@vtable @code
@item FNM_FILE_NAME
@standards{GNU, fnmatch.h}
Treat the @samp{/} character specially, for matching file names. If
this flag is set, wildcard constructs in @var{pattern} cannot match
@samp{/} in @var{string}. Thus, the only way to match @samp{/} is with
an explicit @samp{/} in @var{pattern}.

@item FNM_PATHNAME
@standards{POSIX.2, fnmatch.h}
This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We
don't recommend this name because we don't use the term ``pathname'' for
file names.

@item FNM_PERIOD
@standards{POSIX.2, fnmatch.h}
Treat the @samp{.} character specially if it appears at the beginning of
@var{string}. If this flag is set, wildcard constructs in @var{pattern}
cannot match @samp{.} as the first character of @var{string}.

If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the
special treatment applies to @samp{.} following @samp{/} as well as to
@samp{.} at the beginning of @var{string}. (The shell uses the
@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching
file names.)

@item FNM_NOESCAPE
@standards{POSIX.2, fnmatch.h}
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character.

@item FNM_LEADING_DIR
@standards{GNU, fnmatch.h}
Ignore a trailing sequence of characters starting with a @samp{/} in
@var{string}; that is to say, test whether @var{string} starts with a
directory name that @var{pattern} matches.

If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern
matches the string @samp{foobar/frobozz}.

@item FNM_CASEFOLD
@standards{GNU, fnmatch.h}
Ignore case in comparing @var{string} to @var{pattern}.

@item FNM_EXTMATCH
@standards{GNU, fnmatch.h}
@cindex Korn Shell
@pindex ksh
Besides the normal patterns, also recognize the extended patterns
introduced in @file{ksh}. The patterns are written in the forms shown
in the following table, where @var{pattern-list} is a
@code{|}-separated list of patterns.

@table @code
@item ?(@var{pattern-list})
The pattern matches if zero or one occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item *(@var{pattern-list})
The pattern matches if zero or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item +(@var{pattern-list})
The pattern matches if one or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item @@(@var{pattern-list})
The pattern matches if exactly one occurrence of any of the patterns in
the @var{pattern-list} allows matching the input string.

@item !(@var{pattern-list})
The pattern matches if the input string cannot be matched with any of
the patterns in the @var{pattern-list}.
@end table
@end vtable
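
To make the effect of each flag concrete, here is a table-driven C sketch that runs @code{fnmatch} over a few sample patterns and strings and counts the cases whose outcome differs from the expectation recorded next to them. The sample cases and the helper @code{count_surprises} are illustrative assumptions; @code{_GNU_SOURCE} is defined before any header so that the GNU-specific flags are declared.

```c
#define _GNU_SOURCE   /* Declares FNM_FILE_NAME, FNM_LEADING_DIR, FNM_CASEFOLD, FNM_EXTMATCH.  */
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

struct fnmatch_case
{
  const char *pattern;
  const char *string;
  int flags;
  bool expect_match;
};

static const struct fnmatch_case cases[] =
{
  /* Without FNM_FILE_NAME, '*' can match '/'.  */
  { "*.c",      "src/main.c",     0,                          true  },
  { "*.c",      "src/main.c",     FNM_FILE_NAME,              false },
  { "*/*.c",    "src/main.c",     FNM_FILE_NAME,              true  },
  /* FNM_PERIOD: a leading '.' must be matched explicitly.  */
  { "*",        ".profile",       FNM_PERIOD,                 false },
  { ".*",       ".profile",       FNM_PERIOD,                 true  },
  /* Only together with FNM_FILE_NAME is a '.' after '/' protected.  */
  { "src/*",    "src/.hidden",    FNM_PERIOD,                 true  },
  { "src/*",    "src/.hidden",    FNM_PERIOD | FNM_FILE_NAME, false },
  /* '\' quotes the next character unless FNM_NOESCAPE is given.  */
  { "a\\*",     "a*",             0,                          true  },
  { "a\\*",     "a\\b",           0,                          false },
  { "a\\*",     "a\\b",           FNM_NOESCAPE,               true  },
  /* FNM_LEADING_DIR ignores a trailing "/..." in the string.  */
  { "foobar",   "foobar/frobozz", 0,                          false },
  { "foobar",   "foobar/frobozz", FNM_LEADING_DIR,            true  },
  /* FNM_CASEFOLD ignores case.  */
  { "*.TXT",    "notes.txt",      FNM_CASEFOLD,               true  },
  /* FNM_EXTMATCH enables the ksh-style forms.  */
  { "*.@(c|h)", "main.h",         FNM_EXTMATCH,               true  },
  { "!(*.o)",   "main.c",         FNM_EXTMATCH,               true  },
  { "!(*.o)",   "main.o",         FNM_EXTMATCH,               false },
  { "+(ab)",    "abab",           FNM_EXTMATCH,               true  },
};

/* Returns how many cases did not behave as expected (0 means all agreed).  */
static size_t
count_surprises (void)
{
  size_t bad = 0;
  for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
    {
      int r = fnmatch (cases[i].pattern, cases[i].string, cases[i].flags);
      if ((r == 0) != cases[i].expect_match)
        bad++;
    }
  return bad;
}
```

On @theglibc{}, @code{count_surprises} is expected to return 0; a nonzero result points at a case whose comment does not describe what @code{fnmatch} actually does.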

@node Globbing
@section Globbing

@cindex globbing
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches. This is called
@dfn{globbing}.

You could do this using @code{fnmatch}, by reading the directory entries
one by one and testing each one with @code{fnmatch}. But that would be
slow (and complex, since you would have to handle subdirectories by
hand).

The library provides a function @code{glob} to make this particular use
of wildcards convenient. @code{glob} and the other symbols in this
section are declared in @file{glob.h}.

@menu
* Calling Glob::             Basic use of @code{glob}.
* Flags for Globbing::       Flags that enable various options in @code{glob}.
* More Flags for Globbing::  GNU specific extensions to @code{glob}.
@end menu

@node Calling Glob
@subsection Calling @code{glob}

The result of globbing is a vector of file names (strings). To return
this vector, @code{glob} uses a special data type, @code{glob_t}, which
is a structure. You pass @code{glob} the address of the structure, and
it fills in the structure's fields to tell you about the results.

@deftp {Data Type} glob_t
@standards{POSIX.2, glob.h}
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector. This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent *(*) (void *)}}.

An implementation of @code{gl_readdir} needs to initialize the following
members of the @code{struct dirent} object:

@table @code
@item d_type
This member should be set to the file type of the entry if it is known.
Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob}
function may use the specified file type to avoid callbacks in cases
where the file type indicates that the data is not required.

@item d_ino
This member needs to be non-zero, otherwise @code{glob} may skip the
current entry and call the @code{gl_readdir} callback function again to
retrieve another entry.

@item d_name
This member must be set to the name of the entry. It must be
null-terminated.
@end table

The example below shows how to allocate a @code{struct dirent} object
containing a given name.

@smallexample
@include mkdirent.c.texi
@end smallexample

The @code{glob} function reads the @code{struct dirent} members listed
above and makes a copy of the file name in the @code{d_name} member
immediately after the @code{gl_readdir} callback function returns.
Future invocations of any of the callback functions may deallocate or
reuse the buffer. It is the responsibility of the caller of the
@code{glob} function to allocate and deallocate the buffer, either
around the call to @code{glob} or from within the callback functions.
For example, an application could allocate the buffer in the
@code{gl_readdir} callback function, and deallocate it in the
@code{gl_closedir} callback function.

The @code{gl_readdir} member is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat *)}}.

This is a GNU extension.

@item gl_flags
The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR}
might be set. See @ref{Flags for Globbing} and @ref{More Flags for Globbing}
for more details.

This is a GNU extension.
@end table
@end deftp
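
As an illustration of the @code{GLOB_ALTDIRFUNC} callbacks, the following C sketch lets @code{glob} list a hypothetical in-memory directory instead of the real filesystem. The @code{fake_*} names and the fixed file list are assumptions made for the example. Following the rules above, the sketch sets a non-zero @code{d_ino}, a known @code{d_type}, and a null-terminated @code{d_name}, in a buffer that stays valid until @code{gl_closedir} runs.

```c
#define _GNU_SOURCE   /* GLOB_ALTDIRFUNC and the typed gl_* callback fields.  */
#include <assert.h>
#include <dirent.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical directory contents, returned for every directory opened.  */
static const char *const fake_names[] = { "util.c", "README", "main.c" };
#define FAKE_COUNT (sizeof fake_names / sizeof fake_names[0])

struct fake_dir
{
  size_t next;            /* Index of the next name to hand out.  */
  struct dirent entry;    /* Buffer reused by each gl_readdir call.  */
};

static void *
fake_opendir (const char *path)
{
  (void) path;                                  /* Same listing everywhere.  */
  return calloc (1, sizeof (struct fake_dir));  /* NULL on failure.  */
}

static struct dirent *
fake_readdir (void *handle)
{
  struct fake_dir *d = handle;
  if (d->next == FAKE_COUNT)
    return NULL;                      /* End of directory.  */
  const char *name = fake_names[d->next++];
  memset (&d->entry, 0, sizeof d->entry);
  d->entry.d_ino = d->next;           /* Must be non-zero.  */
  d->entry.d_type = DT_REG;           /* Known type: a regular file.  */
  strcpy (d->entry.d_name, name);     /* d_name is a char array in glibc.  */
  return &d->entry;
}

static void
fake_closedir (void *handle)
{
  free (handle);                      /* Also releases the dirent buffer.  */
}

static int
fake_stat (const char *path, struct stat *st)
{
  /* Report "." as a directory and everything else as a regular file.  */
  memset (st, 0, sizeof *st);
  st->st_mode = strcmp (path, ".") == 0 ? S_IFDIR | 0755 : S_IFREG | 0644;
  return 0;
}

/* Runs glob against the fake directory.  The caller must call
   globfree (RESULT) afterwards.  */
static int
fake_glob (const char *pattern, glob_t *result)
{
  memset (result, 0, sizeof *result);
  result->gl_opendir = fake_opendir;
  result->gl_readdir = fake_readdir;
  result->gl_closedir = fake_closedir;
  result->gl_stat = fake_stat;
  result->gl_lstat = fake_stat;
  return glob (pattern, GLOB_ALTDIRFUNC, NULL, result);
}
```

With these callbacks, and in the default C locale, the pattern @samp{*.c} yields @file{main.c} followed by @file{util.c}, and @samp{*.h} yields @code{GLOB_NOMATCH}. Each result is released with @code{globfree}.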

For use in the @code{glob64} function, @file{glob.h} contains another
definition for a very similar type. @code{glob64_t} differs from
@code{glob_t} only in the types of the members @code{gl_readdir},
@code{gl_stat}, and @code{gl_lstat}.

@deftp {Data Type} glob64_t
@standards{GNU, glob.h}
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector. This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir64}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent64 *(*) (void *)}}.

This is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat64} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat64 *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat64}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat64 *)}}.

This is a GNU extension.

@item gl_flags
The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR}
might be set. See @ref{Flags for Globbing} and @ref{More Flags for Globbing}
for more details.

This is a GNU extension.
@end table
@end deftp

@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr})
@standards{POSIX.2, glob.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @asucorrupt @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c strlen dup ok
@c strchr dup ok
@c malloc dup @ascuheap @acsmem
@c mempcpy dup ok
@c next_brace_sub ok
@c free dup @ascuheap @acsmem
@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c glob_pattern_p ok
@c glob_pattern_type dup ok
@c getenv dup @mtsenv
@c GET_LOGIN_NAME_MAX ok
@c getlogin_r dup @mtasurace:utent @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c GETPW_R_SIZE_MAX ok
@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c realloc dup @ascuheap @acsmem
@c memcpy dup ok
@c memchr dup ok
@c *pglob->gl_stat user-supplied
@c stat64 dup ok
@c S_ISDIR dup ok
@c strdup dup @ascuheap @acsmem
@c glob_pattern_type ok
@c glob_in_dir @mtsenv @mtslocale @asucorrupt @ascuheap @acucorrupt @acsfd @acsmem
@c strlen dup ok
@c glob_pattern_type dup ok
@c malloc dup @ascuheap @acsmem
@c mempcpy dup ok
@c *pglob->gl_stat user-supplied
@c stat64 dup ok
@c free dup @ascuheap @acsmem
@c *pglob->gl_opendir user-supplied
@c opendir dup @ascuheap @acsmem @acsfd
@c dirfd dup ok
@c *pglob->gl_readdir user-supplied
@c CONVERT_DIRENT_DIRENT64 ok
@c readdir64 ok [protected by exclusive use of the stream]
@c REAL_DIR_ENTRY ok
@c DIRENT_MIGHT_BE_DIR ok
@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem
@c DIRENT_MIGHT_BE_SYMLINK ok
@c link_exists_p ok
@c link_exists2_p ok
@c strlen dup ok
@c mempcpy dup ok
@c *pglob->gl_stat user-supplied
@c fxstatat64 dup ok
@c realloc dup @ascuheap @acsmem
@c pglob->gl_closedir user-supplied
@c closedir @ascuheap @acsmem @acsfd
@c prefix_array dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c strlen dup ok
@c malloc dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c mempcpy dup ok
@c strcpy dup ok
The function @code{glob} does globbing using the pattern @var{pattern}
in the current directory. It puts the result in a newly allocated
vector, and stores the size and address of this vector into
@code{*@var{vector-ptr}}. The argument @var{flags} is a combination of
bit flags; see @ref{Flags for Globbing}, for details of the flags.

The result of globbing is a sequence of file names. The function
@code{glob} allocates a string for each resulting word, then
allocates a vector of type @code{char **} to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the @dfn{word vector}.

To return this vector, @code{glob} stores both its address and its
length (number of elements, not counting the terminating null pointer)
into @code{*@var{vector-ptr}}.

Normally, @code{glob} sorts the file names alphabetically before
returning them. You can turn this off with the flag @code{GLOB_NOSORT}
if you want to get the information as fast as possible. Usually it's
a good idea to let @code{glob} sort them---if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.

If @code{glob} succeeds, it returns 0. Otherwise, it returns one
of these error codes:

@vtable @code
@item GLOB_ABORTED
@standards{POSIX.2, glob.h}
There was an error opening a directory, and you used the flag
@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero
value.
@iftex
See below
@end iftex
@ifinfo
@xref{Flags for Globbing},
@end ifinfo
for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}.

@item GLOB_NOMATCH
@standards{POSIX.2, glob.h}
The pattern didn't match any existing files. If you use the
@code{GLOB_NOCHECK} flag, then you never get this error code, because
that flag tells @code{glob} to @emph{pretend} that the pattern matched
at least one file.

@item GLOB_NOSPACE
@standards{POSIX.2, glob.h}
It was impossible to allocate memory to hold the result.
@end vtable

In the event of an error, @code{glob} stores information in
@code{*@var{vector-ptr}} about all the matches it has found so far.

Note that the @code{glob} function does not fail if it encounters
directories or files which cannot be handled without the LFS
interfaces. The implementation of @code{glob} is supposed to use these
functions internally; at least, this is the assumption made by the Unix
standard. The GNU extension of allowing the user to provide their own
directory handling and @code{stat} functions complicates things a bit:
if these callback functions are used and a large file or directory is
encountered, @code{glob} @emph{can} fail.
@end deftypefun
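
A minimal sketch of a typical call follows. It passes an @var{errfunc} that reports unreadable directories and lets @code{glob} continue, distinguishes the return codes listed above, and releases the result with @code{globfree}, the POSIX function that frees the storage @code{glob} allocates. The helper names @code{report_glob_error} and @code{print_matches} are hypothetical.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error handler: report the directory and keep going.
   Returning nonzero instead would make glob stop with GLOB_ABORTED.  */
static int
report_glob_error (const char *path, int error_code)
{
  fprintf (stderr, "glob: cannot read %s: %s\n", path, strerror (error_code));
  return 0;
}

/* Hypothetical helper: prints every match of PATTERN on its own line.
   Returns the number of matches (0 for GLOB_NOMATCH), or -1 if glob
   reported GLOB_ABORTED or GLOB_NOSPACE.  */
static long
print_matches (const char *pattern)
{
  assert (pattern != NULL);
  glob_t g;
  long count;

  switch (glob (pattern, 0, report_glob_error, &g))
    {
    case 0:
      for (size_t i = 0; i < g.gl_pathc; i++)
        puts (g.gl_pathv[i]);
      count = (long) g.gl_pathc;
      break;
    case GLOB_NOMATCH:
      count = 0;
      break;
    default:                  /* GLOB_ABORTED or GLOB_NOSPACE.  */
      count = -1;
      break;
    }

  /* Free the word vector, including any partial results left after
     an error.  */
  globfree (&g);
  return count;
}
```

In this sketch, an unreadable or missing directory is reported on standard error and otherwise skipped, so a pattern such as @samp{/nonexistent/*} ends in @code{GLOB_NOMATCH} rather than @code{GLOB_ABORTED}.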

@deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr})
@standards{GNU, glob.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Same code as glob, but with glob64_t #defined as glob_t.
The @code{glob64} function was added as part of the Large File Summit
extensions but is not part of the original LFS proposal. The reason for
this is simple: it is not necessary. The need for a @code{glob64}
function arises from the extensions in the GNU @code{glob}
implementation that allow the user to provide their own directory
handling and @code{stat} functions. These @code{readdir} and
@code{stat} functions do depend on the choice of
@code{_FILE_OFFSET_BITS}, since the definitions of the types
@code{struct dirent} and @code{struct stat} change depending on that
choice.

Besides this difference, @code{glob64} works just like @code{glob} in
all aspects.

This function is a GNU extension.
@end deftypefun

@node Flags for Globbing
@subsection Flags for Globbing

This section describes the standard flags that you can specify in the
@var{flags} argument to @code{glob}. Choose the flags you want,
and combine them with the C bitwise OR operator @code{|}.

Additional flags are available as GNU extensions; see
@ref{More Flags for Globbing}.

@vtable @code
@item GLOB_APPEND
@standards{POSIX.2, glob.h}
Append the words from this expansion to the vector of words produced by
previous calls to @code{glob}. This way you can effectively expand
several words as if they were concatenated with spaces between them.

In order for appending to work, you must not modify the contents of the
word vector structure between calls to @code{glob}. And, if you set
@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also
set it when you append to the results.

Note that the pointer stored in @code{gl_pathv} may no longer be valid
after you call @code{glob} the second time, because @code{glob} might
have relocated the vector. So always fetch @code{gl_pathv} from the
@code{glob_t} structure after each @code{glob} call; @strong{never} save
the pointer across calls.

@item GLOB_DOOFFS
@standards{POSIX.2, glob.h}
Leave blank slots at the beginning of the vector of words.
The @code{gl_offs} field says how many slots to leave.
The blank slots contain null pointers.

@item GLOB_ERR
@standards{POSIX.2, glob.h}
Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand @var{pattern}
fully. Such difficulties might include a directory in which you don't
have the requisite access. Normally, @code{glob} tries its best to keep
on going despite any errors, reading whatever directories it can.

You can exercise even more control than this by specifying an
error-handler function @var{errfunc} when you call @code{glob}. If
@var{errfunc} is not a null pointer, then @code{glob} doesn't give up
right away when it can't read a directory; instead, it calls
@var{errfunc} with two arguments, like this:

@smallexample
(*@var{errfunc}) (@var{filename}, @var{error-code})
@end smallexample

@noindent
The argument @var{filename} is the name of the directory that
@code{glob} couldn't open or couldn't read, and @var{error-code} is the
@code{errno} value that was reported to @code{glob}.

If the error handler function returns nonzero, then @code{glob} gives up
right away. Otherwise, it continues.

@item GLOB_MARK
@standards{POSIX.2, glob.h}
If the pattern matches the name of a directory, append @samp{/} to the
directory's name when returning it.

@item GLOB_NOCHECK
@standards{POSIX.2, glob.h}
If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched. (Normally, when the
pattern doesn't match anything, @code{glob} reports that there were no
matches by returning @code{GLOB_NOMATCH}.)

@item GLOB_NOESCAPE
@standards{POSIX.2, glob.h}
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character.

@code{glob} does its work by calling the function @code{fnmatch}
repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the
@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}.

@item GLOB_NOSORT
@standards{POSIX.2, glob.h}
Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.) The only reason @emph{not} to sort is to save time.
@end vtable
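
The following sketch combines @code{GLOB_DOOFFS}, @code{GLOB_APPEND}, and @code{GLOB_NOCHECK} to build an argument vector of the kind the @code{exec} functions expect. Two reserved slots hold a command and an option, the second @code{glob} call appends to the first, and a pattern that matches nothing is kept as a literal word. The helper name @code{build_ls_argv} and the @samp{ls -l} command are illustrative assumptions.

```c
#include <assert.h>
#include <glob.h>
#include <string.h>

/* Hypothetical helper: fills G with { "ls", "-l", <words from PAT1>,
   <words from PAT2>, NULL }.  Returns 0 or a glob error code; in either
   case the caller releases G with globfree.  */
static int
build_ls_argv (const char *pat1, const char *pat2, glob_t *g)
{
  assert (pat1 != NULL && pat2 != NULL && g != NULL);
  g->gl_offs = 2;                     /* Two reserved leading slots.  */

  int r = glob (pat1, GLOB_DOOFFS | GLOB_NOCHECK, NULL, g);
  if (r == 0)
    /* GLOB_DOOFFS must be given again when appending.  */
    r = glob (pat2, GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND, NULL, g);

  if (r == 0)
    {
      /* Fill the reserved slots only after the last glob call, through
         the current gl_pathv (the vector may have moved).  globfree
         frees only the gl_pathc matched words, not these slots.  */
      g->gl_pathv[0] = (char *) "ls";
      g->gl_pathv[1] = (char *) "-l";
    }
  return r;
}
```

When neither pattern matches anything, @code{GLOB_NOCHECK} keeps both patterns as literal words, so the resulting vector holds @code{ls}, @code{-l}, the two patterns, and a terminating null pointer, in a form that could be passed to @code{execvp}.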

@node More Flags for Globbing
@subsection More Flags for Globbing

Besides the flags described in the previous section, the GNU
implementation of @code{glob} accepts a few more flags, which are also
defined in @file{glob.h}. Some of these extensions implement
functionality that is available in modern shells.

@vtable @code
@item GLOB_PERIOD
@standards{GNU, glob.h}
Allow a leading @code{.} character (period) in a file name to be matched
by wildcards. By default, @code{glob} passes @code{FNM_PERIOD} to
@code{fnmatch}, so a leading period is matched only by a literal period
in the pattern. @xref{Wildcard Matching}, for @code{FNM_PERIOD}.

@item GLOB_MAGCHAR
@standards{GNU, glob.h}
Do not pass @code{GLOB_MAGCHAR} to @code{glob} in the @var{flags}
parameter. Instead, @code{glob} sets this bit in the @code{gl_flags}
member of the @code{glob_t} structure it fills in, if the pattern
contains any wildcard characters.

@item GLOB_ALTDIRFUNC
@standards{GNU, glob.h}
Instead of using the normal functions for accessing the file system,
@code{glob} uses the user-supplied functions specified in the structure
pointed to by the @var{pglob} parameter. For more information about
these functions, see the sections on directory handling,
@ref{Accessing Directories}, and @ref{Reading Attributes}.

@item GLOB_BRACE
@standards{GNU, glob.h}
If this flag is given, braces in the pattern are handled specially.
Braces must be correctly grouped: for each opening brace there must be
a matching closing one. Brace expressions can be nested, so one brace
expression can appear inside another. The range of each inner brace
expression must be completely contained in the enclosing one.

The string between matching braces is split into separate expressions
at @code{,} (comma) characters, and the commas themselves are discarded.
Because brace expressions can be nested, only the commas at the same
level separate the alternatives of a given brace expression. Commas
inside a nested brace expression are not split at this level. They
are used when that deeper brace expression is expanded. The example
below shows this:

@smallexample
glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result)
@end smallexample

@noindent
is equivalent to the sequence

@smallexample
glob ("foo/", GLOB_BRACE, NULL, &result)
glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
@end smallexample

@noindent
if error handling is left aside.

@item GLOB_NOMAGIC
@standards{GNU, glob.h}
If the pattern contains no wildcard constructs (it is a literal file name),
return it as the sole ``matching'' word, even if no file exists by that name.

@item GLOB_TILDE
@standards{GNU, glob.h}
If this flag is used, the character @code{~} (tilde) is handled
specially when it appears at the beginning of the pattern. Instead of
being taken verbatim, it represents the home directory of a known user.

If @code{~} is the only character in the pattern or is followed by a
@code{/} (slash), the home directory of the process owner is
substituted. The @env{HOME} environment variable is consulted first. If
it is unset or empty, the information is read from the system databases
using @code{getlogin} and @code{getpwnam}. As an example, take user
@code{bart}, whose home directory is @file{/home/bart}. For him a call
like

@smallexample
glob ("~/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

@noindent
would return the contents of the directory @file{/home/bart/bin}.
Instead of referring to your own home directory, you can also name the
home directory of another user by writing the user name right after
the tilde. So the contents of user @code{homer}'s @file{bin} directory
can be retrieved by

@smallexample
glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

If the user name is not valid or the home directory cannot be
determined for some other reason, the tilde is not expanded and the
pattern is used verbatim. In the last example, if user @code{homer} is
not known, the pattern stays @code{"~homer/bin/*"} and @code{glob}
looks for a directory literally named @file{~homer}.

This functionality is equivalent to what is available in C-shells if the
@code{nonomatch} flag is set.

@item GLOB_TILDE_CHECK
@standards{GNU, glob.h}
If this flag is used, @code{glob} behaves as if @code{GLOB_TILDE} had
been given. The only difference concerns an unknown user name, or a
home directory that cannot be determined for some other reason. This is
treated as an error: @code{glob} returns @code{GLOB_NOMATCH} instead of
using the pattern verbatim.

This functionality is equivalent to what is available in C-shells if
the @code{nonomatch} flag is not set.

@item GLOB_ONLYDIR
@standards{GNU, glob.h}
If this flag is used, @code{glob} takes it as a @strong{hint} that the
caller is only interested in directories matching the pattern. If the
type of a file is readily available, non-directories are rejected, but
no extra work is done to determine the type of each file. The caller
must therefore still be prepared to filter out non-directories itself.

This functionality is only available with the GNU @code{glob}
implementation. It is mainly used internally to improve performance,
but it may be useful to applications as well and is therefore
documented here.
@end vtable
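
The sketch below exercises @code{GLOB_BRACE} and @code{GLOB_TILDE_CHECK}. The helpers @code{make_sample_dir} and @code{count_matches} are hypothetical names used only for illustration. The example assumes the caller supplies a @code{mkdtemp} template in a writable directory, such as @file{/tmp/globXXXXXX}. @code{_GNU_SOURCE} is defined so that the GNU flags are declared even under stricter compiler settings.

```c
#define _GNU_SOURCE  /* GLOB_BRACE and GLOB_TILDE_CHECK are GNU extensions */
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn DIR_TEMPLATE (ending in "XXXXXX") into a new
   directory holding empty files a.c, b.h and c.txt.  Return 0 on success,
   -1 on failure.  */
static int
make_sample_dir (char *dir_template)
{
  static const char *const names[] = { "a.c", "b.h", "c.txt" };

  if (mkdtemp (dir_template) == NULL)  /* replaces the trailing XXXXXX */
    return -1;
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      char path[4096];
      snprintf (path, sizeof path, "%s/%s", dir_template, names[i]);
      FILE *f = fopen (path, "w");     /* create an empty file */
      if (f == NULL || fclose (f) != 0)
        return -1;
    }
  return 0;
}

/* Hypothetical helper: number of names PATTERN matches under FLAGS,
   or -1 if glob fails (including GLOB_NOMATCH).  */
static long
count_matches (const char *pattern, int flags)
{
  glob_t g;
  memset (&g, 0, sizeof g);            /* makes globfree safe on every path */
  long n = glob (pattern, flags, NULL, &g) == 0 ? (long) g.gl_pathc : -1;
  globfree (&g);
  return n;
}
```

Suppose @var{dir} names the directory created by @code{make_sample_dir}. With @code{GLOB_BRACE}, the pattern @samp{@var{dir}/*.@{c,h@}} matches @file{a.c} and @file{b.h}, just as the sequence of @code{GLOB_APPEND} calls shown for @code{GLOB_BRACE} would. Without the flag, the braces are taken literally and @code{glob} reports @code{GLOB_NOMATCH}. With @code{GLOB_TILDE_CHECK}, a pattern such as @samp{~no_such_user/bin} for a nonexistent user also fails.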

In most cases, calling @code{glob} allocates memory to represent the
result of the call. If the same @code{glob_t} object is passed to
several calls with @code{GLOB_APPEND}, the later calls reuse and extend
the storage allocated by the earlier ones rather than leaking it. None
of this storage is released when the last @code{glob} call returns,
however. It must be freed explicitly with @code{globfree}.

@deftypefun void globfree (glob_t *@var{pglob})
@standards{POSIX.2, glob.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c free dup @ascuheap @acsmem
The @code{globfree} function frees all resources allocated by previous
calls to @code{glob} associated with the object pointed to by
@var{pglob}. This function should be called once the @code{glob_t}
object is no longer needed.
@end deftypefun
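
The following sketch illustrates this life cycle. It collects the results of several patterns in one @code{glob_t} using @code{GLOB_APPEND` and releases everything with a single @code{globfree} call. The helper name @code{glob_all} is hypothetical. It passes @code{GLOB_NOCHECK} only so that the count stays predictable when nothing matches. The preceding @code{memset} makes @code{globfree} safe to call on every path.

```c
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: glob each of the N patterns into one glob_t,
   adding GLOB_APPEND for every call after the first.  GLOB_NOCHECK makes
   a pattern with no matches contribute itself, so each successful call
   adds at least one name.  Return the total number of names, or -1 if
   any glob call fails.  */
static long
glob_all (const char *const patterns[], size_t n)
{
  glob_t g;
  memset (&g, 0, sizeof g);      /* globfree is safe even if n == 0 */
  int flags = GLOB_NOCHECK;
  int failed = 0;

  for (size_t i = 0; i < n && !failed; i++)
    {
      if (glob (patterns[i], flags, NULL, &g) != 0)
        failed = 1;
      flags |= GLOB_APPEND;      /* later calls extend the same vector */
    }

  long total = failed ? -1 : (long) g.gl_pathc;
  globfree (&g);                 /* one call frees what every glob call allocated */
  return total;
}
```

For example, three patterns that match nothing in the current directory yield a total of three names, one per pattern.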

@deftypefun void globfree64 (glob64_t *@var{pglob})
@standards{GNU, glob.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
This function is equivalent to @code{globfree}, but it frees records of
type @code{glob64_t} that were allocated by @code{glob64}.
@end deftypefun

@node Regular Expressions
@section Regular Expression Matching

@Theglibc{} supports two interfaces for matching regular
expressions. One is the standard POSIX.2 interface, and the other is
what @theglibc{} has had for many years.

Both interfaces are declared in the header file @file{regex.h}.
If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2
functions, structures, and constants are declared.
@c !!! we only document the POSIX.2 interface here!!

@menu
* POSIX Regexp Compilation::    Using @code{regcomp} to prepare to match.
* Flags for POSIX Regexps::     Syntax variations for @code{regcomp}.
* Matching POSIX Regexps::      Using @code{regexec} to match the compiled
                                   pattern that you get from @code{regcomp}.
* Regexp Subexpressions::       Finding which parts of the string were matched.
* Subexpression Complications:: Complications in finding which parts were matched.
* Regexp Cleanup::              Freeing storage; reporting errors.
@end menu

@node POSIX Regexp Compilation
@subsection POSIX Regular Expression Compilation

Before you can actually match a regular expression, you must
@dfn{compile} it. This is not true compilation: it produces a special
data structure, not machine instructions. But it is like ordinary
compilation in that its purpose is to enable you to ``execute'' the
pattern fast. (@xref{Matching POSIX Regexps}, for how to use the
compiled regular expression for matching.)

There is a special data type for compiled regular expressions:

@deftp {Data Type} regex_t
@standards{POSIX.2, regex.h}
This type of object holds a compiled regular expression.
It is actually a structure. It has just one field that your programs
should look at:

@table @code
@item re_nsub
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
@end table

There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
@end deftp

After you create a @code{regex_t} object, you can compile a regular
expression into it by calling @code{regcomp}.

@deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags})
@standards{POSIX.2, regex.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}}
@c All of the issues have to do with memory allocation and multi-byte
@c character handling present in the input string, or implied by ranges
@c or inverted character classes.
@c (re_)malloc @ascuheap @acsmem
@c re_compile_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c (re_)realloc @ascuheap @acsmem [no @asucorrupt @acucorrupt for we zero the buffer]
@c init_dfa @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c (re_)malloc @ascuheap @acsmem
@c calloc @ascuheap @acsmem
@c _NL_CURRENT ok
@c _NL_CURRENT_WORD ok
@c btowc @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c libc_lock_init ok
@c re_string_construct @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_construct_common ok
@c re_string_realloc_buffers @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c build_wcs_upper_buffer @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c isascii ok
@c mbsinit ok
@c toupper ok
@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c iswlower @mtslocale
@c towupper @mtslocale
@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c (re_)malloc dup @ascuheap @acsmem
@c build_upper_buffer ok (@mtslocale but optimized)
@c islower ok
@c toupper ok
@c build_wcs_buffer @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_translate_buffer ok
@c parse @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c fetch_token @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c peek_token @mtslocale
@c re_string_eoi ok
@c re_string_peek_byte ok
@c re_string_cur_idx ok
@c re_string_length ok
@c re_string_peek_byte_case @mtslocale
@c re_string_peek_byte dup ok
@c re_string_is_single_byte_char ok
@c isascii ok
@c re_string_peek_byte dup ok
@c re_string_wchar_at ok
@c re_string_skip_bytes ok
@c re_string_skip_bytes dup ok
@c parse_reg_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c parse_branch @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c parse_expression @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c create_token_tree dup @ascuheap @acsmem
@c re_string_eoi dup ok
@c re_string_first_byte ok
@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c create_tree dup @ascuheap @acsmem
@c parse_sub_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c parse_reg_exp dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c postorder() @ascuheap @acsmem
@c free_tree @ascuheap @acsmem
@c free_token dup @ascuheap @acsmem
@c create_tree dup @ascuheap @acsmem
@c parse_bracket_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c _NL_CURRENT dup ok
@c _NL_CURRENT_WORD dup ok
@c calloc dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c peek_token_bracket ok
@c re_string_eoi dup ok
@c re_string_peek_byte dup ok
@c re_string_first_byte dup ok
@c re_string_cur_idx dup ok
@c re_string_length dup ok
@c re_string_skip_bytes dup ok
@c bitset_set ok
@c re_string_skip_bytes ok
@c parse_bracket_element @mtslocale
@c re_string_char_size_at ok
@c re_string_wchar_at dup ok
@c re_string_skip_bytes dup ok
@c parse_bracket_symbol @mtslocale
@c re_string_eoi dup ok
@c re_string_fetch_byte_case @mtslocale
@c re_string_fetch_byte ok
@c re_string_first_byte dup ok
@c isascii ok
@c re_string_char_size_at dup ok
@c re_string_skip_bytes dup ok
@c re_string_fetch_byte dup ok
@c re_string_peek_byte dup ok
@c re_string_skip_bytes dup ok
@c peek_token_bracket dup ok
@c auto build_range_exp @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c auto lookup_collation_sequence_value @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c btowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c collseq_table_lookup ok
@c auto seek_collating_symbol_entry dup ok
@c (re_)realloc dup @ascuheap @acsmem
@c collseq_table_lookup dup ok
@c bitset_set dup ok
@c (re_)realloc dup @ascuheap @acsmem
@c build_equiv_class @mtslocale @ascuheap @acsmem
@c _NL_CURRENT ok
@c auto findidx ok
@c bitset_set dup ok
@c (re_)realloc dup @ascuheap @acsmem
@c auto build_collating_symbol @ascuheap @acsmem
@c auto seek_collating_symbol_entry ok
@c bitset_set dup ok
@c (re_)realloc dup @ascuheap @acsmem
@c build_charclass @mtslocale @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c bitset_set dup ok
@c isalnum ok
@c iscntrl ok
@c isspace ok
@c isalpha ok
@c isdigit ok
@c isprint ok
@c isupper ok
@c isblank ok
@c isgraph ok
@c ispunct ok
@c isxdigit ok
@c bitset_not ok
@c bitset_mask ok
@c create_token_tree dup @ascuheap @acsmem
@c create_tree dup @ascuheap @acsmem
@c free_charset dup @ascuheap @acsmem
@c init_word_char @mtslocale
@c isalnum ok
@c build_charclass_op @mtslocale @ascuheap @acsmem
@c calloc dup @ascuheap @acsmem
@c build_charclass dup @mtslocale @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c free_charset dup @ascuheap @acsmem
@c bitset_set dup ok
@c bitset_not dup ok
@c bitset_mask dup ok
@c create_token_tree dup @ascuheap @acsmem
@c create_tree dup @ascuheap @acsmem
@c parse_dup_op @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_cur_idx dup ok
@c fetch_number @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_set_index ok
@c postorder() @ascuheap @acsmem
@c free_tree dup @ascuheap @acsmem
@c mark_opt_subexp ok
@c duplicate_tree @ascuheap @acsmem
@c create_token_tree dup @ascuheap @acsmem
@c create_tree dup @ascuheap @acsmem
@c postorder() @ascuheap @acsmem
@c free_tree dup @ascuheap @acsmem
@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c parse_branch dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c create_tree dup @ascuheap @acsmem
@c create_tree @ascuheap @acsmem
@c create_token_tree @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c analyze @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c preorder() @ascuheap @acsmem
@c optimize_subexps ok
@c calc_next ok
@c link_nfa_nodes @ascuheap @acsmem
@c re_node_set_init_1 @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c re_node_set_init_2 @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c postorder() @ascuheap @acsmem
@c lower_subexps @ascuheap @acsmem
@c lower_subexp @ascuheap @acsmem
@c create_tree dup @ascuheap @acsmem
@c calc_first @ascuheap @acsmem
@c re_dfa_add_node @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c re_node_set_init_empty ok
@c calc_eclosure @ascuheap @acsmem
@c calc_eclosure_iter @ascuheap @acsmem
@c re_node_set_alloc @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c duplicate_node_closure @ascuheap @acsmem
@c re_node_set_empty ok
@c duplicate_node @ascuheap @acsmem
@c re_dfa_add_node dup @ascuheap @acsmem
@c re_node_set_insert @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c search_duplicated_node ok
@c re_node_set_merge @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c re_node_set_free @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c re_node_set_insert dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c calc_inveclosure @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c re_node_set_insert_last @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c optimize_utf8 ok
@c create_initial_state @ascuheap @acsmem
@c re_node_set_init_copy @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c re_node_set_contains ok
@c re_node_set_merge dup @ascuheap @acsmem
@c re_acquire_state_context @ascuheap @acsmem
@c calc_state_hash ok
@c re_node_set_compare ok
@c create_cd_newstate @ascuheap @acsmem
@c calloc dup @ascuheap @acsmem
@c re_node_set_init_copy dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c free_state @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c NOT_SATISFY_PREV_CONSTRAINT ok
@c re_node_set_remove_at ok
@c register_state @ascuheap @acsmem
@c re_node_set_alloc dup @ascuheap @acsmem
@c re_node_set_insert_last dup @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c free_workarea_compile @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c re_string_destruct @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c free_dfa_content @ascuheap @acsmem
@c free_token @ascuheap @acsmem
@c free_charset @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_compile_fastmap @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_compile_fastmap_iter @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_set_fastmap ok
@c tolower ok
@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c towlower @mtslocale
@c _NL_CURRENT ok
@c (re_)free @ascuheap @acsmem
The function @code{regcomp} ``compiles'' a regular expression into a
data structure that you can use with @code{regexec} to match against a
string. The compiled regular expression format is designed for
efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}.

It's up to you to allocate an object of type @code{regex_t} and pass its
address to @code{regcomp}.

The argument @var{cflags} lets you specify various options that control
the syntax and semantics of regular expressions. @xref{Flags for POSIX
Regexps}.

If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from
the compiled regular expression the information necessary to record
how subexpressions actually match. In this case, you might as well
pass @code{0} for @var{nmatch} and a null pointer for @var{matchptr}
when you call @code{regexec}.

If you don't use @code{REG_NOSUB}, then the compiled regular expression
does have the capacity to record how subexpressions match. Also,
@code{regcomp} tells you how many subexpressions @var{pattern} has, by
storing the number in @code{@var{compiled}->re_nsub}. You can use that
value to decide how long an array to allocate to hold information about
subexpression matches. The array needs @code{re_nsub + 1} elements,
because element zero describes the match of the whole expression.

@code{regcomp} returns @code{0} if it succeeds in compiling the regular
expression. Otherwise, it returns a nonzero error code (see the table
below). You can use @code{regerror} to produce an error message string
describing the reason for a nonzero value; see @ref{Regexp Cleanup}.

@end deftypefun
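
The following sketch compiles a pattern and reports @code{re_nsub}, the count you would use to size a subexpression array. On failure it turns the error code into text with @code{regerror}. On success it releases the compiled pattern with @code{regfree}. Both functions are described in @ref{Regexp Cleanup}. The helper name @code{count_subexpressions} is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile PATTERN with CFLAGS and return the number
   of parenthesized subexpressions it contains, or -1 if regcomp fails.  */
static long
count_subexpressions (const char *pattern, int cflags)
{
  regex_t re;
  int err = regcomp (&re, pattern, cflags);
  if (err != 0)
    {
      char msg[256];
      regerror (err, &re, msg, sizeof msg);  /* text for the error code */
      fprintf (stderr, "regcomp (\"%s\"): %s\n", pattern, msg);
      return -1;                             /* nothing to regfree here */
    }
  long n = (long) re.re_nsub;                /* stored by regcomp */
  regfree (&re);
  return n;
}
```

For instance, the basic regular expression @samp{\(a\)\(b\(c\)\)} and the extended one @samp{(a)(b(c))} both have three subexpressions. The basic expression @samp{(a)(b)} has none, because unescaped parentheses are ordinary characters there.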

Here are the possible nonzero values that @code{regcomp} can return:

@vtable @code
@item REG_BADBR
@standards{POSIX.2, regex.h}
There was an invalid @samp{\@{@dots{}\@}} construct in the regular
expression. A valid @samp{\@{@dots{}\@}} construct must contain either
a single number, optionally followed by a comma, or two numbers
separated by a comma, the first no greater than the second.

@item REG_BADPAT
@standards{POSIX.2, regex.h}
There was a syntax error in the regular expression.

@item REG_BADRPT
@standards{POSIX.2, regex.h}
A repetition operator such as @samp{?} or @samp{*} appeared in a bad
position (with no preceding subexpression to act on).

@item REG_ECOLLATE
@standards{POSIX.2, regex.h}
The regular expression referred to an invalid collating element (one not
defined in the current locale for string collation). @xref{Locale
Categories}.

@item REG_ECTYPE
@standards{POSIX.2, regex.h}
The regular expression referred to an invalid character class name.

@item REG_EESCAPE
@standards{POSIX.2, regex.h}
The regular expression ended with @samp{\}.

@item REG_ESUBREG
@standards{POSIX.2, regex.h}
There was an invalid number in the @samp{\@var{digit}} construct, such
as a back reference to a subexpression that does not exist.

@item REG_EBRACK
@standards{POSIX.2, regex.h}
There were unbalanced square brackets in the regular expression.

@item REG_EPAREN
@standards{POSIX.2, regex.h}
An extended regular expression had unbalanced parentheses,
or a basic regular expression had unbalanced @samp{\(} and @samp{\)}.

@item REG_EBRACE
@standards{POSIX.2, regex.h}
The regular expression had unbalanced @samp{\@{} and @samp{\@}}.

@item REG_ERANGE
@standards{POSIX.2, regex.h}
One of the endpoints in a range expression was invalid.

@item REG_ESPACE
@standards{POSIX.2, regex.h}
@code{regcomp} ran out of memory.
@end vtable
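
The sketch below shows one way a program might tell these codes apart. Both helpers, @code{regcomp_status} and @code{regcomp_error_name}, are hypothetical. A real program would usually report the @code{regerror} text instead of a symbolic name.

```c
#include <regex.h>

/* Hypothetical helper: compile PATTERN with CFLAGS and return what regcomp
   returned (0 on success).  The compiled pattern is freed on success; on
   failure regcomp has already released everything.  */
static int
regcomp_status (const char *pattern, int cflags)
{
  regex_t re;
  int err = regcomp (&re, pattern, cflags);
  if (err == 0)
    regfree (&re);
  return err;
}

/* Hypothetical helper: map one of the regcomp error codes to its name.  */
static const char *
regcomp_error_name (int err)
{
  switch (err)
    {
    case 0:            return "success";
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    default:           return "unknown error";
    }
}
```

With @theglibc{}, @samp{a[bc} is reported as @code{REG_EBRACK}, the extended expression @samp{(ab} as @code{REG_EPAREN}, and @samp{a\@{2,1\@}} as @code{REG_BADBR}. Portable programs should not depend on exactly which code a given mistake produces.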

@node Flags for POSIX Regexps
@subsection Flags for POSIX Regular Expressions

These are the bit flags that you can use in the @var{cflags} operand when
compiling a regular expression with @code{regcomp}.

@vtable @code
@item REG_EXTENDED
@standards{POSIX.2, regex.h}
Treat the pattern as an extended regular expression, rather than as a
basic regular expression.

@item REG_ICASE
@standards{POSIX.2, regex.h}
Ignore case when matching letters.

@item REG_NOSUB
@standards{POSIX.2, regex.h}
Don't record how subexpressions match. @code{regexec} then reports only
whether there is a match, and does not store anything in the
@var{matchptr} array.

@item REG_NEWLINE
@standards{POSIX.2, regex.h}
Treat a newline in @var{string} as dividing @var{string} into multiple
lines, so that @samp{$} can match before the newline and @samp{^} can
match after. Also, don't permit @samp{.} to match a newline, and don't
permit @samp{[^@dots{}]} to match a newline.

Without this flag, a newline acts like any other ordinary character.
@end vtable

@node Matching POSIX Regexps
@subsection Matching a Compiled POSIX Regular Expression

Once you have compiled a regular expression, as described in @ref{POSIX
Regexp Compilation}, you can match it against strings using
@code{regexec}. A match anywhere inside the string counts as success,
unless the regular expression contains anchor characters (@samp{^} or
@samp{$}).
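
As a minimal sketch of unanchored matching, the helper below compiles an extended regular expression and returns the offset where the leftmost match starts. It reads element zero of the @code{regmatch_t} array that @code{regexec} fills in (@pxref{Regexp Subexpressions}). The helper name @code{first_match_offset} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: return the byte offset in STRING where the leftmost
   match of the extended regular expression PATTERN begins, or -1 if there
   is no match or the pattern does not compile.  */
static long
first_match_offset (const char *pattern, const char *string)
{
  regex_t re;
  regmatch_t whole[1];          /* element 0 describes the whole match */

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  int ret = regexec (&re, string, 1, whole, 0);
  regfree (&re);
  return ret == 0 ? (long) whole[0].rm_so : -1;
}
```

Thus @samp{b+} is found at offset 2 in @samp{aabbb}, but @samp{^b+} does not match that string at all. Likewise, @samp{a$} matches only the final @samp{a} of @samp{banana}, at offset 5.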

@deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags})
@standards{POSIX.2, regex.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}}
@c libc_lock_lock @asulock @aculock
@c re_search_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_allocate @ascuheap @acsmem
@c re_string_construct_common dup ok
@c re_string_realloc_buffers dup @ascuheap @acsmem
@c match_ctx_init @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c re_string_byte_at ok
@c re_string_first_byte dup ok
@c check_matching @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_cur_idx dup ok
@c acquire_init_state_context dup @ascuheap @acsmem
@c re_string_context_at ok
@c re_string_byte_at dup ok
@c bitset_contain ok
@c re_acquire_state_context dup @ascuheap @acsmem
@c check_subexp_matching_top @ascuheap @acsmem
@c match_ctx_add_subtop @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c calloc dup @ascuheap @acsmem
@c transit_state_bkref @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_cur_idx dup ok
@c re_string_context_at dup ok
@c NOT_SATISFY_NEXT_CONSTRAINT ok
@c get_subexp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_get_buffer ok
@c search_cur_bkref_entry ok
@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c extend_buffers @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_realloc_buffers dup @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c build_wcs_upper_buffer dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c build_upper_buffer dup ok (@mtslocale but optimized)
@c build_wcs_buffer dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_translate_buffer dup ok
@c get_subexp_sub @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c check_arrival @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c (re_)realloc dup @ascuheap @acsmem
@c re_string_context_at dup ok
@c re_node_set_init_1 dup @ascuheap @acsmem
@c check_arrival_expand_ecl @ascuheap @acsmem
@c re_node_set_alloc dup @ascuheap @acsmem
@c find_subexp_node ok
@c re_node_set_merge dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c check_arrival_expand_ecl_sub @ascuheap @acsmem
@c re_node_set_contains dup ok
@c re_node_set_insert dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_init_copy dup @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c expand_bkref_cache @ascuheap @acsmem
@c search_cur_bkref_entry dup ok
@c re_node_set_contains dup ok
@c re_node_set_init_1 dup @ascuheap @acsmem
@c check_arrival_expand_ecl dup @ascuheap @acsmem
@c re_node_set_merge dup @ascuheap @acsmem
@c re_node_set_init_copy dup @ascuheap @acsmem
@c re_node_set_insert dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_acquire_state @ascuheap @acsmem
@c calc_state_hash dup ok
@c re_node_set_compare dup ok
@c create_ci_newstate @ascuheap @acsmem
@c calloc dup @ascuheap @acsmem
@c re_node_set_init_copy dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c register_state dup @ascuheap @acsmem
@c free_state dup @ascuheap @acsmem
@c re_acquire_state_context dup @ascuheap @acsmem
@c re_node_set_merge dup @ascuheap @acsmem
@c check_arrival_add_next_nodes @mtslocale @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c check_node_accept_bytes @mtslocale @ascuheap @acsmem
@c re_string_byte_at dup ok
@c re_string_char_size_at dup ok
@c re_string_elem_size_at @mtslocale
@c _NL_CURRENT_WORD dup ok
@c _NL_CURRENT dup ok
@c auto findidx dup ok
@c _NL_CURRENT_WORD dup ok
@c _NL_CURRENT dup ok
@c collseq_table_lookup dup ok
@c find_collation_sequence_value @mtslocale
@c _NL_CURRENT_WORD dup ok
@c _NL_CURRENT dup ok
@c auto findidx dup ok
@c wcscoll @mtslocale @ascuheap @acsmem
@c re_node_set_empty dup ok
@c re_node_set_merge dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_insert dup @ascuheap @acsmem
@c re_acquire_state dup @ascuheap @acsmem
@c check_node_accept ok
@c re_string_byte_at dup ok
@c bitset_contain dup ok
@c re_string_context_at dup ok
@c NOT_SATISFY_NEXT_CONSTRAINT dup ok
@c match_ctx_add_entry @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c clean_state_log_if_needed dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c find_subexp_node dup ok
@c calloc dup @ascuheap @acsmem
@c check_arrival dup ***
@c match_ctx_add_sublast @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c re_acquire_state_context dup @ascuheap @acsmem
@c re_node_set_init_union @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c re_node_set_init_copy dup @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c re_node_set_free dup @ascuheap @acsmem
@c check_subexp_matching_top dup @ascuheap @acsmem
@c check_halt_state_context ok
@c re_string_context_at dup ok
@c check_halt_node_context ok
@c NOT_SATISFY_NEXT_CONSTRAINT dup ok
@c re_string_eoi dup ok
@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c transit_state @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c transit_state_mb @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_context_at dup ok
@c NOT_SATISFY_NEXT_CONSTRAINT dup ok
@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem
@c re_string_cur_idx dup ok
@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_node_set_init_union dup @ascuheap @acsmem
@c re_acquire_state_context dup @ascuheap @acsmem
@c re_string_fetch_byte dup ok
@c re_string_context_at dup ok
@c build_trtable @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c group_nodes_into_DFAstates @ascuheap @acsmem
@c bitset_empty dup ok
@c bitset_set dup ok
@c bitset_merge dup ok
@c bitset_set_all ok
@c bitset_clear ok
@c bitset_contain dup ok
@c bitset_copy ok
@c re_node_set_init_copy dup @ascuheap @acsmem
@c re_node_set_insert dup @ascuheap @acsmem
@c re_node_set_init_1 dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_alloc dup @ascuheap @acsmem
@c malloc dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c bitset_empty ok
@c re_node_set_empty dup ok
@c re_node_set_merge dup @ascuheap @acsmem
@c re_acquire_state_context dup @ascuheap @acsmem
@c bitset_merge ok
@c calloc dup @ascuheap @acsmem
@c bitset_contain dup ok
@c merge_state_with_log @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c re_string_cur_idx dup ok
@c re_node_set_init_union dup @ascuheap @acsmem
@c re_string_context_at dup ok
@c re_node_set_free dup @ascuheap @acsmem
@c check_subexp_matching_top @ascuheap @acsmem
@c match_ctx_add_subtop dup @ascuheap @acsmem
@c transit_state_bkref dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c find_recover_state
@c re_string_cur_idx dup ok
@c re_string_skip_bytes dup ok
@c merge_state_with_log dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
@c check_halt_state_context dup ok
@c prune_impossible_nodes @mtslocale @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c sift_ctx_init ok
@c re_node_set_init_empty dup ok
@c sift_states_backward @mtslocale @ascuheap @acsmem
@c re_node_set_init_1 dup @ascuheap @acsmem
@c update_cur_sifted_state @mtslocale @ascuheap @acsmem
@c add_epsilon_src_nodes @ascuheap @acsmem
@c re_acquire_state dup @ascuheap @acsmem
@c re_node_set_alloc dup @ascuheap @acsmem
@c re_node_set_merge dup @ascuheap @acsmem
@c re_node_set_add_intersect @ascuheap @acsmem
@c (re_)realloc dup @ascuheap @acsmem
@c check_subexp_limits @ascuheap @acsmem
@c sub_epsilon_src_nodes @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c re_node_set_contains dup ok
@c re_node_set_add_intersect dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_remove_at dup ok
@c re_node_set_contains dup ok
@c re_acquire_state dup @ascuheap @acsmem
@c sift_states_bkref @mtslocale @ascuheap @acsmem
@c search_cur_bkref_entry dup ok
@c check_dst_limits ok
@c search_cur_bkref_entry dup ok
@c check_dst_limits_calc_pos ok
@c check_dst_limits_calc_pos_1 ok
@c re_node_set_init_copy dup @ascuheap @acsmem
@c re_node_set_insert dup @ascuheap @acsmem
@c sift_states_backward dup @mtslocale @ascuheap @acsmem
@c merge_state_array dup @ascuheap @acsmem
@c re_node_set_remove ok
@c re_node_set_contains dup ok
@c re_node_set_remove_at dup ok
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c re_node_set_empty dup ok
@c build_sifted_states @mtslocale @ascuheap @acsmem
@c sift_states_iter_mb @mtslocale @ascuheap @acsmem
@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem
@c check_node_accept dup ok
@c check_dst_limits dup ok
@c re_node_set_insert dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c check_halt_state_context dup ok
@c merge_state_array @ascuheap @acsmem
@c re_node_set_init_union dup @ascuheap @acsmem
@c re_acquire_state dup @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c set_regs @ascuheap @acsmem
@c (re_)malloc dup @ascuheap @acsmem
@c re_node_set_init_empty dup ok
@c free_fail_stack_return @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c update_regs ok
@c re_node_set_free dup @ascuheap @acsmem
@c pop_fail_stack @ascuheap @acsmem
@c re_node_set_free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c match_ctx_free @ascuheap @acsmem
@c match_ctx_clean @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c (re_)free dup @ascuheap @acsmem
@c re_string_destruct dup @ascuheap @acsmem
@c libc_lock_unlock @aculock
This function tries to match the compiled regular expression
@code{*@var{compiled}} against @var{string}.

@code{regexec} returns @code{0} if the regular expression matches;
otherwise, it returns a nonzero value. See the table below for
what nonzero values mean. You can use @code{regerror} to produce an
error message string describing the reason for a nonzero value;
see @ref{Regexp Cleanup}.

The argument @var{eflags} is a word of bit flags, combined with
bitwise OR, that enable various options; pass @code{0} if you want
none of them. The available flags are listed below.

If you want to get information about what part of @var{string} actually
matched the regular expression or its subexpressions, use the arguments
@var{matchptr} and @var{nmatch}. Otherwise, pass @code{0} for
@var{nmatch}, and @code{NULL} for @var{matchptr}. @xref{Regexp
Subexpressions}.
@end deftypefun

You must match the regular expression with the same set of current
locales that were in effect when you compiled the regular expression.

The function @code{regexec} accepts the following flags in the
@var{eflags} argument:

@vtable @code
@item REG_NOTBOL
@standards{POSIX.2, regex.h}
Do not regard the beginning of the specified string as the beginning of
a line; more generally, don't make any assumptions about what text might
precede it. In particular, @samp{^} does not match at the start of the
string.

@item REG_NOTEOL
@standards{POSIX.2, regex.h}
Do not regard the end of the specified string as the end of a line; more
generally, don't make any assumptions about what text might follow it.
In particular, @samp{$} does not match at the end of the string.
@end vtable
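
The @code{REG_NOTBOL} flag matters mainly when you call @code{regexec}
again on the remainder of a string, for instance to find every match
rather than only the first. Here is a minimal sketch of such a loop.
The function name @code{count_matches} is hypothetical, and the sketch
assumes that @var{compiled} was produced by @code{regcomp} without
@code{REG_NOSUB}, so that @code{regexec} reports the offsets of each
match.

@smallexample
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of
   COMPILED in STRING.  Returns -1 if regexec reports an error.  */
int
count_matches (const regex_t *compiled, const char *string)
@{
  const char *p = string;
  int count = 0;
  int eflags = 0;
  regmatch_t m;

  for (;;)
    @{
      int ret = regexec (compiled, p, 1, &m, eflags);
      if (ret == REG_NOMATCH)
        break;
      if (ret != 0)
        return -1;
      count++;

      /* Resume after this match; after an empty match, skip one
         character so that the loop always makes progress.  */
      if (m.rm_eo > m.rm_so)
        p += m.rm_eo;
      else if (p[m.rm_eo] != '\0')
        p += m.rm_eo + 1;
      else
        break;

      /* The rest of the string does not start a line.  */
      eflags = REG_NOTBOL;
    @}
  return count;
@}
@end smallexample

With the pattern @samp{^b}, for instance, this counts a single match in
@samp{bbb}: after the first match, @code{REG_NOTBOL} keeps @samp{^} from
matching at the start of the remaining @samp{bb}.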

Here are the possible nonzero values that @code{regexec} can return:

@vtable @code
@item REG_NOMATCH
@standards{POSIX.2, regex.h}
The pattern didn't match the string. This isn't really an error.

@item REG_ESPACE
@standards{POSIX.2, regex.h}
@code{regexec} ran out of memory.
@end vtable
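
Because @code{REG_NOMATCH} is an ordinary outcome rather than a failure,
a caller usually tests for it separately from the other nonzero values.
The following sketch shows one way to do this; the wrapper
@code{regexp_matches} and its return convention (@code{1}, @code{0} or
@code{-1}) are hypothetical.

@smallexample
#include <regex.h>
#include <stdio.h>

/* Hypothetical wrapper: return 1 if COMPILED matches STRING, 0 if
   it does not, and -1 (after printing a message on standard error)
   if regexec reports an error such as REG_ESPACE.  */
int
regexp_matches (const regex_t *compiled, const char *string)
@{
  char msg[128];
  int ret = regexec (compiled, string, 0, NULL, 0);

  if (ret == 0)
    return 1;
  if (ret == REG_NOMATCH)
    return 0;
  regerror (ret, compiled, msg, sizeof msg);
  fprintf (stderr, "regexec: %s\n", msg);
  return -1;
@}
@end smallexample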

@node Regexp Subexpressions
@subsection Match Results with Subexpressions

When @code{regexec} matches parenthetical subexpressions of
@var{pattern}, it records which parts of @var{string} they match. It
returns that information by storing the offsets into an array whose
elements are structures of type @code{regmatch_t}. The first element of
the array (index @code{0}) records the part of the string that matched
the entire regular expression. Each other element of the array records
the beginning and end of the part that matched a single parenthetical
subexpression.

@deftp {Data Type} regmatch_t
@standards{POSIX.2, regex.h}
This is the data type of the @var{matchptr} array that you pass to
@code{regexec}. It contains two structure fields, as follows:

@table @code
@item rm_so
The offset in @var{string} of the beginning of a substring. Add this
value to @var{string} to get the address of that part.

@item rm_eo
The offset in @var{string} of the end of the substring, that is, of the
first character after it. The length of the substring is therefore
@code{rm_eo - rm_so}.
@end table
@end deftp

@deftp {Data Type} regoff_t
@standards{POSIX.2, regex.h}
@code{regoff_t} is an alias for another signed integer type.
The fields of @code{regmatch_t} have type @code{regoff_t}.
@end deftp
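
Since the fields are offsets rather than pointers, extracting the text
of a match takes a little arithmetic: the match starts at
@code{@var{string} + rm_so} and, following the POSIX convention that
@code{rm_eo} is the offset just past its last character, is
@code{rm_eo - rm_so} bytes long. Here is a minimal sketch, in which
@code{copy_match} is a hypothetical helper; the check for a negative
@code{rm_so} covers subexpressions that took no part in the match
(@pxref{Subexpression Complications}).

@smallexample
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   STRING that M describes, or NULL if that part did not participate
   in the match (offsets of -1) or if memory is exhausted.  */
char *
copy_match (const char *string, regmatch_t m)
@{
  if (m.rm_so < 0)
    return NULL;

  size_t len = m.rm_eo - m.rm_so;
  char *copy = malloc (len + 1);
  if (copy != NULL)
    @{
      memcpy (copy, string + m.rm_so, len);
      copy[len] = '\0';
    @}
  return copy;
@}
@end smallexample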

The @code{regmatch_t} elements correspond to subexpressions
positionally; element @code{1} records where the first subexpression
matched, element @code{2} records the second subexpression, and so on.
The order of the subexpressions is the order in which they begin.

When you call @code{regexec}, you specify how long the @var{matchptr}
array is, with the @var{nmatch} argument. This tells @code{regexec} how
many elements to store. Because element @code{0} is used for the whole
match, an array of @var{nmatch} elements has room for only
@code{@var{nmatch} - 1} subexpressions; if the regular expression has
more than that, you won't get offset information about the rest of
them. But this doesn't alter whether the pattern matches a particular
string or not.

If you don't want @code{regexec} to return any information about where
the subexpressions matched, you can either supply @code{0} for
@var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the
pattern with @code{regcomp}.
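
The field @code{re_nsub} of a compiled @code{regex_t} holds the number
of parenthetical subexpressions, so an array of @code{re_nsub + 1}
elements is always large enough. The following sketch (the function
@code{show_subexpressions} is hypothetical) compiles an extended regular
expression, sizes the array that way, and prints the offsets stored in
each element.

@smallexample
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: match the extended regular expression
   PATTERN against STRING and print the offsets stored for the
   whole match and for each subexpression.  Returns the regexec
   result, or -1 if PATTERN does not compile or memory runs out.  */
int
show_subexpressions (const char *pattern, const char *string)
@{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* Element 0 is the whole match; one more per subexpression.  */
  size_t nmatch = re.re_nsub + 1;
  regmatch_t *m = malloc (nmatch * sizeof *m);
  if (m == NULL)
    @{
      regfree (&re);
      return -1;
    @}

  int ret = regexec (&re, string, nmatch, m, 0);
  if (ret == 0)
    for (size_t i = 0; i < nmatch; i++)
      printf ("%zu: %ld-%ld\n", i, (long) m[i].rm_so, (long) m[i].rm_eo);

  free (m);
  regfree (&re);
  return ret;
@}
@end smallexample

For example, matching @samp{([0-9]+)-([0-9]+)} against
@samp{tel 555-1234} should print @samp{0: 4-12}, @samp{1: 4-7} and
@samp{2: 8-12}.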

@node Subexpression Complications
@subsection Complications in Subexpression Matching

Sometimes a subexpression matches a substring of no characters. This
happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really
matches just the @samp{f}.) In this case, both of the offsets identify
the point in the string where the empty substring was found. In this
example, the offsets are both @code{1}.

Sometimes the entire regular expression can match without using some of
its subexpressions at all. For example, when @samp{ba\(na\)*} matches
the string @samp{ba}, the parenthetical subexpression is not used. When
this happens, @code{regexec} stores @code{-1} in both fields of the
element for that subexpression.

Sometimes, in the course of matching the entire regular expression, a
particular subexpression matches more than once. For example, when
@samp{ba\(na\)*} matches the string @samp{bananana}, the parenthetical
subexpression matches three times. When this happens, @code{regexec}
usually stores the offsets of the last part of the string that matched
the subexpression. In the case of @samp{bananana}, these offsets are
@code{6} and @code{8}.

But the last match is not always the one that is chosen. It's more
accurate to say that the last @emph{opportunity} to match is the one
that takes precedence. What this means is that when one subexpression
appears within another, then the results reported for the inner
subexpression reflect whatever happened on the last match of the outer
subexpression. For example, consider @samp{\(ba\(na\)*s \)*} matching
the string @samp{bananas bas }. The last time the inner expression
actually matches is near the end of the first word. But it is
@emph{considered} again in the second word, and fails to match there.
So @code{regexec} reports nonuse of the ``na'' subexpression.

Another place where this rule applies is when the regular expression
@smallexample
\(ba\(na\)*s \|nefer\(ti\)* \)*
@end smallexample
@noindent
matches @samp{bananas nefertiti } (note the trailing space, which the
second alternative requires). The ``na'' subexpression does match
in the first word, but it doesn't match in the second word because the
other alternative is used there. Once again, the second repetition of
the outer subexpression overrides the first, and within that second
repetition, the ``na'' subexpression is not used. So @code{regexec}
reports nonuse of the ``na'' subexpression.
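
The simpler cases described at the start of this section can be
observed directly. This sketch (the helper @code{group1} is
hypothetical) compiles a basic regular expression, matches it against a
string, and returns the offsets recorded for subexpression 1.

@smallexample
#include <regex.h>

/* Hypothetical helper: match the basic regular expression PATTERN
   against STRING and store the offsets recorded for subexpression 1
   in *SO and *EO.  Returns the regexec result, or -1 if PATTERN
   does not compile.  */
int
group1 (const char *pattern, const char *string,
        regoff_t *so, regoff_t *eo)
@{
  regex_t re;
  regmatch_t m[2];   /* whole match plus subexpression 1 */
  int ret;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  ret = regexec (&re, string, 2, m, 0);
  if (ret == 0)
    @{
      *so = m[1].rm_so;
      *eo = m[1].rm_eo;
    @}
  regfree (&re);
  return ret;
@}
@end smallexample

Applied to @samp{f\(o*\)} and @samp{fum}, it should report @code{1} and
@code{1}; applied to @samp{ba\(na\)*}, it should report @code{-1} and
@code{-1} for @samp{ba}, and @code{6} and @code{8} for @samp{bananana}.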

@node Regexp Cleanup
@subsection POSIX Regexp Matching Cleanup

When you are finished using a compiled regular expression, you can
free the storage it uses by calling @code{regfree}.

@deftypefun void regfree (regex_t *@var{compiled})
@standards{POSIX.2, regex.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c (re_)free dup @ascuheap @acsmem
@c free_dfa_content dup @ascuheap @acsmem
Calling @code{regfree} frees all the storage that @code{*@var{compiled}}
points to. This includes various internal fields of the @code{regex_t}
structure that aren't documented in this manual.

@code{regfree} does not free the object @code{*@var{compiled}} itself.
@end deftypefun

You should always free the space in a @code{regex_t} structure with
@code{regfree} before using the structure to compile another regular
expression.
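
Because @code{regfree} releases only the storage that the structure
points to, the same @code{regex_t} object can be reused for a new
pattern. Here is a minimal sketch, assuming @var{compiled} currently
holds a successfully compiled expression; the helper name
@code{recompile} is hypothetical.

@smallexample
#include <regex.h>

/* Hypothetical helper: replace the expression held in *COMPILED,
   which must currently hold a successfully compiled expression,
   with PATTERN.  Returns the regcomp result; if it is nonzero,
   *COMPILED no longer holds a usable expression.  */
int
recompile (regex_t *compiled, const char *pattern, int cflags)
@{
  /* Release the old internal storage first.  regfree does not free
     the regex_t object itself, so it can be passed to regcomp.  */
  regfree (compiled);
  return regcomp (compiled, pattern, cflags);
@}
@end smallexample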

When @code{regcomp} or @code{regexec} reports an error, you can use
the function @code{regerror} to turn it into an error message string.

@deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length})
@standards{POSIX.2, regex.h}
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c regerror calls gettext, strcmp and mempcpy or memcpy.
This function produces an error message string for the error code
@var{errcode}, and stores the string in @var{length} bytes of memory
starting at @var{buffer}. For the @var{compiled} argument, supply the
same compiled regular expression structure that @code{regcomp} or
@code{regexec} was working with when it got the error. Alternatively,
you can supply @code{NULL} for @var{compiled}; you will still get a
meaningful error message, but it might not be as detailed.

If the error message can't fit in @var{length} bytes (including a
terminating null character), then @code{regerror} truncates it.
The string that @code{regerror} stores is always null-terminated
even if it has been truncated. If @var{length} is @code{0},
@code{regerror} stores nothing, so @var{buffer} may be @code{NULL}.

The return value of @code{regerror} is the size needed to store the
entire error message, including its terminating null character. If
this is less than or equal to @var{length}, then the error message was
not truncated, and you can use it. Otherwise, you should call
@code{regerror} again with a larger buffer.

Here is a function which uses @code{regerror}, but always dynamically
allocates a buffer for the error message (@code{xmalloc} is the checking
allocator shown in @ref{Malloc Examples}):

@smallexample
char *get_regerror (int errcode, regex_t *compiled)
@{
  size_t length = regerror (errcode, compiled, NULL, 0);
  char *buffer = xmalloc (length);
  (void) regerror (errcode, compiled, buffer, length);
  return buffer;
@}
@end smallexample
@end deftypefun
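
If you prefer a fixed-size buffer, the value returned by
@code{regerror} tells you whether the message was cut short: because
that value counts the terminating null character, the message was
truncated exactly when it exceeds the buffer size. Here is a minimal
sketch; the helper @code{compile_or_report} and its 64-byte buffer are
arbitrary choices for illustration.

@smallexample
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile PATTERN into *COMPILED, printing a
   diagnostic on standard error if that fails.  Returns the regcomp
   result.  */
int
compile_or_report (regex_t *compiled, const char *pattern, int cflags)
@{
  char msg[64];
  int err = regcomp (compiled, pattern, cflags);

  if (err != 0)
    @{
      /* The return value counts the terminating null character.  */
      size_t needed = regerror (err, compiled, msg, sizeof msg);
      fprintf (stderr, "bad pattern '%s': %s%s\n", pattern, msg,
               needed > sizeof msg ? " (truncated)" : "");
    @}
  return err;
@}
@end smallexample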

@node Word Expansion
@section Shell-Style Word Expansion
@cindex word expansion
@cindex expansion of shell words

@dfn{Word expansion} means the process of splitting a string into
@dfn{words} and substituting for variables, commands, and wildcards
just as the shell does.

For example, when you write @samp{ls -l foo.c}, this string is split
into three separate words: @samp{ls}, @samp{-l} and @samp{foo.c}.
This is the most basic function of word expansion.

When you write @samp{ls *.c}, this can become many words, because
the word @samp{*.c} can be replaced with any number of file names.
This is called @dfn{wildcard expansion}, and it is also a part of
word expansion.

When you use @samp{echo $PATH} to print your path, you are taking
advantage of @dfn{variable substitution}, which is also part of word
expansion.

Ordinary programs can perform word expansion just like the shell by
calling the library function @code{wordexp}.

@menu
* Expansion Stages:: What word expansion does to a string.
* Calling Wordexp:: How to call @code{wordexp}.
* Flags for Wordexp:: Options you can enable in @code{wordexp}.
* Wordexp Example:: A sample program that does word expansion.
* Tilde Expansion:: Details of how tilde expansion works.
* Variable Substitution:: Different types of variable substitution.
@end menu

@node Expansion Stages
@subsection The Stages of Word Expansion

When word expansion is applied to a sequence of words, it performs the
following transformations in the order shown here:

@enumerate
@item
@cindex tilde expansion
@dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of
the home directory of @samp{foo}.

@item
Next, three different transformations are applied in the same step,
from left to right:

@itemize @bullet
@item
@cindex variable substitution
@cindex substitution of variables and commands
@dfn{Variable substitution}: Environment variables are substituted for
references such as @samp{$foo}.

@item
@cindex command substitution
@dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and
the equivalent @w{@samp{$(cat foo)}} are replaced with the output from
the inner command.

@item
@cindex arithmetic expansion
@dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are
replaced with the result of the arithmetic computation.
@end itemize

@item
@cindex field splitting
@dfn{Field splitting}: subdivision of the text into @dfn{words}.

@item
@cindex wildcard expansion
@dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c}
with a list of @samp{.c} file names. Wildcard expansion applies to an
entire word at a time, and replaces that word with 0 or more file names
that are themselves words.

@item
@cindex quote removal
@cindex removal of quotes
@dfn{Quote removal}: The deletion of string-quotes, now that they have
done their job by inhibiting the above transformations when appropriate.
@end enumerate

For the details of these transformations, and how to write the constructs
that use them, see the @w{@cite{Bash Reference Manual}}.
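
To see several of these stages at work, you can print the words that
@code{wordexp} produces (@pxref{Calling Wordexp}). The following sketch
uses the hypothetical helper @code{show_expansion}; it passes
@code{WRDE_NOCMD} so that command substitution is refused rather than
run, and releases the result with @code{wordfree}.

@smallexample
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS as the shell would, refusing
   command substitution, and print each resulting word in brackets
   on its own line.  Returns the number of words, or -1 on error.  */
int
show_expansion (const char *words)
@{
  wordexp_t we;
  int ret = wordexp (words, &we, WRDE_NOCMD);

  if (ret != 0)
    @{
      /* After WRDE_NOSPACE, part of the result may be allocated.  */
      if (ret == WRDE_NOSPACE)
        wordfree (&we);
      return -1;
    @}

  for (size_t i = 0; i < we.we_wordc; i++)
    printf ("[%s]\n", we.we_wordv[i]);

  int count = (int) we.we_wordc;
  wordfree (&we);
  return count;
@}
@end smallexample

For example, expanding @samp{one 'two three' "four"} should print the
three words @samp{one}, @samp{two three} and @samp{four}: field splitting
separates the words, while the quotes protect the inner space and are
then removed. Likewise, if the environment variable @env{GREETING} (a
name chosen for illustration) holds @samp{hello world}, an unquoted
@samp{$GREETING} is split into two words.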

@node Calling Wordexp
@subsection Calling @code{wordexp}

All the functions, constants and data types for word expansion are
declared in the header file @file{wordexp.h}.

Word expansion produces a vector of words (strings). To return this
vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which
is a structure. You pass @code{wordexp} the address of the structure,
and it fills in the structure's fields to tell you about the results.

@deftp {Data Type} {wordexp_t}
@standards{POSIX.2, wordexp.h}
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size.

@table @code
@item we_wordc
The number of words in the vector. This count does not include the
null pointers reserved at the start of the vector by @code{we_offs}.

@item we_wordv
The address of the vector. This field has type @w{@code{char **}}.
The vector ends with a null pointer after the last word.

@item we_offs
The offset of the first real element of the vector, from its nominal
address in the @code{we_wordv} field. Unlike the other fields, this
is always an input to @code{wordexp}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{wordexp} function fills them with
null pointers.)

The @code{we_offs} field is meaningful only if you use the
@code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
@end table
@end deftp
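
A common use of @code{we_offs} is to leave room at the start of the
vector for arguments you will fill in yourself, for instance before
passing the vector to @code{execv}. The sketch below (the helper
@code{expand_with_slots} is hypothetical) sets @code{we_offs} before
calling @code{wordexp} with @code{WRDE_DOOFFS}; afterward the words
start at @code{we_wordv[@var{reserved}]}, and the first @var{reserved}
elements are null pointers.

@smallexample
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: expand COMMAND_LINE into *WE, leaving
   RESERVED null pointers at the start of we_wordv.  Command
   substitution is refused (WRDE_NOCMD).  Returns the value
   returned by wordexp.  */
int
expand_with_slots (const char *command_line, wordexp_t *we,
                   size_t reserved)
@{
  /* we_offs is an input; wordexp reads it only with WRDE_DOOFFS.  */
  we->we_offs = reserved;
  return wordexp (command_line, we, WRDE_DOOFFS | WRDE_NOCMD);
@}
@end smallexample

Expanding @samp{a b c} with one reserved slot should give a
@code{we_wordc} of 3, a null @code{we_wordv[0]}, and the words in
@code{we_wordv[1]} through @code{we_wordv[3]}. Call @code{wordfree}
when you are done with the result.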

@deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags})
@standards{POSIX.2, wordexp.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtasuconst{:@mtsenv{}} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @ascuintl{} @ascuheap{} @asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c wordexp @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asucorrupt @asulock @acucorrupt @aculock @acsfd @acsmem
@c w_newword ok
@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c calloc dup @ascuheap @acsmem
@c getenv dup @mtsenv
@c strcpy dup ok
@c parse_backslash @ascuheap @acsmem
@c w_addchar dup @ascuheap @acsmem
@c parse_dollars @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c w_addchar dup @ascuheap @acsmem
@c parse_arith @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c w_newword dup ok
@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem
@c parse_qtd_backslash dup @ascuheap @acsmem
@c eval_expr @mtslocale
@c eval_expr_multidiv @mtslocale
@c eval_expr_val @mtslocale
@c isspace dup @mtslocale
@c eval_expr dup @mtslocale
@c isspace dup @mtslocale
@c isspace dup @mtslocale
@c free dup @ascuheap @acsmem
@c w_addchar dup @ascuheap @acsmem
@c w_addstr dup @ascuheap @acsmem
@c itoa_word dup ok
@c parse_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem
@c w_newword dup ok
@c pthread_setcancelstate @ascuplugin @ascuheap @acsmem
@c (disable cancellation around exec_comm; it may do_cancel the
@c second time, if async cancel is enabled)
@c THREAD_ATOMIC_CMPXCHG_VAL dup ok
@c do_cancel @ascuplugin @ascuheap @acsmem
@c THREAD_ATOMIC_BIT_SET dup ok
@c pthread_unwind @ascuplugin @ascuheap @acsmem
@c Unwind_ForcedUnwind if available @ascuplugin @ascuheap @acsmem
@c libc_unwind_longjmp otherwise
@c cleanups
@c exec_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem
@c pipe2 dup ok
@c pipe dup ok
@c fork dup @ascuplugin @aculock
@c close dup @acsfd
@c on child: exec_comm_child -> exec or abort
@c waitpid dup ok
@c read dup ok
@c w_addmem dup @ascuheap @acsmem
@c strchr dup ok
@c w_addword dup @ascuheap @acsmem
@c w_newword dup ok
@c w_addchar dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c kill dup ok
@c free dup @ascuheap @acsmem
@c parse_param @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c reads from __libc_argc and __libc_argv without guards
@c w_newword dup ok
@c isalpha dup @mtslocale^^
@c w_addchar dup @ascuheap @acsmem
@c isalnum dup @mtslocale^^
@c isdigit dup @mtslocale^^
@c strchr dup ok
@c itoa_word dup ok
@c atoi dup @mtslocale
@c getpid dup ok
@c w_addstr dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c strlen dup ok
@c malloc dup @ascuheap @acsmem
@c stpcpy dup ok
@c w_addword dup @ascuheap @acsmem
@c strdup dup @ascuheap @acsmem
@c getenv dup @mtsenv
@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c parse_tilde dup @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem
@c mempcpy dup ok
@c _ dup @ascuintl
@c fxprintf dup @aculock
@c setenv dup @mtasuconst:@mtsenv @ascuheap @asulock @acucorrupt @aculock @acsmem
@c strspn dup ok
@c strcspn dup ok
@c parse_backtick @ascuplugin @ascuheap @aculock @acsfd @acsmem
@c w_newword dup ok
@c exec_comm dup @ascuplugin @ascuheap @aculock @acsfd @acsmem
1876 | @c free dup @ascuheap @acsmem | |
1877 | @c parse_qtd_backslash dup @ascuheap @acsmem | |
1878 | @c parse_backslash dup @ascuheap @acsmem | |
1879 | @c w_addchar dup @ascuheap @acsmem | |
1880 | @c parse_dquote @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1881 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1882 | @c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem | |
1883 | @c parse_qtd_backslash dup @ascuheap @acsmem | |
1884 | @c w_addchar dup @ascuheap @acsmem | |
1885 | @c w_addword dup @ascuheap @acsmem | |
1886 | @c strdup dup @ascuheap @acsmem | |
1887 | @c realloc dup @ascuheap @acsmem | |
1888 | @c free dup @ascuheap @acsmem | |
1889 | @c parse_squote dup @ascuheap @acsmem | |
1890 | @c w_addchar dup @ascuheap @acsmem | |
1891 | @c parse_tilde @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1892 | @c strchr dup ok | |
1893 | @c w_addchar dup @ascuheap @acsmem | |
1894 | @c getenv dup @mtsenv | |
1895 | @c w_addstr dup @ascuheap @acsmem | |
1896 | @c strlen dup ok | |
1897 | @c w_addmem dup @ascuheap @acsmem | |
1898 | @c realloc dup @ascuheap @acsmem | |
1899 | @c free dup @ascuheap @acsmem | |
1900 | @c mempcpy dup ok | |
1901 | @c getuid dup ok | |
1902 | @c getpwuid_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1903 | @c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1904 | @c parse_glob @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1905 | @c strchr dup ok | |
1906 | @c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem | |
1907 | @c parse_qtd_backslash @ascuheap @acsmem | |
1908 | @c w_addchar dup @ascuheap @acsmem | |
1909 | @c parse_backslash dup @ascuheap @acsmem | |
1910 | @c w_addchar dup @ascuheap @acsmem | |
1911 | @c w_addword dup @ascuheap @acsmem | |
1912 | @c w_newword dup ok | |
1913 | @c do_parse_glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem | |
1914 | @c glob dup @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] | |
1915 | @c w_addstr dup @ascuheap @acsmem | |
1916 | @c w_addchar dup @ascuheap @acsmem | |
1917 | @c globfree dup @ascuheap @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] | |
1918 | @c free dup @ascuheap @acsmem | |
1919 | @c w_newword dup ok | |
1920 | @c strdup dup @ascuheap @acsmem | |
1921 | @c w_addword dup @ascuheap @acsmem | |
1922 | @c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem | |
1923 | @c strchr dup ok | |
1924 | @c w_addchar dup @ascuheap @acsmem | |
1925 | @c realloc dup @ascuheap @acsmem | |
1926 | @c free dup @ascuheap @acsmem | |
1927 | @c free dup @ascuheap @acsmem | |
Perform word expansion on the string @var{words}, putting the result in
a newly allocated vector, and store the size and address of this vector
into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a
combination of bit flags; see @ref{Flags for Wordexp}, for details of
the flags.

You shouldn't use any of the characters @samp{|&;<>} in the string
@var{words} unless they are quoted; likewise for newline. If you use
these characters unquoted, you will get the @code{WRDE_BADCHAR} error
code. Don't use parentheses or braces unless they are quoted or part of
a word expansion construct. If you use quotation characters @samp{'"`},
they must come in pairs that balance; an unmatched quoting character
causes a @code{WRDE_SYNTAX} error.

The results of word expansion are a sequence of words. The function
@code{wordexp} allocates a string for each resulting word, then
allocates a vector of type @code{char **} to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the @dfn{word vector}.

To return this vector, @code{wordexp} stores its address in the
@code{we_wordv} member of @code{*@var{word-vector-ptr}}, and its length
(number of elements, not counting the terminating null pointer) in the
@code{we_wordc} member.
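
As a minimal sketch of reading the word vector, the hypothetical helper
@code{print_words} below expands a string, prints one word per line,
and returns the word count, or @code{-1} on failure. It passes
@code{WRDE_NOCMD} on the assumption that the input should never run
commands; the helper's name and return convention are illustrative only.

@smallexample
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS, print each resulting word on
   its own line, and return the number of words, or -1 on error.  */
static int
print_words (const char *words)
@{
  wordexp_t we;
  size_t i;
  int count;
  int status = wordexp (words, &we, WRDE_NOCMD);

  if (status != 0)
    @{
      /* After WRDE_NOSPACE, part of the result may be allocated.  */
      if (status == WRDE_NOSPACE)
        wordfree (&we);
      return -1;
    @}

  /* we_wordc does not count the terminating null pointer.  */
  for (i = 0; i < we.we_wordc; i++)
    printf ("%s\n", we.we_wordv[i]);

  count = (int) we.we_wordc;
  wordfree (&we);
  return count;
@}
@end smallexample

For instance, @code{print_words ("'a b' c")} should print the two lines
@samp{a b} and @samp{c} and return 2, because the quoted space does not
split the first word.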

If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one
of these error codes:

@vtable @code
@item WRDE_BADCHAR
@standards{POSIX.2, wordexp.h}
The input string @var{words} contains an unquoted invalid character such
as @samp{|}.

@item WRDE_BADVAL
@standards{POSIX.2, wordexp.h}
The input string refers to an undefined shell variable, and you used the flag
@code{WRDE_UNDEF} to forbid such references.

@item WRDE_CMDSUB
@standards{POSIX.2, wordexp.h}
The input string uses command substitution, and you used the flag
@code{WRDE_NOCMD} to forbid command substitution.

@item WRDE_NOSPACE
@standards{POSIX.2, wordexp.h}
It was impossible to allocate memory to hold the result. In this case,
@code{wordexp} can store part of the results (as much as it could
allocate room for), so you should still call @code{wordfree} to release
them.

@item WRDE_SYNTAX
@standards{POSIX.2, wordexp.h}
There was a syntax error in the input string. For example, an unmatched
quoting character is a syntax error. This error code is also used to
signal division by zero and overflow in arithmetic expansion.
@end vtable
@end deftypefun
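
To see which of these codes a particular input produces, a sketch like
the following can help. The helper @code{expansion_status} is
hypothetical: it returns whatever @code{wordexp} returned, after
releasing any storage that the call allocated.

@smallexample
#include <wordexp.h>

/* Hypothetical helper: return the status of expanding WORDS with
   FLAGS, releasing whatever wordexp allocated.  */
static int
expansion_status (const char *words, int flags)
@{
  wordexp_t we;
  int status = wordexp (words, &we, flags);

  /* On success the vector must be freed.  After WRDE_NOSPACE part of
     the result may have been allocated, so free that as well.  */
  if (status == 0 || status == WRDE_NOSPACE)
    wordfree (&we);
  return status;
@}
@end smallexample

With @theglibc{}, the input @code{"a | b"} should yield
@code{WRDE_BADCHAR}, @code{"'unbalanced"} should yield
@code{WRDE_SYNTAX}, and @code{"$(date)"} should yield @code{WRDE_CMDSUB}
when @code{WRDE_NOCMD} is set. A reference to an unset variable should
yield @code{WRDE_BADVAL} when @code{WRDE_UNDEF} is set, and
@code{"$((1/0))"} should yield @code{WRDE_SYNTAX}.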

@deftypefun void wordfree (wordexp_t *@var{word-vector-ptr})
@standards{POSIX.2, wordexp.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c free dup @ascuheap @acsmem
Free the storage used for the word-strings and vector that
@code{*@var{word-vector-ptr}} points to. This does not free the
structure @code{*@var{word-vector-ptr}} itself, only the other
data it points to.
@end deftypefun

@node Flags for Wordexp
@subsection Flags for Word Expansion

This section describes the flags that you can specify in the
@var{flags} argument to @code{wordexp}. Choose the flags you want,
and combine them with the C operator @code{|}.

@vtable @code
@item WRDE_APPEND
@standards{POSIX.2, wordexp.h}
Append the words from this expansion to the vector of words produced by
previous calls to @code{wordexp}. This way you can effectively expand
several words as if they were concatenated with spaces between them.

In order for appending to work, you must not modify the contents of the
word vector structure between calls to @code{wordexp}. Also, if you set
@code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also
set it when you append to the results.

@item WRDE_DOOFFS
@standards{POSIX.2, wordexp.h}
Leave blank slots at the beginning of the vector of words.
The @code{we_offs} field says how many slots to leave.
The blank slots contain null pointers.

@item WRDE_NOCMD
@standards{POSIX.2, wordexp.h}
Don't do command substitution; if the input requests command substitution,
report an error.

@item WRDE_REUSE
@standards{POSIX.2, wordexp.h}
Reuse a word vector made by a previous call to @code{wordexp}.
Instead of allocating a new vector of words, this call to @code{wordexp}
will use the vector that already exists (making it larger if necessary).

Note that the vector may move, so it is not safe to save an old pointer
and use it again after calling @code{wordexp}. You must fetch
@code{we_wordv} anew after each call.

@item WRDE_SHOWERR
@standards{POSIX.2, wordexp.h}
Show any error messages printed by commands run by command substitution.
More precisely, allow these commands to inherit the standard error output
stream of the current process. By default, @code{wordexp} gives these
commands a standard error stream that discards all output.

@item WRDE_UNDEF
@standards{POSIX.2, wordexp.h}
If the input refers to a shell variable that is not defined, report an
error (@code{WRDE_BADVAL}).
@end vtable
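
The following sketch combines @code{WRDE_DOOFFS} and
@code{WRDE_APPEND}. The helper @code{build_argv} is hypothetical: it
reserves @var{noffs} null slots at the start of the vector, expands a
command, and then appends one more string. As required above, it passes
@code{WRDE_DOOFFS} on both calls and leaves the structure untouched
between them.

@smallexample
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: expand COMMAND, then append the words of
   EXTRA, leaving NOFFS null slots at the start of WE->we_wordv.
   Returns the number of words (not counting the reserved slots),
   or -1 on failure.  */
static long
build_argv (const char *command, const char *extra, size_t noffs,
            wordexp_t *we)
@{
  int status;

  /* wordexp reads we_offs because WRDE_DOOFFS is set.  */
  we->we_offs = noffs;
  status = wordexp (command, we, WRDE_DOOFFS | WRDE_NOCMD);
  if (status != 0)
    @{
      if (status == WRDE_NOSPACE)
        wordfree (we);
      return -1;
    @}

  status = wordexp (extra, we, WRDE_DOOFFS | WRDE_APPEND | WRDE_NOCMD);
  if (status != 0)
    @{
      /* *WE still holds the earlier words (and, after WRDE_NOSPACE,
         any partial result), so release them.  */
      wordfree (we);
      return -1;
    @}
  return (long) we->we_wordc;
@}
@end smallexample

After a successful call @code{build_argv ("echo hello", "world", 1, &we)},
@code{we.we_wordv[0]} should be a null pointer, the next three elements
should be @samp{echo}, @samp{hello} and @samp{world}, and the caller is
responsible for calling @code{wordfree (&we)}.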

@node Wordexp Example
@subsection @code{wordexp} Example

Here is an example of using @code{wordexp} to expand several strings
and use the results to run a command. It also shows the use of
@code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree}
to free the space allocated by @code{wordexp}.

@smallexample
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <wordexp.h>

int
expand_and_execute (const char *program, const char **options)
@{
  wordexp_t result;
  pid_t pid;
  int status, i;

  /* @r{Expand the string for the program to run.}  */
  switch (wordexp (program, &result, 0))
    @{
    case 0:                     /* @r{Successful}.  */
      break;
    case WRDE_NOSPACE:
      /* @r{If the error was @code{WRDE_NOSPACE},}
         @r{then perhaps part of the result was allocated.}  */
      wordfree (&result);
      /* @r{Fall through.}  */
    default:                    /* @r{Some other error.}  */
      return -1;
    @}

  /* @r{Expand the strings specified for the arguments.}  */
  for (i = 0; options[i] != NULL; i++)
    @{
      if (wordexp (options[i], &result, WRDE_APPEND))
        @{
          wordfree (&result);
          return -1;
        @}
    @}

  pid = fork ();
  if (pid == 0)
    @{
      /* @r{This is the child process.  Execute the command.}  */
      execv (result.we_wordv[0], result.we_wordv);
      /* @r{@code{execv} returned, so it failed.  Use @code{_exit}}
         @r{so that the child does not flush stdio buffers it}
         @r{inherited from the parent.}  */
      _exit (EXIT_FAILURE);
    @}
  else if (pid < 0)
    /* @r{The fork failed.  Report failure.}  */
    status = -1;
  else
    /* @r{This is the parent process.  Wait for the child to complete.}  */
    if (waitpid (pid, &status, 0) != pid)
      status = -1;

  /* @r{On success, @code{status} holds the raw wait status of the child.}  */
  wordfree (&result);
  return status;
@}
@end smallexample

@node Tilde Expansion
@subsection Details of Tilde Expansion

It's a standard part of shell syntax that you can use @samp{~} at the
beginning of a file name to stand for your own home directory. You
can use @samp{~@var{user}} to stand for @var{user}'s home directory.

@dfn{Tilde expansion} is the process of converting these abbreviations
to the directory names that they stand for.

Tilde expansion applies to the @samp{~} plus all following characters up
to whitespace or a slash. It takes place only at the beginning of a
word, and only if none of the characters to be transformed is quoted in
any way.

Plain @samp{~} uses the value of the environment variable @code{HOME}
as the proper home directory name. @samp{~} followed by a user name
uses @code{getpwnam} to look up that user in the user database, and
uses whatever directory is recorded there. Thus, @samp{~} followed
by your own name can give different results from plain @samp{~}, if
the value of @code{HOME} is not really your home directory.
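
A minimal sketch of tilde expansion through @code{wordexp} follows. The
helper @code{expand_single} is hypothetical: it returns a newly
allocated copy of the result when the input expands to exactly one word,
and a null pointer otherwise.

@smallexample
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS and return a malloc'd copy of
   the result if it is exactly one word, or NULL otherwise.  */
static char *
expand_single (const char *words)
@{
  wordexp_t we;
  char *copy = NULL;
  int status = wordexp (words, &we, WRDE_NOCMD);

  if (status == 0)
    @{
      if (we.we_wordc == 1)
        copy = strdup (we.we_wordv[0]);
      wordfree (&we);
    @}
  else if (status == WRDE_NOSPACE)
    wordfree (&we);   /* Part of the result may be allocated.  */
  return copy;
@}
@end smallexample

Assuming @code{HOME} is set to @file{/tmp/demo} (an arbitrary value for
illustration), @code{expand_single ("~/notes")} should return
@samp{/tmp/demo/notes}, while @code{expand_single ("'~'/notes")} should
return @samp{~/notes} unchanged, because the tilde is quoted. The
caller frees the result with @code{free}.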

@node Variable Substitution
@subsection Details of Variable Substitution

Part of ordinary shell syntax is the use of @samp{$@var{variable}} to
substitute the value of a shell variable into a command. This is called
@dfn{variable substitution}, and it is one part of doing word expansion.
The examples in this section assume that the variable @code{foo} has
the value @samp{tractor}.

There are two basic ways you can write a variable reference for
substitution:

@table @code
@item $@{@var{variable}@}
If you write braces around the variable name, then it is completely
unambiguous where the variable name ends. You can concatenate
additional letters onto the end of the variable value by writing them
immediately after the close brace. For example, @samp{$@{foo@}s}
expands into @samp{tractors}.

@item $@var{variable}
If you do not put braces around the variable name, then the variable
name consists of all the alphanumeric characters and underscores that
follow the @samp{$}. Any other character, such as punctuation or
whitespace, ends the variable name. Thus, @samp{$foo-bar} refers to the
variable @code{foo} and expands into @samp{tractor-bar}.
@end table

When you use braces, you can also use various constructs to modify the
value that is substituted, or test it in various ways.

@table @code
@item $@{@var{variable}:-@var{default}@}
Substitute the value of @var{variable}, but if that is empty or
undefined, use @var{default} instead.

@item $@{@var{variable}:=@var{default}@}
Substitute the value of @var{variable}, but if that is empty or
undefined, use @var{default} instead and set the variable to
@var{default}.

@item $@{@var{variable}:?@var{message}@}
If @var{variable} is defined and not empty, substitute its value.

Otherwise, print @var{message} as an error message on the standard error
stream, and consider word expansion a failure.

@c ??? How does wordexp report such an error?
@c WRDE_BADVAL is returned.

@item $@{@var{variable}:+@var{replacement}@}
Substitute @var{replacement}, but only if @var{variable} is defined and
nonempty. Otherwise, substitute nothing for this construct.
@end table
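
The sketch below exercises several of these forms. In @theglibc{},
@code{wordexp} looks variables up in the environment, so the sketch
assumes that @code{foo} is set there to @samp{tractor} while the
variables @code{DEMO_UNSET} and @code{DEMO_NEW} are unset; those two
names and the helper @code{expands_to} are hypothetical.

@smallexample
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: return 1 if WORDS expands to exactly one word
   equal to EXPECTED, and 0 otherwise (including on any error).  */
static int
expands_to (const char *words, const char *expected)
@{
  wordexp_t we;
  int match = 0;
  int status = wordexp (words, &we, WRDE_NOCMD);

  if (status == 0)
    @{
      match = (we.we_wordc == 1
               && strcmp (we.we_wordv[0], expected) == 0);
      wordfree (&we);
    @}
  else if (status == WRDE_NOSPACE)
    wordfree (&we);
  return match;
@}
@end smallexample

Under those assumptions, each of these calls should return 1:
@code{expands_to ("$@{foo@}s", "tractors")},
@code{expands_to ("$foo-bar", "tractor-bar")},
@code{expands_to ("$@{DEMO_UNSET:-default@}", "default")} and
@code{expands_to ("$@{foo:+yes@}", "yes")}. After
@code{expands_to ("$@{DEMO_NEW:=dflt@}", "dflt")}, the environment
variable @code{DEMO_NEW} should also hold @samp{dflt}. An input such as
@code{"$@{DEMO_UNSET:?missing@}"} makes the expansion fail, so
@code{expands_to} returns 0 for it.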

@table @code
@item $@{#@var{variable}@}
Substitute the number of characters in the value of @var{variable},
written as a decimal numeral. @samp{$@{#foo@}} stands for
@samp{7}, because @samp{tractor} is seven characters.
@end table

These variants of variable substitution let you remove part of the
variable's value before substituting it. The @var{prefix} and
@var{suffix} are not mere strings; they are wildcard patterns, just
like the patterns that you use to match multiple file names. But
in this context, they match against parts of the variable value
rather than against file names.

@table @code
@item $@{@var{variable}%%@var{suffix}@}
Substitute the value of @var{variable}, but first discard from that
value any portion at the end that matches the pattern @var{suffix}.

If there is more than one alternative for how to match against
@var{suffix}, this construct uses the longest possible match.

Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the longest
match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}.

@item $@{@var{variable}%@var{suffix}@}
Substitute the value of @var{variable}, but first discard from that
value any portion at the end that matches the pattern @var{suffix}.

If there is more than one alternative for how to match against
@var{suffix}, this construct uses the shortest possible match.

Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest
match for @samp{r*} at the end of @samp{tractor} is just @samp{r}.

@item $@{@var{variable}##@var{prefix}@}
Substitute the value of @var{variable}, but first discard from that
value any portion at the beginning that matches the pattern @var{prefix}.

If there is more than one alternative for how to match against
@var{prefix}, this construct uses the longest possible match.

Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the longest
match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}.

@item $@{@var{variable}#@var{prefix}@}
Substitute the value of @var{variable}, but first discard from that
value any portion at the beginning that matches the pattern @var{prefix}.

If there is more than one alternative for how to match against
@var{prefix}, this construct uses the shortest possible match.

Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest
match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}.

@end table
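
A table-driven sketch can confirm the examples above. As in the text,
it assumes that @code{foo} holds @samp{tractor} (supplied through the
environment, where @code{wordexp} in @theglibc{} looks it up); the array
@code{removal_examples} and the function @code{check_removal_examples}
are hypothetical names.

@smallexample
#include <stdio.h>
#include <string.h>
#include <wordexp.h>

/* The examples from the text, assuming foo is "tractor".  */
static const struct
@{
  const char *input;
  const char *expected;
@} removal_examples[] =
@{
  @{ "$@{#foo@}", "7" @},
  @{ "$@{foo%%r*@}", "t" @},
  @{ "$@{foo%r*@}", "tracto" @},
  @{ "$@{foo##*t@}", "or" @},
  @{ "$@{foo#*t@}", "ractor" @},
@};

/* Expand each example and report any whose result is not the single
   expected word.  Returns the number of mismatches.  */
static int
check_removal_examples (void)
@{
  size_t i;
  int mismatches = 0;

  for (i = 0; i < sizeof removal_examples / sizeof removal_examples[0]; i++)
    @{
      wordexp_t we;
      int status = wordexp (removal_examples[i].input, &we, WRDE_NOCMD);
      int ok = (status == 0 && we.we_wordc == 1
                && strcmp (we.we_wordv[0],
                           removal_examples[i].expected) == 0);

      if (status == 0 || status == WRDE_NOSPACE)
        wordfree (&we);
      if (!ok)
        @{
          printf ("%s: unexpected result\n", removal_examples[i].input);
          mismatches++;
        @}
    @}
  return mismatches;
@}
@end smallexample

If every example behaves as described, @code{check_removal_examples}
prints nothing and returns 0.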