Commit | Line | Data |
---|---|---|
28f540f4 RM |
1 | @node String and Array Utilities, Extended Characters, Character Handling, Top |
2 | @chapter String and Array Utilities | |
3 | ||
4 | Operations on strings (or arrays of characters) are an important part of | |
5 | many programs. The GNU C library provides an extensive set of string | |
6 | utility functions, including functions for copying, concatenating, | |
7 | comparing, and searching strings. Many of these functions can also | |
8 | operate on arbitrary regions of storage; for example, the @code{memcpy} | |
a5113b14 | 9 | function can be used to copy the contents of any kind of array. |
28f540f4 RM |
10 | |
11 | It's fairly common for beginning C programmers to ``reinvent the wheel'' | |
12 | by duplicating this functionality in their own code, but it pays to | |
13 | become familiar with the library functions and to make use of them, | |
14 | since this offers benefits in maintenance, efficiency, and portability. | |
15 | ||
16 | For instance, you could easily compare one string to another in two | |
17 | lines of C code, but if you use the built-in @code{strcmp} function, | |
18 | you're less likely to make a mistake. And, since these library | |
19 | functions are typically highly optimized, your program may run faster | |
20 | too. | |
21 | ||
22 | @menu | |
23 | * Representation of Strings:: Introduction to basic concepts. | |
24 | * String/Array Conventions:: Whether to use a string function or an | |
25 | arbitrary array function. | |
26 | * String Length:: Determining the length of a string. | |
27 | * Copying and Concatenation:: Functions to copy the contents of strings | |
28 | and arrays. | |
29 | * String/Array Comparison:: Functions for byte-wise and character-wise | |
30 | comparison. | |
31 | * Collation Functions:: Functions for collating strings. | |
32 | * Search Functions:: Searching for a specific element or substring. | |
33 | * Finding Tokens in a String:: Splitting a string into tokens by looking | |
34 | for delimiters. | |
b4012b75 | 35 | * Encode Binary Data:: Encoding and Decoding of Binary Data. |
b13927da | 36 | * Argz and Envz Vectors:: Null-separated string vectors. |
28f540f4 RM |
37 | @end menu |
38 | ||
b4012b75 | 39 | @node Representation of Strings |
28f540f4 RM |
40 | @section Representation of Strings |
41 | @cindex string, representation of | |
42 | ||
43 | This section is a quick summary of string concepts for beginning C | |
44 | programmers. It describes how character strings are represented in C | |
45 | and some common pitfalls. If you are already familiar with this | |
46 | material, you can skip this section. | |
47 | ||
48 | @cindex string | |
49 | @cindex null character | |
50 | A @dfn{string} is an array of @code{char} objects. But string-valued | |
51 | variables are usually declared to be pointers of type @code{char *}. | |
52 | Such variables do not include space for the text of a string; that has | |
53 | to be stored somewhere else---in an array variable, a string constant, | |
54 | or dynamically allocated memory (@pxref{Memory Allocation}). It's up to | |
55 | you to store the address of the chosen memory space into the pointer | |
56 | variable. Alternatively you can store a @dfn{null pointer} in the | |
57 | pointer variable. The null pointer does not point anywhere, so | |
58 | attempting to reference the string it points to gets an error. | |
59 | ||
60 | By convention, a @dfn{null character}, @code{'\0'}, marks the end of a | |
61 | string. For example, in testing to see whether the @code{char *} | |
62 | variable @var{p} points to a null character marking the end of a string, | |
63 | you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}. | |
64 | ||
65 | A null character is quite different conceptually from a null pointer, | |
66 | although both are represented by the integer @code{0}. | |
67 | ||
68 | @cindex string literal | |
69 | @dfn{String literals} appear in C program source as strings of | |
f65fd747 | 70 | characters between double-quote characters (@samp{"}). In @w{ISO C}, |
28f540f4 RM |
71 | string literals can also be formed by @dfn{string concatenation}: |
72 | @code{"a" "b"} is the same as @code{"ab"}. Modification of string | |
73 | literals is not allowed by the GNU C compiler, because literals | |
74 | are placed in read-only storage. | |
75 | ||
76 | Character arrays that are declared @code{const} cannot be modified | |
77 | either. It's generally good style to declare non-modifiable string | |
78 | pointers to be of type @code{const char *}, since this often allows the | |
79 | C compiler to detect accidental modifications as well as providing some | |
80 | amount of documentation about what your program intends to do with the | |
81 | string. | |
82 | ||
83 | The amount of memory allocated for the character array may extend past | |
84 | the null character that normally marks the end of the string. In this | |
85 | document, the term @dfn{allocation size} is always used to refer to the | |
86 | total amount of memory allocated for the string, while the term | |
87 | @dfn{length} refers to the number of characters up to (but not | |
88 | including) the terminating null character. | |
89 | @cindex length of string | |
90 | @cindex allocation size of string | |
91 | @cindex size of string | |
92 | @cindex string length | |
93 | @cindex string allocation | |
94 | ||
95 | A notorious source of program bugs is trying to put more characters in a | |
96 | string than fit in its allocated size. When writing code that extends | |
97 | strings or moves characters into a pre-allocated array, you should be | |
98 | very careful to keep track of the length of the text and make explicit | |
99 | checks for overflowing the array. Many of the library functions | |
100 | @emph{do not} do this for you! Remember also that you need to allocate | |
101 | an extra byte to hold the null character that marks the end of the | |
102 | string. | |
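
As a minimal sketch of the distinction between length and allocation size, the following helper copies a string into dynamically allocated memory and reserves the extra byte for the null character. The name @code{duplicate_text} is hypothetical and used only for illustration; it is not a library function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of TEXT,
   or a null pointer if the allocation fails.  */
char *
duplicate_text (const char *text)
{
  size_t length = strlen (text);      /* Excludes the terminating null.  */
  char *copy = malloc (length + 1);   /* Allocation size is length + 1.  */

  if (copy == NULL)
    return NULL;
  memcpy (copy, text, length + 1);    /* Copies the null character too.  */
  return copy;
}
```

The caller is responsible for releasing the result with @code{free}.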
103 | ||
b4012b75 | 104 | @node String/Array Conventions |
28f540f4 RM |
105 | @section String and Array Conventions |
106 | ||
107 | This chapter describes both functions that work on arbitrary arrays or | |
108 | blocks of memory, and functions that are specific to null-terminated | |
109 | arrays of characters. | |
110 | ||
111 | Functions that operate on arbitrary blocks of memory have names | |
112 | beginning with @samp{mem} (such as @code{memcpy}) and invariably take an | |
113 | argument which specifies the size (in bytes) of the block of memory to | |
114 | operate on. The array arguments and return values for these functions | |
115 | have type @code{void *}, and as a matter of style, the elements of these | |
116 | arrays are referred to as ``bytes''. You can pass any kind of pointer | |
117 | to these functions, and the @code{sizeof} operator is useful in | |
118 | computing the value for the size argument. | |
119 | ||
120 | In contrast, functions that operate specifically on strings have names | |
121 | beginning with @samp{str} (such as @code{strcpy}) and look for a null | |
122 | character to terminate the string instead of requiring an explicit size | |
123 | argument to be passed. (Some of these functions accept a specified | |
124 | maximum length, but they also check for premature termination with a | |
125 | null character.) The array arguments and return values for these | |
126 | functions have type @code{char *}, and the array elements are referred | |
127 | to as ``characters''. | |
128 | ||
129 | In many cases, there are both @samp{mem} and @samp{str} versions of a | |
130 | function. The one that is more appropriate to use depends on the exact | |
131 | situation. When your program is manipulating arbitrary arrays or blocks of | |
132 | storage, then you should always use the @samp{mem} functions. On the | |
133 | other hand, when you are manipulating null-terminated strings it is | |
134 | usually more convenient to use the @samp{str} functions, unless you | |
135 | already know the length of the string in advance. | |
136 | ||
b4012b75 | 137 | @node String Length |
28f540f4 RM |
138 | @section String Length |
139 | ||
140 | You can get the length of a string using the @code{strlen} function. | |
141 | This function is declared in the header file @file{string.h}. | |
142 | @pindex string.h | |
143 | ||
144 | @comment string.h | |
f65fd747 | 145 | @comment ISO |
28f540f4 RM |
146 | @deftypefun size_t strlen (const char *@var{s}) |
147 | The @code{strlen} function returns the length of the null-terminated | |
148 | string @var{s}. (In other words, it returns the offset of the terminating | |
149 | null character within the array.) | |
150 | ||
151 | For example, | |
152 | @smallexample | |
153 | strlen ("hello, world") | |
154 | @result{} 12 | |
155 | @end smallexample | |
156 | ||
157 | When applied to a character array, the @code{strlen} function returns | |
158 | the length of the string stored there, not its allocation size. You can | |
159 | get the allocation size of the character array that holds a string using | |
160 | the @code{sizeof} operator: | |
161 | ||
162 | @smallexample | |
a5113b14 | 163 | char string[32] = "hello, world"; |
28f540f4 RM |
164 | sizeof (string) |
165 | @result{} 32 | |
166 | strlen (string) | |
167 | @result{} 12 | |
168 | @end smallexample | |
169 | @end deftypefun | |
170 | ||
b4012b75 | 171 | @node Copying and Concatenation |
28f540f4 RM |
172 | @section Copying and Concatenation |
173 | ||
174 | You can use the functions described in this section to copy the contents | |
175 | of strings and arrays, or to append the contents of one string to | |
176 | another. These functions are declared in the header file | |
177 | @file{string.h}. | |
178 | @pindex string.h | |
179 | @cindex copying strings and arrays | |
180 | @cindex string copy functions | |
181 | @cindex array copy functions | |
182 | @cindex concatenating strings | |
183 | @cindex string concatenation functions | |
184 | ||
185 | A helpful way to remember the ordering of the arguments to the functions | |
186 | in this section is that it corresponds to an assignment expression, with | |
187 | the destination array specified to the left of the source array. Most | |
188 | of these functions return the address of the destination array. | |
189 | ||
190 | Most of these functions do not work properly if the source and | |
191 | destination arrays overlap. For example, if the beginning of the | |
192 | destination array overlaps the end of the source array, the original | |
193 | contents of that part of the source array may get overwritten before it | |
194 | is copied. Even worse, in the case of the string functions, the null | |
195 | character marking the end of the string may be lost, and the copy | |
196 | function might get stuck in a loop trashing all the memory allocated to | |
197 | your program. | |
198 | ||
199 | All functions that have problems copying between overlapping arrays are | |
200 | explicitly identified in this manual. In addition to functions in this | |
201 | section, there are a few others like @code{sprintf} (@pxref{Formatted | |
202 | Output Functions}) and @code{scanf} (@pxref{Formatted Input | |
203 | Functions}). | |
204 | ||
205 | @comment string.h | |
f65fd747 | 206 | @comment ISO |
28f540f4 RM |
207 | @deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size}) |
208 | The @code{memcpy} function copies @var{size} bytes from the object | |
209 | beginning at @var{from} into the object beginning at @var{to}. The | |
210 | behavior of this function is undefined if the two arrays @var{to} and | |
211 | @var{from} overlap; use @code{memmove} instead if overlapping is possible. | |
212 | ||
213 | The value returned by @code{memcpy} is the value of @var{to}. | |
214 | ||
215 | Here is an example of how you might use @code{memcpy} to copy the | |
216 | contents of an array: | |
217 | ||
218 | @smallexample | |
219 | struct foo *oldarray, *newarray; | |
220 | int arraysize; | |
221 | @dots{} | |
222 | memcpy (newarray, oldarray, arraysize * sizeof (struct foo)); | |
223 | @end smallexample | |
224 | @end deftypefun | |
225 | ||
226 | @comment string.h | |
f65fd747 | 227 | @comment ISO |
28f540f4 RM |
228 | @deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size}) |
229 | @code{memmove} copies the @var{size} bytes at @var{from} into the | |
230 | @var{size} bytes at @var{to}, even if those two blocks of space | |
231 | overlap. In the case of overlap, @code{memmove} is careful to copy the | |
232 | original values of the bytes in the block at @var{from}, including those | |
233 | bytes which also belong to the block at @var{to}. | |
234 | @end deftypefun | |
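
One situation where the blocks overlap is deleting an element from the middle of an array. The sketch below assumes a hypothetical helper named @code{remove_element} and requires that @var{index} be less than @var{count}; because source and destination share storage, @code{memmove} is used rather than @code{memcpy}.

```c
#include <string.h>

/* Hypothetical helper: delete the element at INDEX from an array of
   COUNT ints by sliding the following elements down one slot.
   INDEX must be less than COUNT.  */
void
remove_element (int *array, size_t count, size_t index)
{
  /* The source and destination regions overlap, so memcpy
     would have undefined behavior here.  */
  memmove (&array[index], &array[index + 1],
           (count - index - 1) * sizeof (int));
}
```

After the call, the first @code{count - 1} elements hold the remaining values; the last slot keeps its old contents.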
235 | ||
236 | @comment string.h | |
237 | @comment SVID | |
238 | @deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size}) | |
239 | This function copies no more than @var{size} bytes from @var{from} to | |
240 | @var{to}, stopping if a byte matching @var{c} is found. The return | |
241 | value is a pointer into @var{to} one byte past where @var{c} was copied, | |
242 | or a null pointer if no byte matching @var{c} appeared in the first | |
243 | @var{size} bytes of @var{from}. | |
244 | @end deftypefun | |
245 | ||
246 | @comment string.h | |
f65fd747 | 247 | @comment ISO |
28f540f4 RM |
248 | @deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size}) |
249 | This function copies the value of @var{c} (converted to an | |
250 | @code{unsigned char}) into each of the first @var{size} bytes of the | |
251 | object beginning at @var{block}. It returns the value of @var{block}. | |
252 | @end deftypefun | |
253 | ||
254 | @comment string.h | |
f65fd747 | 255 | @comment ISO |
28f540f4 RM |
256 | @deftypefun {char *} strcpy (char *@var{to}, const char *@var{from}) |
257 | This copies characters from the string @var{from} (up to and including | |
258 | the terminating null character) into the string @var{to}. Like | |
259 | @code{memcpy}, this function has undefined results if the strings | |
260 | overlap. The return value is the value of @var{to}. | |
261 | @end deftypefun | |
262 | ||
263 | @comment string.h | |
f65fd747 | 264 | @comment ISO |
28f540f4 RM |
265 | @deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size}) |
266 | This function is similar to @code{strcpy} but always copies exactly | |
267 | @var{size} characters into @var{to}. | |
268 | ||
269 | If the length of @var{from} is more than @var{size}, then @code{strncpy} | |
270 | copies just the first @var{size} characters. Note that in this case | |
271 | there is no null terminator written into @var{to}. | |
272 | ||
273 | If the length of @var{from} is less than @var{size}, then @code{strncpy} | |
274 | copies all of @var{from}, followed by enough null characters to add up | |
275 | to @var{size} characters in all. This behavior is rarely useful, but it | |
f65fd747 | 276 | is specified by the @w{ISO C} standard. |
28f540f4 RM |
277 | |
278 | The behavior of @code{strncpy} is undefined if the strings overlap. | |
279 | ||
280 | Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs | |
281 | relating to writing past the end of the allocated space for @var{to}. | |
282 | However, it can also make your program much slower in one common case: | |
283 | copying a string which is probably small into a potentially large buffer. | |
284 | In this case, @var{size} may be large, and when it is, @code{strncpy} will | |
285 | waste a considerable amount of time copying null characters. | |
286 | @end deftypefun | |
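
Because @code{strncpy} writes no null terminator when @var{from} is too long, code that needs a proper string usually adds one explicitly. A minimal sketch, assuming a hypothetical helper @code{copy_truncated} and a destination array of at least one byte:

```c
#include <string.h>

/* Hypothetical helper: copy FROM into the SIZE-byte array TO,
   truncating if necessary.  SIZE must be at least 1.  */
void
copy_truncated (char *to, const char *from, size_t size)
{
  strncpy (to, from, size - 1);   /* May leave TO unterminated...  */
  to[size - 1] = '\0';            /* ...so terminate it explicitly.  */
}
```

The result is always a null-terminated string, at the cost of silently dropping any characters that do not fit.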
287 | ||
288 | @comment string.h | |
289 | @comment SVID | |
290 | @deftypefun {char *} strdup (const char *@var{s}) | |
291 | This function copies the null-terminated string @var{s} into a newly | |
292 | allocated string. The string is allocated using @code{malloc}; see | |
293 | @ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space | |
294 | for the new string, @code{strdup} returns a null pointer. Otherwise it | |
295 | returns a pointer to the new string. | |
296 | @end deftypefun | |
297 | ||
706074a5 UD |
298 | @comment string.h |
299 | @comment GNU | |
300 | @deftypefun {char *} strndup (const char *@var{s}, size_t @var{size}) | |
301 | This function is similar to @code{strdup} but always copies at most | |
302 | @var{size} characters into the newly allocated string. | |
303 | ||
304 | If the length of @var{s} is more than @var{size}, then @code{strndup} | |
305 | copies just the first @var{size} characters and adds a closing null | |
306 | terminator. Otherwise all characters are copied and the string is | |
307 | terminated. | |
308 | ||
309 | Unlike @code{strncpy}, this function always terminates the | |
310 | destination string. | |
311 | @end deftypefun | |
312 | ||
28f540f4 RM |
313 | @comment string.h |
314 | @comment Unknown origin | |
315 | @deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from}) | |
316 | This function is like @code{strcpy}, except that it returns a pointer to | |
317 | the end of the string @var{to} (that is, the address of the terminating | |
318 | null character) rather than the beginning. | |
319 | ||
320 | For example, this program uses @code{stpcpy} to concatenate @samp{foo} | |
321 | and @samp{bar} to produce @samp{foobar}, which it then prints. | |
322 | ||
323 | @smallexample | |
324 | @include stpcpy.c.texi | |
325 | @end smallexample | |
326 | ||
f65fd747 | 327 | This function is not part of the ISO or POSIX standards, and is not |
28f540f4 RM |
328 | customary on Unix systems, but we did not invent it either. Perhaps it |
329 | comes from MS-DOG. | |
330 | ||
331 | Its behavior is undefined if the strings overlap. | |
332 | @end deftypefun | |
333 | ||
706074a5 UD |
334 | @comment string.h |
335 | @comment GNU | |
336 | @deftypefun {char *} stpncpy (char *@var{to}, const char *@var{from}, size_t @var{size}) | |
337 | This function is similar to @code{stpcpy} but always copies exactly | |
338 | @var{size} characters into @var{to}. | |
339 | ||
340 | If the length of @var{from} is more than @var{size}, then @code{stpncpy} | |
341 | copies just the first @var{size} characters and returns a pointer to the | |
342 | character directly following the one which was copied last. Note that in | |
343 | this case there is no null terminator written into @var{to}. | |
344 | ||
345 | If the length of @var{from} is less than @var{size}, then @code{stpncpy} | |
346 | copies all of @var{from}, followed by enough null characters to add up | |
347 | to @var{size} characters in all. This behaviour is rarely useful, but it | |
348 | is provided so that @code{stpncpy} can be used in contexts that rely on | |
349 | this behaviour of @code{strncpy}. In this case @code{stpncpy} returns a pointer to the | |
350 | @emph{first} written null character. | |
351 | ||
f65fd747 | 352 | This function is not part of ISO or POSIX but was found useful while |
706074a5 UD |
353 | developing the GNU C Library itself. | |
354 | ||
355 | Its behaviour is undefined if the strings overlap. | |
356 | @end deftypefun | |
357 | ||
358 | @comment string.h | |
359 | @comment GNU | |
360 | @deftypefun {char *} strdupa (const char *@var{s}) | |
361 | This function is similar to @code{strdup} but allocates the new string | |
362 | using @code{alloca} instead of @code{malloc} | |
363 | (@pxref{Variable Size Automatic}). This means of course the returned | |
364 | string has the same limitations as any block of memory allocated using | |
365 | @code{alloca}. | |
366 | ||
367 | Because the string must live in the caller's stack frame, @code{strdupa} is implemented only as a macro; that is, | |
40a55d20 | 368 | you cannot take the address of this function. Despite this limitation |
706074a5 UD |
369 | it is a useful function. The following code shows a situation where |
370 | using @code{malloc} would be a lot more expensive. | |
371 | ||
372 | @smallexample | |
373 | @include strdupa.c.texi | |
374 | @end smallexample | |
375 | ||
376 | Please note that calling @code{strtok} using @var{path} directly is | |
40a55d20 | 377 | invalid. |
706074a5 UD |
378 | |
379 | This function is only available if GNU CC is used. | |
380 | @end deftypefun | |
381 | ||
382 | @comment string.h | |
383 | @comment GNU | |
384 | @deftypefun {char *} strndupa (const char *@var{s}, size_t @var{size}) | |
385 | This function is similar to @code{strndup} but like @code{strdupa} it | |
386 | allocates the new string using @code{alloca} | |
387 | (@pxref{Variable Size Automatic}). The same advantages and limitations | |
388 | of @code{strdupa} are valid for @code{strndupa}, too. | |
389 | ||
390 | This function is implemented only as a macro which means one cannot | |
391 | get the address of it. | |
392 | ||
393 | @code{strndupa} is only available if GNU CC is used. | |
394 | @end deftypefun | |
395 | ||
28f540f4 | 396 | @comment string.h |
f65fd747 | 397 | @comment ISO |
28f540f4 RM |
398 | @deftypefun {char *} strcat (char *@var{to}, const char *@var{from}) |
399 | The @code{strcat} function is similar to @code{strcpy}, except that the | |
400 | characters from @var{from} are concatenated or appended to the end of | |
401 | @var{to}, instead of overwriting it. That is, the first character from | |
402 | @var{from} overwrites the null character marking the end of @var{to}. | |
403 | ||
404 | An equivalent definition for @code{strcat} would be: | |
405 | ||
406 | @smallexample | |
407 | char * | |
408 | strcat (char *to, const char *from) | |
409 | @{ | |
410 | strcpy (to + strlen (to), from); | |
411 | return to; | |
412 | @} | |
413 | @end smallexample | |
414 | ||
415 | This function has undefined results if the strings overlap. | |
416 | @end deftypefun | |
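
Since @code{strcat} does not check whether the result fits, the caller has to. The following sketch shows one way to make that check explicit; the helper name @code{append_checked} and its return convention are assumptions made for this example, and @var{to} must already hold a null-terminated string within @var{capacity} bytes.

```c
#include <string.h>

/* Hypothetical helper: append FROM to the string in TO, an array of
   CAPACITY bytes, only if the result (including its terminating null
   character) fits.  Return 1 on success, 0 if TO is left unchanged.  */
int
append_checked (char *to, size_t capacity, const char *from)
{
  size_t used = strlen (to);
  size_t extra = strlen (from);

  /* We need used + extra + 1 <= capacity.  */
  if (extra >= capacity - used)
    return 0;
  memcpy (to + used, from, extra + 1);
  return 1;
}
```

Because both lengths are already known, the sketch copies with @code{memcpy} instead of making @code{strcat} scan @var{to} a second time.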
417 | ||
418 | @comment string.h | |
f65fd747 | 419 | @comment ISO |
28f540f4 RM |
420 | @deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size}) |
421 | This function is like @code{strcat} except that not more than @var{size} | |
422 | characters from @var{from} are appended to the end of @var{to}. A | |
423 | single null character is also always appended to @var{to}, so the total | |
424 | allocated size of @var{to} must be at least @code{@var{size} + 1} bytes | |
425 | longer than its initial length. | |
426 | ||
427 | The @code{strncat} function could be implemented like this: | |
428 | ||
429 | @smallexample | |
430 | @group | |
431 | char * | |
432 | strncat (char *to, const char *from, size_t size) | |
433 | @{ | |
434 | to[strlen (to) + size] = '\0'; | |
| strncpy (to + strlen (to), from, size); | |
435 | return to; | |
436 | @} | |
437 | @end group | |
438 | @end smallexample | |
439 | ||
440 | The behavior of @code{strncat} is undefined if the strings overlap. | |
441 | @end deftypefun | |
442 | ||
443 | Here is an example showing the use of @code{strncpy} and @code{strncat}. | |
444 | Notice how, in the call to @code{strncat}, the @var{size} parameter | |
445 | is computed to avoid overflowing the character array @code{buffer}. | |
446 | ||
447 | @smallexample | |
448 | @include strncat.c.texi | |
449 | @end smallexample | |
450 | ||
451 | @noindent | |
452 | The output produced by this program looks like: | |
453 | ||
454 | @smallexample | |
455 | hello | |
456 | hello, wo | |
457 | @end smallexample | |
458 | ||
459 | @comment string.h | |
460 | @comment BSD | |
461 | @deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size}) | |
462 | This is a partially obsolete alternative for @code{memmove}, derived from | |
463 | BSD. Note that it is not quite equivalent to @code{memmove}, because the | |
464 | arguments are not in the same order. | |
465 | @end deftypefun | |
466 | ||
467 | @comment string.h | |
468 | @comment BSD | |
469 | @deftypefun void bzero (void *@var{block}, size_t @var{size}) | |
470 | This is a partially obsolete alternative for @code{memset}, derived from | |
471 | BSD. Note that it is not as general as @code{memset}, because the only | |
472 | value it can store is zero. | |
473 | @end deftypefun | |
474 | ||
b4012b75 | 475 | @node String/Array Comparison |
28f540f4 RM |
476 | @section String/Array Comparison |
477 | @cindex comparing strings and arrays | |
478 | @cindex string comparison functions | |
479 | @cindex array comparison functions | |
480 | @cindex predicates on strings | |
481 | @cindex predicates on arrays | |
482 | ||
483 | You can use the functions in this section to perform comparisons on the | |
484 | contents of strings and arrays. As well as checking for equality, these | |
485 | functions can also be used as the ordering functions for sorting | |
486 | operations. @xref{Searching and Sorting}, for an example of this. | |
487 | ||
488 | Unlike most comparison operations in C, the string comparison functions | |
489 | return a nonzero value if the strings are @emph{not} equivalent rather | |
490 | than if they are. The sign of the value indicates the relative ordering | |
491 | of the first characters in the strings that are not equivalent: a | |
492 | negative value indicates that the first string is ``less'' than the | |
a5113b14 | 493 | second, while a positive value indicates that the first string is |
28f540f4 RM |
494 | ``greater''. |
495 | ||
496 | The most common use of these functions is to check only for equality. | |
497 | This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}. | |
498 | ||
499 | All of these functions are declared in the header file @file{string.h}. | |
500 | @pindex string.h | |
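
As a sketch of using a comparison function for ordering, the code below sorts an array of string pointers with @code{qsort}. The names @code{compare_strings} and @code{sort_strings} are hypothetical. Note that @code{qsort} passes pointers to the array elements, so the comparison function receives pointers to @code{char *} values rather than the strings themselves.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: A and B point to elements of the
   array, each of which is a pointer to a string.  */
static int
compare_strings (const void *a, const void *b)
{
  const char *const *first = a;
  const char *const *second = b;

  return strcmp (*first, *second);
}

/* Hypothetical helper: sort COUNT string pointers into the order
   defined by strcmp.  */
void
sort_strings (const char **strings, size_t count)
{
  qsort (strings, count, sizeof *strings, compare_strings);
}
```

Only the sign of the value returned by @code{strcmp} matters to @code{qsort}, so the result can be passed through unchanged.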
501 | ||
502 | @comment string.h | |
f65fd747 | 503 | @comment ISO |
28f540f4 RM |
504 | @deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) |
505 | The function @code{memcmp} compares the @var{size} bytes of memory | |
506 | beginning at @var{a1} against the @var{size} bytes of memory beginning | |
507 | at @var{a2}. The value returned has the same sign as the difference | |
508 | between the first differing pair of bytes (interpreted as @code{unsigned | |
509 | char} objects, then promoted to @code{int}). | |
510 | ||
511 | If the contents of the two blocks are equal, @code{memcmp} returns | |
512 | @code{0}. | |
513 | @end deftypefun | |
514 | ||
515 | On arbitrary arrays, the @code{memcmp} function is mostly useful for | |
516 | testing equality. It usually isn't meaningful to do byte-wise ordering | |
517 | comparisons on arrays of things other than bytes. For example, a | |
518 | byte-wise comparison on the bytes that make up floating-point numbers | |
519 | isn't likely to tell you anything about the relationship between the | |
520 | values of the floating-point numbers. | |
521 | ||
522 | You should also be careful about using @code{memcmp} to compare objects | |
523 | that can contain ``holes'', such as the padding inserted into structure | |
524 | objects to enforce alignment requirements, extra space at the end of | |
525 | unions, and extra characters at the ends of strings whose length is less | |
526 | than their allocated size. The contents of these ``holes'' are | |
527 | indeterminate and may cause strange behavior when performing byte-wise | |
528 | comparisons. For more predictable results, perform an explicit | |
529 | component-wise comparison. | |
530 | ||
531 | For example, given a structure type definition like: | |
532 | ||
533 | @smallexample | |
534 | struct foo | |
535 | @{ | |
536 | unsigned char tag; | |
537 | union | |
538 | @{ | |
539 | double f; | |
540 | long i; | |
541 | char *p; | |
542 | @} value; | |
543 | @}; | |
544 | @end smallexample | |
545 | ||
546 | @noindent | |
547 | you are better off writing a specialized comparison function to compare | |
548 | @code{struct foo} objects instead of comparing them with @code{memcmp}. | |
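
Such a comparison function might look like the following sketch. It assumes that @code{tag} records which member of the union is in use and that a @code{char *} member points to a null-terminated string; the tag constants and the name @code{foo_equal} are hypothetical.

```c
#include <string.h>

struct foo
{
  unsigned char tag;
  union
  {
    double f;
    long i;
    char *p;
  } value;
};

/* Hypothetical tag values naming the active union member.  */
enum { TAG_DOUBLE, TAG_LONG, TAG_STRING };

/* Return nonzero if A and B hold the same tag and the same value.
   Padding bytes and inactive union members are never examined.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    {
    case TAG_DOUBLE:
      return a->value.f == b->value.f;
    case TAG_LONG:
      return a->value.i == b->value.i;
    case TAG_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```

Two objects that differ only in their padding bytes compare equal here, whereas @code{memcmp} could report a difference.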
549 | ||
550 | @comment string.h | |
f65fd747 | 551 | @comment ISO |
28f540f4 RM |
552 | @deftypefun int strcmp (const char *@var{s1}, const char *@var{s2}) |
553 | The @code{strcmp} function compares the string @var{s1} against | |
554 | @var{s2}, returning a value that has the same sign as the difference | |
555 | between the first differing pair of characters (interpreted as | |
556 | @code{unsigned char} objects, then promoted to @code{int}). | |
557 | ||
558 | If the two strings are equal, @code{strcmp} returns @code{0}. | |
559 | ||
560 | A consequence of the ordering used by @code{strcmp} is that if @var{s1} | |
561 | is an initial substring of @var{s2}, then @var{s1} is considered to be | |
562 | ``less than'' @var{s2}. | |
563 | @end deftypefun | |
564 | ||
565 | @comment string.h | |
566 | @comment BSD | |
567 | @deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2}) | |
568 | This function is like @code{strcmp}, except that differences in case | |
569 | are ignored. | |
570 | ||
571 | @code{strcasecmp} is derived from BSD. | |
572 | @end deftypefun | |
573 | ||
574 | @comment string.h | |
575 | @comment BSD | |
576 | @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n}) | |
577 | This function is like @code{strncmp}, except that differences in case | |
578 | are ignored. | |
579 | ||
580 | @code{strncasecmp} is derived from BSD. | |
581 | @end deftypefun | |
582 | ||
583 | @comment string.h | |
f65fd747 | 584 | @comment ISO |
28f540f4 RM |
585 | @deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size}) |
586 | This function is similar to @code{strcmp}, except that no more than | |
587 | @var{size} characters are compared. In other words, if the two strings are | |
588 | the same in their first @var{size} characters, the return value is zero. | |
589 | @end deftypefun | |
590 | ||
591 | Here are some examples showing the use of @code{strcmp} and @code{strncmp}. | |
592 | These examples assume the use of the ASCII character set. (If some | |
593 | other character set---say, EBCDIC---is used instead, then the glyphs | |
594 | are associated with different numeric codes, and the return values | |
595 | and ordering may differ.) | |
596 | ||
597 | @smallexample | |
598 | strcmp ("hello", "hello") | |
599 | @result{} 0 /* @r{These two strings are the same.} */ | |
600 | strcmp ("hello", "Hello") | |
601 | @result{} 32 /* @r{Comparisons are case-sensitive.} */ | |
602 | strcmp ("hello", "world") | |
603 | @result{} -15 /* @r{The character @code{'h'} comes before @code{'w'}.} */ | |
604 | strcmp ("hello", "hello, world") | |
605 | @result{} -44 /* @r{Comparing a null character against a comma.} */ | |
6952e59e | 606 | strncmp ("hello", "hello, world", 5) |
28f540f4 RM |
607 | @result{} 0 /* @r{The initial 5 characters are the same.} */ |
608 | strncmp ("hello, world", "hello, stupid world!!!", 5) | |
609 | @result{} 0 /* @r{The initial 5 characters are the same.} */ | |
610 | @end smallexample | |
611 | ||
1f205a47 UD |
612 | @comment string.h |
613 | @comment GNU | |
614 | @deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2}) | |
615 | The @code{strverscmp} function compares the string @var{s1} against | |
616 | @var{s2}, considering them as holding indices/version numbers. The return | |
617 | value follows the same conventions as that of the @code{strcmp} | |
618 | function. In fact, if @var{s1} and @var{s2} contain no digits, | |
619 | @code{strverscmp} behaves like @code{strcmp}. | |
620 | ||
621 | Basically, we compare strings normally (character by character), until | |
622 | we find a digit in each string; then we enter a special comparison | |
623 | mode, where each sequence of digits is taken as a whole. If we reach the | |
624 | end of these two parts without noticing a difference, we return to the | |
625 | standard comparison mode. There are two types of numeric parts: | |
626 | ``integral'' and ``fractional'' (the latter begin with a @samp{0}). The types | |
627 | of the numeric parts affect the way we sort them: | |
628 | ||
629 | @itemize @bullet | |
630 | @item | |
631 | integral/integral: we compare values as you would expect. | |
632 | ||
633 | @item | |
634 | fractional/integral: the fractional part is less than the integral one. | |
635 | Again, no surprise. | |
636 | ||
637 | @item | |
638 | fractional/fractional: things become a bit more complex. | |
639 | If the common prefix contains only leading zeroes, the longest part is less | |
640 | than the other one; otherwise the comparison behaves normally. | |
641 | @end itemize | |
642 | ||
643 | @smallexample | |
644 | strverscmp ("no digit", "no digit") | |
645 | @result{} 0 /* @r{same behavior as @code{strcmp}.} */ | |
646 | strverscmp ("item#99", "item#100") | |
647 | @result{} <0 /* @r{same prefix, but 99 < 100.} */ | |
648 | strverscmp ("alpha1", "alpha001") | |
649 | @result{} >0 /* @r{fractional part inferior to integral one.} */ | |
650 | strverscmp ("part1_f012", "part1_f01") | |
651 | @result{} >0 /* @r{two fractional parts.} */ | |
652 | strverscmp ("foo.009", "foo.0") | |
653 | @result{} <0 /* @r{likewise, but with leading zeroes only.} */ | |
654 | @end smallexample | |
655 | ||
656 | This function is especially useful when dealing with filename sorting, | |
657 | because filenames frequently hold indices/version numbers. | |
658 | ||
659 | @code{strverscmp} is a GNU extension. | |
660 | @end deftypefun | |
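As a rough sketch of the filename-sorting use mentioned above, @code{strverscmp} can supply the ordering inside a @code{qsort} comparison function. The names @code{compare_versions} and @code{sort_by_version} are invented for this illustration, and the code assumes a GNU system where defining @code{_GNU_SOURCE} makes the declaration visible.

```c
#define _GNU_SOURCE   /* strverscmp is a GNU extension */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical qsort comparison function.  Each argument points to
   an array element, that is, to a `const char *'.  */
int
compare_versions (const void *p1, const void *p2)
{
  const char *s1 = *(const char *const *) p1;
  const char *s2 = *(const char *const *) p2;
  return strverscmp (s1, s2);
}

/* Hypothetical helper: sort NAMES so that embedded numbers are
   ordered by value rather than character by character.  */
void
sort_by_version (const char **names, size_t count)
{
  assert (names != NULL || count == 0);
  qsort (names, count, sizeof (names[0]), compare_versions);
}
```

With this ordering, @code{"item#9"} sorts before @code{"item#99"}, which sorts before @code{"item#100"}; plain @code{strcmp} would put @code{"item#100"} first.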
661 | ||
28f540f4 RM |
662 | @comment string.h |
663 | @comment BSD | |
664 | @deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size}) | |
665 | This is an obsolete alias for @code{memcmp}, derived from BSD. | |
666 | @end deftypefun | |
667 | ||
b4012b75 | 668 | @node Collation Functions |
28f540f4 RM |
669 | @section Collation Functions |
670 | ||
671 | @cindex collating strings | |
672 | @cindex string collation functions | |
673 | ||
674 | In some locales, the conventions for lexicographic ordering differ from | |
675 | the strict numeric ordering of character codes. For example, in Spanish | |
676 | most glyphs with diacritical marks such as accents are not considered | |
677 | distinct letters for the purposes of collation. On the other hand, the | |
678 | two-character sequence @samp{ll} is treated as a single letter that is | |
679 | collated immediately after @samp{l}. | |
680 | ||
681 | You can use the functions @code{strcoll} and @code{strxfrm} (declared in | |
682 | the header file @file{string.h}) to compare strings using a collation | |
683 | ordering appropriate for the current locale. The locale used by these | |
684 | functions in particular can be specified by setting the locale for the | |
685 | @code{LC_COLLATE} category; see @ref{Locales}. | |
686 | @pindex string.h | |
687 | ||
688 | In the standard C locale, the collation sequence for @code{strcoll} is | |
689 | the same as that for @code{strcmp}. | |
690 | ||
691 | Effectively, the way these functions work is by applying a mapping to | |
692 | transform the characters in a string to a byte sequence that represents | |
693 | the string's position in the collating sequence of the current locale. | |
694 | Comparing two such byte sequences in a simple fashion is equivalent to | |
695 | comparing the strings with the locale's collating sequence. | |
696 | ||
697 | The function @code{strcoll} performs this translation implicitly, in | |
698 | order to do one comparison. By contrast, @code{strxfrm} performs the | |
699 | mapping explicitly. If you are making multiple comparisons using the | |
700 | same string or set of strings, it is likely to be more efficient to use | |
701 | @code{strxfrm} to transform all the strings just once, and subsequently | |
702 | compare the transformed strings with @code{strcmp}. | |
703 | ||
704 | @comment string.h | |
f65fd747 | 705 | @comment ISO |
28f540f4 RM |
706 | @deftypefun int strcoll (const char *@var{s1}, const char *@var{s2}) |
707 | The @code{strcoll} function is similar to @code{strcmp} but uses the | |
708 | collating sequence of the current locale for collation (the | |
709 | @code{LC_COLLATE} locale). | |
710 | @end deftypefun | |
711 | ||
712 | Here is an example of sorting an array of strings, using @code{strcoll} | |
713 | to compare them. The actual sort algorithm is not written here; it | |
714 | comes from @code{qsort} (@pxref{Array Sort Function}). The job of the | |
715 | code shown here is to say how to compare the strings while sorting them. | |
716 | (Later on in this section, we will show a way to do this more | |
717 | efficiently using @code{strxfrm}.) | |
718 | ||
719 | @smallexample | |
720 | /* @r{This is the comparison function used with @code{qsort}.} */ | |
721 | ||
722 | int | |
723 | compare_elements (const void *p1, const void *p2) | |
724 | @{ | |
725 | return strcoll (*(char *const *) p1, *(char *const *) p2); | |
726 | @} | |
727 | ||
728 | /* @r{This is the entry point---the function to sort} | |
729 | @r{strings using the locale's collating sequence.} */ | |
730 | ||
731 | void | |
732 | sort_strings (char **array, int nstrings) | |
733 | @{ | |
734 | /* @r{Sort @code{array} by comparing the strings.} */ | |
735 | qsort (array, nstrings, | |
736 | sizeof (char *), compare_elements); | |
737 | @} | |
738 | @end smallexample | |
739 | ||
740 | @cindex converting string to collation order | |
741 | @comment string.h | |
f65fd747 | 742 | @comment ISO |
28f540f4 RM |
743 | @deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size}) |
744 | The function @code{strxfrm} transforms the string @var{from} using the collation | |
745 | transformation determined by the locale currently selected for | |
746 | collation, and stores the transformed string in the array @var{to}. Up | |
747 | to @var{size} characters (including a terminating null character) are | |
748 | stored. | |
749 | ||
750 | The behavior is undefined if the strings @var{to} and @var{from} | |
751 | overlap; see @ref{Copying and Concatenation}. | |
752 | ||
753 | The return value is the length of the entire transformed string. This | |
754 | value is not affected by the value of @var{size}, but if it is greater | |
a5113b14 UD |
755 | than or equal to @var{size}, it means that the transformed string did not |
756 | entirely fit in the array @var{to}. In this case, only as much of the | |
757 | string as actually fits was stored. To get the whole transformed | |
758 | string, call @code{strxfrm} again with a bigger output array. | |
28f540f4 RM |
759 | |
760 | The transformed string may be longer than the original string, and it | |
761 | may also be shorter. | |
762 | ||
763 | If @var{size} is zero, no characters are stored in @var{to}. In this | |
764 | case, @code{strxfrm} simply returns the number of characters that would | |
765 | be the length of the transformed string. This is useful for determining | |
766 | what size string to allocate. It does not matter what @var{to} is if | |
767 | @var{size} is zero; @var{to} may even be a null pointer. | |
768 | @end deftypefun | |
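The zero-@var{size} call described above leads to a simple two-step idiom: ask for the length first, then allocate exactly that much. The following is a minimal sketch. The function name @code{transform_copy} is invented here, and plain @code{malloc} stands in for whatever error-checking allocator a program actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of STRING,
   transformed for the current LC_COLLATE locale, or a null pointer
   if memory is exhausted.  The caller frees the result.  */
char *
transform_copy (const char *string)
{
  size_t needed;
  char *result;

  assert (string != NULL);

  /* With a size of zero nothing is stored; the return value is the
     length of the transformed string, not counting the terminator.  */
  needed = strxfrm (NULL, string, 0);

  result = malloc (needed + 1);
  if (result == NULL)
    return NULL;

  /* The buffer is now known to be large enough.  */
  strxfrm (result, string, needed + 1);
  return result;
}
```

Two strings transformed this way can be compared with @code{strcmp}, and the sign of the result matches that of @code{strcoll} on the original strings. The price is a second pass over each string, which is usually small next to the cost of the comparisons saved.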
769 | ||
770 | Here is an example of how you can use @code{strxfrm} when | |
771 | you plan to do many comparisons. It does the same thing as the previous | |
772 | example, but much faster, because it has to transform each string only | |
773 | once, no matter how many times it is compared with other strings. Even | |
774 | the time needed to allocate and free storage is much less than the time | |
775 | we save, when there are many strings. | |
776 | ||
777 | @smallexample | |
778 | struct sorter @{ char *input; char *transformed; @}; | |
779 | ||
780 | /* @r{This is the comparison function used with @code{qsort}} | |
781 | @r{to sort an array of @code{struct sorter}.} */ | |
782 | ||
783 | int | |
784 | compare_elements (const void *p1, const void *p2) | |
785 | @{ | |
786 | return strcmp (((const struct sorter *) p1)->transformed, | |
| ((const struct sorter *) p2)->transformed); | |
787 | @} | |
788 | ||
789 | /* @r{This is the entry point---the function to sort} | |
790 | @r{strings using the locale's collating sequence.} */ | |
791 | ||
792 | void | |
793 | sort_strings_fast (char **array, int nstrings) | |
794 | @{ | |
795 | struct sorter temp_array[nstrings]; | |
796 | int i; | |
797 | ||
798 | /* @r{Set up @code{temp_array}. Each element contains} | |
799 | @r{one input string and its transformed string.} */ | |
800 | for (i = 0; i < nstrings; i++) | |
801 | @{ | |
802 | size_t length = strlen (array[i]) * 2; | |
a5113b14 UD |
803 | char *transformed; |
804 | size_t transformed_length; | |
28f540f4 RM |
805 | |
806 | temp_array[i].input = array[i]; | |
807 | ||
a5113b14 UD |
808 | /* @r{First try a buffer perhaps big enough.} */ |
809 | transformed = (char *) xmalloc (length); | |
810 | ||
811 | /* @r{Transform @code{array[i]}.} */ | |
812 | transformed_length = strxfrm (transformed, array[i], length); | |
813 | ||
814 | /* @r{If the buffer was not large enough, resize it} | |
815 | @r{and try again.} */ | |
816 | if (transformed_length >= length) | |
28f540f4 | 817 | @{ |
a5113b14 UD |
818 | /* @r{Allocate the needed space. +1 for terminating} |
819 | @r{@code{NUL} character.} */ | |
820 | transformed = (char *) xrealloc (transformed, | |
821 | transformed_length + 1); | |
822 | ||
823 | /* @r{The return value is not interesting because we know} | |
824 | @r{how long the transformed string is.} */ | |
825 | (void) strxfrm (transformed, array[i], transformed_length + 1); | |
28f540f4 | 826 | @} |
a5113b14 UD |
827 | |
828 | temp_array[i].transformed = transformed; | |
28f540f4 RM |
829 | @} |
830 | ||
831 | /* @r{Sort @code{temp_array} by comparing transformed strings.} */ | |
832 | qsort (temp_array, nstrings, | |
833 | sizeof (struct sorter), compare_elements); | |
834 | ||
835 | /* @r{Put the elements back in the permanent array} | |
836 | @r{in their sorted order.} */ | |
837 | for (i = 0; i < nstrings; i++) | |
838 | array[i] = temp_array[i].input; | |
839 | ||
840 | /* @r{Free the strings we allocated.} */ | |
841 | for (i = 0; i < nstrings; i++) | |
842 | free (temp_array[i].transformed); | |
843 | @} | |
844 | @end smallexample | |
845 | ||
846 | @strong{Compatibility Note:} The string collation functions are a new | |
b4012b75 | 847 | feature of @w{ISO C 89}. Older C dialects have no equivalent feature. |
28f540f4 | 848 | |
b4012b75 | 849 | @node Search Functions |
28f540f4 RM |
850 | @section Search Functions |
851 | ||
852 | This section describes library functions which perform various kinds | |
853 | of searching operations on strings and arrays. These functions are | |
854 | declared in the header file @file{string.h}. | |
855 | @pindex string.h | |
856 | @cindex search functions (for strings) | |
857 | @cindex string search functions | |
858 | ||
859 | @comment string.h | |
f65fd747 | 860 | @comment ISO |
28f540f4 RM |
861 | @deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size}) |
862 | This function finds the first occurrence of the byte @var{c} (converted | |
863 | to an @code{unsigned char}) in the initial @var{size} bytes of the | |
864 | object beginning at @var{block}. The return value is a pointer to the | |
865 | located byte, or a null pointer if no match was found. | |
866 | @end deftypefun | |
867 | ||
868 | @comment string.h | |
f65fd747 | 869 | @comment ISO |
28f540f4 RM |
870 | @deftypefun {char *} strchr (const char *@var{string}, int @var{c}) |
871 | The @code{strchr} function finds the first occurrence of the character | |
872 | @var{c} (converted to a @code{char}) in the null-terminated string | |
873 | beginning at @var{string}. The return value is a pointer to the located | |
874 | character, or a null pointer if no match was found. | |
875 | ||
876 | For example, | |
877 | @smallexample | |
878 | strchr ("hello, world", 'l') | |
879 | @result{} "llo, world" | |
880 | strchr ("hello, world", '?') | |
881 | @result{} NULL | |
a5113b14 | 882 | @end smallexample |
28f540f4 RM |
883 | |
884 | The terminating null character is considered to be part of the string, | |
885 | so you can use this function to get a pointer to the end of a string by | |
886 | specifying a null character as the value of the @var{c} argument. | |
887 | @end deftypefun | |
888 | ||
889 | @comment string.h | |
890 | @comment BSD | |
891 | @deftypefun {char *} index (const char *@var{string}, int @var{c}) | |
892 | @code{index} is another name for @code{strchr}; they are exactly the same. | |
5649a1d6 UD |
893 | New code should always use @code{strchr} since this name is defined in |
894 | @w{ISO C} while @code{index} is a BSD invention which never was available | |
895 | on @w{System V} derived systems. | |
28f540f4 RM |
896 | @end deftypefun |
897 | ||
898 | @comment string.h | |
f65fd747 | 899 | @comment ISO |
28f540f4 RM |
900 | @deftypefun {char *} strrchr (const char *@var{string}, int @var{c}) |
901 | The function @code{strrchr} is like @code{strchr}, except that it searches | |
902 | backwards from the end of the string @var{string} (instead of forwards | |
903 | from the front). | |
904 | ||
905 | For example, | |
906 | @smallexample | |
907 | strrchr ("hello, world", 'l') | |
908 | @result{} "ld" | |
909 | @end smallexample | |
910 | @end deftypefun | |
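A common use of @code{strrchr} is to pick out the last component of a file name. This sketch assumes @samp{/} as the directory separator and uses an invented function name, @code{last_component}; it is an illustration only and does not handle trailing slashes specially.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH that
   follows the last '/', or PATH itself if it contains no '/'.
   The result points into PATH; nothing is copied.  */
const char *
last_component (const char *path)
{
  const char *slash;

  assert (path != NULL);

  slash = strrchr (path, '/');
  return slash != NULL ? slash + 1 : path;
}
```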
911 | ||
912 | @comment string.h | |
913 | @comment BSD | |
914 | @deftypefun {char *} rindex (const char *@var{string}, int @var{c}) | |
915 | @code{rindex} is another name for @code{strrchr}; they are exactly the same. | |
5649a1d6 UD |
916 | New code should always use @code{strrchr} since this name is defined in |
917 | @w{ISO C} while @code{rindex} is a BSD invention which never was available | |
918 | on @w{System V} derived systems. | |
28f540f4 RM |
919 | @end deftypefun |
920 | ||
921 | @comment string.h | |
f65fd747 | 922 | @comment ISO |
28f540f4 RM |
923 | @deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle}) |
924 | This is like @code{strchr}, except that it searches @var{haystack} for a | |
925 | substring @var{needle} rather than just a single character. It | |
926 | returns a pointer into the string @var{haystack} that is the first | |
927 | character of the substring, or a null pointer if no match was found. If | |
928 | @var{needle} is an empty string, the function returns @var{haystack}. | |
929 | ||
930 | For example, | |
931 | @smallexample | |
932 | strstr ("hello, world", "l") | |
933 | @result{} "llo, world" | |
934 | strstr ("hello, world", "wo") | |
935 | @result{} "world" | |
936 | @end smallexample | |
937 | @end deftypefun | |
938 | ||
939 | ||
940 | @comment string.h | |
941 | @comment GNU | |
63551311 | 942 | @deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len}) |
28f540f4 RM |
943 | This is like @code{strstr}, but @var{needle} and @var{haystack} are byte |
944 | arrays rather than null-terminated strings. @var{needle-len} is the | |
945 | length of @var{needle} and @var{haystack-len} is the length of | |
946 | @var{haystack}.@refill | |
947 | ||
948 | This function is a GNU extension. | |
949 | @end deftypefun | |
950 | ||
951 | @comment string.h | |
f65fd747 | 952 | @comment ISO |
28f540f4 RM |
953 | @deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset}) |
954 | The @code{strspn} (``string span'') function returns the length of the | |
955 | initial substring of @var{string} that consists entirely of characters that | |
956 | are members of the set specified by the string @var{skipset}. The order | |
957 | of the characters in @var{skipset} is not important. | |
958 | ||
959 | For example, | |
960 | @smallexample | |
961 | strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz") | |
962 | @result{} 5 | |
963 | @end smallexample | |
964 | @end deftypefun | |
965 | ||
966 | @comment string.h | |
f65fd747 | 967 | @comment ISO |
28f540f4 RM |
968 | @deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset}) |
969 | The @code{strcspn} (``string complement span'') function returns the length | |
970 | of the initial substring of @var{string} that consists entirely of characters | |
971 | that are @emph{not} members of the set specified by the string @var{stopset}. | |
972 | (In other words, it returns the offset of the first character in @var{string} | |
973 | that is a member of the set @var{stopset}.) | |
974 | ||
975 | For example, | |
976 | @smallexample | |
977 | strcspn ("hello, world", " \t\n,.;!?") | |
978 | @result{} 5 | |
979 | @end smallexample | |
980 | @end deftypefun | |
981 | ||
982 | @comment string.h | |
f65fd747 | 983 | @comment ISO |
28f540f4 RM |
984 | @deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset}) |
985 | The @code{strpbrk} (``string pointer break'') function is related to | |
986 | @code{strcspn}, except that it returns a pointer to the first character | |
987 | in @var{string} that is a member of the set @var{stopset} instead of the | |
988 | length of the initial substring. It returns a null pointer if no such | |
989 | character from @var{stopset} is found. | |
990 | ||
991 | @c @group Invalid outside the example. | |
992 | For example, | |
993 | ||
994 | @smallexample | |
995 | strpbrk ("hello, world", " \t\n,.;!?") | |
996 | @result{} ", world" | |
997 | @end smallexample | |
998 | @c @end group | |
999 | @end deftypefun | |
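Used together, @code{strspn} and @code{strcspn} can step through the words of a string without modifying it: the first skips a run of delimiters, the second skips the word that follows. The sketch below counts words this way; @code{count_words} is an invented name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count the maximal runs of characters in
   STRING that contain no member of DELIMITERS.  STRING is only
   read, never written.  */
size_t
count_words (const char *string, const char *delimiters)
{
  size_t count = 0;

  assert (string != NULL && delimiters != NULL);

  for (;;)
    {
      /* Skip any run of delimiters.  */
      string += strspn (string, delimiters);
      if (*string == '\0')
        break;

      /* Skip the word itself.  */
      string += strcspn (string, delimiters);
      count++;
    }
  return count;
}
```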
1000 | ||
b4012b75 | 1001 | @node Finding Tokens in a String |
28f540f4 RM |
1002 | @section Finding Tokens in a String |
1003 | ||
28f540f4 RM |
1004 | @cindex tokenizing strings |
1005 | @cindex breaking a string into tokens | |
1006 | @cindex parsing tokens from a string | |
1007 | It's fairly common for programs to have a need to do some simple kinds | |
1008 | of lexical analysis and parsing, such as splitting a command string up | |
1009 | into tokens. You can do this with the @code{strtok} function, declared | |
1010 | in the header file @file{string.h}. | |
1011 | @pindex string.h | |
1012 | ||
1013 | @comment string.h | |
f65fd747 | 1014 | @comment ISO |
28f540f4 RM |
1015 | @deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters}) |
1016 | A string can be split into tokens by making a series of calls to the | |
1017 | function @code{strtok}. | |
1018 | ||
1019 | The string to be split up is passed as the @var{newstring} argument on | |
1020 | the first call only. The @code{strtok} function uses this to set up | |
1021 | some internal state information. Subsequent calls to get additional | |
1022 | tokens from the same string are indicated by passing a null pointer as | |
1023 | the @var{newstring} argument. Calling @code{strtok} with another | |
1024 | non-null @var{newstring} argument reinitializes the state information. | |
1025 | It is guaranteed that no other library function ever calls @code{strtok} | |
1026 | behind your back (which would mess up this internal state information). | |
1027 | ||
1028 | The @var{delimiters} argument is a string that specifies a set of delimiters | |
1029 | that may surround the token being extracted. All the initial characters | |
1030 | that are members of this set are discarded. The first character that is | |
1031 | @emph{not} a member of this set of delimiters marks the beginning of the | |
1032 | next token. The end of the token is found by looking for the next | |
1033 | character that is a member of the delimiter set. This character in the | |
1034 | original string @var{newstring} is overwritten by a null character, and the | |
1035 | pointer to the beginning of the token in @var{newstring} is returned. | |
1036 | ||
1037 | On the next call to @code{strtok}, the searching begins at the next | |
1038 | character beyond the one that marked the end of the previous token. | |
1039 | Note that the set of delimiters @var{delimiters} does not have to be the | |
1040 | same on every call in a series of calls to @code{strtok}. | |
1041 | ||
1042 | If the end of the string @var{newstring} is reached, or if the remainder of | |
1043 | the string consists only of delimiter characters, @code{strtok} returns | |
1044 | a null pointer. | |
1045 | @end deftypefun | |
1046 | ||
1047 | @strong{Warning:} Since @code{strtok} alters the string it is parsing, | |
1048 | you should always copy the string to a temporary buffer before parsing it with | |
1049 | @code{strtok}. If you allow @code{strtok} to modify a string that came | |
1050 | from another part of your program, you are asking for trouble; that | |
1051 | string may be part of a data structure that could be used for other | |
1052 | purposes during the parsing, when alteration by @code{strtok} makes the | |
1053 | data structure temporarily inaccurate. | |
1054 | ||
1055 | The string that you are operating on might even be a constant. Then | |
1056 | when @code{strtok} tries to modify it, your program will get a fatal | |
1057 | signal for writing in read-only memory. @xref{Program Error Signals}. | |
1058 | ||
1059 | This is a special case of a general principle: if a part of a program | |
1060 | does not have as its purpose the modification of a certain data | |
1061 | structure, then it is error-prone to modify the data structure | |
1062 | temporarily. | |
1063 | ||
1064 | The function @code{strtok} is not reentrant. @xref{Nonreentrancy}, for | |
1065 | a discussion of where and why reentrancy is important. | |
1066 | ||
1067 | Here is a simple example showing the use of @code{strtok}. | |
1068 | ||
1069 | @comment Yes, this example has been tested. | |
1070 | @smallexample | |
1071 | #include <string.h> | |
1072 | #include <stddef.h> | |
1073 | ||
1074 | @dots{} | |
1075 | ||
5649a1d6 | 1076 | const char string[] = "words separated by spaces -- and, punctuation!"; |
28f540f4 | 1077 | const char delimiters[] = " .,;:!-"; |
5649a1d6 | 1078 | char *token, *cp; |
28f540f4 RM |
1079 | |
1080 | @dots{} | |
1081 | ||
5649a1d6 UD |
1082 | cp = strdupa (string); /* Make writable copy. */ |
1083 | token = strtok (cp, delimiters); /* token => "words" */ | |
28f540f4 RM |
1084 | token = strtok (NULL, delimiters); /* token => "separated" */ |
1085 | token = strtok (NULL, delimiters); /* token => "by" */ | |
1086 | token = strtok (NULL, delimiters); /* token => "spaces" */ | |
1087 | token = strtok (NULL, delimiters); /* token => "and" */ | |
1088 | token = strtok (NULL, delimiters); /* token => "punctuation" */ | |
1089 | token = strtok (NULL, delimiters); /* token => NULL */ | |
1090 | @end smallexample | |
a5113b14 UD |
1091 | |
1092 | The GNU C library contains two more functions for tokenizing a string | |
1093 | which overcome the limitation of non-reentrancy. | |
1094 | ||
1095 | @comment string.h | |
1096 | @comment POSIX | |
1097 | @deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr}) | |
1098 | Just like @code{strtok} this function splits the string into several | |
1099 | tokens which can be accessed by successive calls to @code{strtok_r}. | |
1100 | The difference is that the information about the next token is not set | |
1101 | up in some internal state information. Instead the caller has to | |
1102 | provide another argument @var{save_ptr} which is a pointer to a string | |
1103 | pointer. Calling @code{strtok_r} with a null pointer for | |
1104 | @var{newstring} and leaving @var{save_ptr} between the calls unchanged | |
1105 | does the job without limiting reentrancy. | |
1106 | ||
5649a1d6 | 1107 | This function is defined in POSIX.1 and can be found on many systems |
a5113b14 UD |
1108 | which support multi-threading. |
1109 | @end deftypefun | |
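The following sketch shows the calling pattern for @code{strtok_r}: the string is passed on the first call, a null pointer on later calls, and the same @var{save_ptr} object throughout. The function name @code{split_tokens} and its parameters are invented for the illustration, and @code{_POSIX_C_SOURCE} is assumed to be the feature macro that exposes the declaration.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split LINE in place and store up to MAX
   token pointers in TOKENS.  Returns the number of tokens stored.
   LINE must be writable, since strtok_r overwrites delimiters.  */
size_t
split_tokens (char *line, const char *delimiters,
              char **tokens, size_t max)
{
  char *save_ptr;   /* state private to this call */
  char *token;
  size_t count = 0;

  assert (line != NULL && delimiters != NULL && tokens != NULL);

  for (token = strtok_r (line, delimiters, &save_ptr);
       token != NULL && count < max;
       token = strtok_r (NULL, delimiters, &save_ptr))
    tokens[count++] = token;

  return count;
}
```

Because the state lives in a local variable, two such loops can run at the same time, for instance in different threads or nested inside one another, without disturbing each other.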
1110 | ||
1111 | @comment string.h | |
1112 | @comment BSD | |
1113 | @deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter}) | |
1114 | A second reentrant approach is to avoid the additional first argument. | |
1115 | The initialization of the moving pointer has to be done by the user. | |
1116 | Successive calls of @code{strsep} move the pointer along the tokens | |
1117 | separated by @var{delimiter}, returning the address of the next token | |
1118 | and updating @var{string_ptr} to point to the beginning of the next | |
1119 | token. | |
1120 | ||
1121 | This function was introduced in 4.3BSD and therefore is widely available. | |
1122 | @end deftypefun | |
1123 | ||
1124 | Here is how the above example looks when @code{strsep} is used. | |
1125 | ||
1126 | @comment Yes, this example has been tested. | |
1127 | @smallexample | |
1128 | #include <string.h> | |
1129 | #include <stddef.h> | |
1130 | ||
1131 | @dots{} | |
1132 | ||
5649a1d6 | 1133 | const char string[] = "words separated by spaces -- and, punctuation!"; |
a5113b14 UD |
1134 | const char delimiters[] = " .,;:!-"; |
1135 | char *running; | |
1136 | char *token; | |
1137 | ||
1138 | @dots{} | |
1139 | ||
5649a1d6 | 1140 | running = strdupa (string); |
a5113b14 UD |
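1141 | token = strsep (&running, delimiters); /* token => "words" */ | |
1142 | token = strsep (&running, delimiters); /* token => "separated" */ | |
1143 | token = strsep (&running, delimiters); /* token => "by" */ | |
1144 | token = strsep (&running, delimiters); /* token => "spaces" */ | |
1145 | token = strsep (&running, delimiters); /* token => "" */ | |
1146 | token = strsep (&running, delimiters); /* token => "" */ | |
1147 | token = strsep (&running, delimiters); /* token => "" */ | |
| token = strsep (&running, delimiters); /* token => "and" */ | |
| token = strsep (&running, delimiters); /* token => "" */ | |
| token = strsep (&running, delimiters); /* token => "punctuation" */ | |
| token = strsep (&running, delimiters); /* token => "" */ | |
| token = strsep (&running, delimiters); /* token => NULL */ | |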
1148 | @end smallexample | |
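Unlike @code{strtok}, @code{strsep} does not merge adjacent delimiters: every delimiter ends a token, so two delimiters in a row yield an empty string. A caller that wants @code{strtok}-like results can skip the empty tokens, as in this sketch. The name @code{split_nonempty} is invented for the example, and @code{_DEFAULT_SOURCE} is assumed to be the feature macro that exposes @code{strsep} on the system at hand.

```c
#define _DEFAULT_SOURCE   /* for strsep on glibc */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split LINE in place with strsep, keeping
   only non-empty tokens.  Stores up to MAX pointers in TOKENS and
   returns how many were stored.  LINE must be writable.  */
size_t
split_nonempty (char *line, const char *delimiters,
                char **tokens, size_t max)
{
  char *running = line;   /* strsep advances this pointer */
  char *token;
  size_t count = 0;

  assert (line != NULL && delimiters != NULL && tokens != NULL);

  while (count < max
         && (token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')   /* skip tokens between adjacent delimiters */
      tokens[count++] = token;

  return count;
}
```

Keeping the empty tokens instead is what makes @code{strsep} suitable for formats where an empty field is meaningful, such as colon-separated records.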
b4012b75 UD |
1149 | |
1150 | @node Encode Binary Data | |
1151 | @section Encode Binary Data | |
1152 | ||
1153 | To store or transfer binary data in environments which only support text | |
1154 | one has to encode the binary data by mapping the input bytes to | |
1155 | characters in the range allowed for storing or transferring. SVID | |
1156 | systems (and nowadays XPG compliant systems) have such a function in the | |
1157 | C library. | |
1158 | ||
1159 | @comment stdlib.h | |
1160 | @comment XPG | |
1161 | @deftypefun {char *} l64a (long int @var{n}) | |
1162 | This function encodes an input value with 32 bits using characters from | |
1163 | the basic character set. Groups of 6 bits are encoded using the | |
1164 | following table: | |
1165 | ||
1166 | @multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} | |
1167 | @item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7 | |
1168 | @item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1} | |
1169 | @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5} | |
1170 | @item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9} | |
1171 | @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D} | |
1172 | @item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H} | |
1173 | @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L} | |
1174 | @item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P} | |
1175 | @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T} | |
1176 | @item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X} | |
1177 | @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b} | |
1178 | @item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f} | |
1179 | @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j} | |
1180 | @item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n} | |
1181 | @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r} | |
1182 | @item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v} | |
1183 | @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z} | |
1184 | @end multitable | |
1185 | ||
1186 | The function returns a pointer to a static buffer which contains the | |
1187 | string representing the encoding of @var{n}. To encode a series of | |
1188 | bytes the user should append the new string to the destination buffer. | |
1189 | @emph{Warning:} Since a static buffer is used this function should not | |
5649a1d6 | 1190 | be used in multi-threaded programs. There is no thread-safe alternative |
b4012b75 UD |
1191 | to this function in the C library. |
1192 | @end deftypefun | |
1193 | ||
5649a1d6 UD |
1194 | On its own, the @code{l64a} function is of limited use. To encode arbitrary |
1195 | sequences of bytes one needs some more code, which could look like | |
1196 | this: | |
1197 | ||
1198 | @smallexample | |
1199 | char * | |
1200 | encode (const void *buf, size_t len) | |
1201 | @{ | |
1202 | /* @r{We know in advance how long the buffer has to be.} */ | |
1203 | unsigned char *in = (unsigned char *) buf; | |
1204 | char *out = malloc (6 + ((len + 3) / 4) * 6 + 1); | |
1205 | char *cp = out; | |
1206 | ||
1207 | /* @r{Encode the length.} */ | |
1208 | memcpy (cp, l64a (len), 6); | |
1209 | cp += 6; | |
1210 | ||
1211 | while (len > 3) | |
1212 | @{ | |
1213 | unsigned long int n = *in++; | |
1214 | n = (n << 8) | *in++; | |
1215 | n = (n << 8) | *in++; | |
1216 | n = (n << 8) | *in++; | |
1217 | len -= 4; | |
1218 | /* @r{Using `htonl' is necessary so that the data can be} | |
1219 | @r{decoded even on machines with different byte order.} */ | |
1220 | memcpy (cp, l64a (htonl (n)), 6); | |
1221 | cp += 6; | |
1222 | @} | |
1223 | if (len > 0) | |
1224 | @{ | |
1225 | unsigned long int n = *in++; | |
1226 | if (--len > 0) | |
1227 | @{ | |
1228 | n = (n << 8) | *in++; | |
1229 | if (--len > 0) | |
1230 | n = (n << 8) | *in; | |
1231 | @} | |
1232 | memcpy (cp, l64a (htonl (n)), 6); | |
1233 | cp += 6; | |
1234 | @} | |
1235 | *cp = '\0'; | |
1236 | return out; | |
1237 | @} | |
1238 | @end smallexample | |
1239 | ||
1240 | It is strange that the library does not provide the complete | |
1241 | functionality needed but so be it. There are some other encoding | |
1242 | methods which are much more widely used (UU encoding, Base64 encoding). | |
1243 | Generally, it is better to use one of these encodings. | |
1244 | ||
b4012b75 UD |
1245 | To decode data produced with @code{l64a} the following function should be |
1246 | used. | |
1247 | ||
5649a1d6 UD |
1248 | @comment stdlib.h |
1249 | @comment XPG | |
b4012b75 UD |
1250 | @deftypefun {long int} a64l (const char *@var{string}) |
1251 | The parameter @var{string} should contain a string which was produced by | |
1252 | a call to @code{l64a}. The function processes the next 6 characters and | |
1253 | decodes the characters it finds according to the table above. | |
1254 | Characters not in the conversion table are simply ignored. This is | |
1255 | useful for breaking the information into lines, in which case the end of | |
1256 | line characters are simply ignored. | |
1257 | ||
1258 | The decoded number is returned at the end as a @code{long int} value. | |
1259 | Consecutive calls to this function are possible but the caller must make | |
1260 | sure the buffer pointer is updated after each call to @code{a64l} since | |
1261 | this function does not modify the buffer pointer. Every call consumes 6 | |
1262 | characters. | |
1263 | @end deftypefun | |
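One detail matters when the output of @code{l64a} is stored in fixed-width fields: the returned string can be shorter than six characters, because high-order zero digits are not written. Since the first character is the least significant digit and @samp{.} encodes zero, padding on the right with @samp{.} leaves the decoded value unchanged. The sketch below relies on that; @code{encode_fixed} is an invented name, the input is assumed to be a non-negative value that fits in 31 bits, and @code{_XOPEN_SOURCE} is assumed to expose the declarations.

```c
#define _XOPEN_SOURCE 500   /* for l64a and a64l */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode N as exactly six characters followed
   by a terminating null, stored in OUT.  */
void
encode_fixed (long int n, char out[7])
{
  const char *digits;
  size_t len;

  assert (n >= 0 && n <= 0x7fffffffL);

  /* Copy at once: the static buffer is overwritten by the next
     call to l64a.  */
  digits = l64a (n);
  len = strlen (digits);
  assert (len <= 6);
  memcpy (out, digits, len);

  /* Pad with '.', the digit for zero, up to six characters.  */
  memset (out + len, '.', 6 - len);
  out[6] = '\0';
}
```

A field written this way can be handed straight to @code{a64l}, and a reader can step through a buffer of such fields six characters at a time.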
b13927da UD |
1264 | |
1265 | @node Argz and Envz Vectors | |
1266 | @section Argz and Envz Vectors | |
1267 | ||
5649a1d6 | 1268 | @cindex argz vectors (string vectors) |
b13927da UD |
1269 | @cindex string vectors, null-character separated |
1270 | @cindex argument vectors, null-character separated | |
1271 | @dfn{Argz vectors} are vectors of strings in a contiguous block of | |
1272 | memory, each element separated from its neighbors by null characters | |
1273 | (@code{'\0'}). | |
1274 | ||
5649a1d6 | 1275 | @cindex envz vectors (environment vectors) |
b13927da UD |
1276 | @cindex environment vectors, null-character separated |
1277 | @dfn{Envz vectors} are an extension of argz vectors where each element is a | |
5649a1d6 | 1278 | name-value pair, separated by a @code{'='} character (as in a Unix |
b13927da UD |
1279 | environment). |
1280 | ||
1281 | @menu | |
1282 | * Argz Functions:: Operations on argz vectors. | |
1283 | * Envz Functions:: Additional operations on environment vectors. | |
1284 | @end menu | |
1285 | ||
1286 | @node Argz Functions, Envz Functions, , Argz and Envz Vectors | |
1287 | @subsection Argz Functions | |
1288 | ||
1289 | Each argz vector is represented by a pointer to the first element, of | |
1290 | type @code{char *}, and a size, of type @code{size_t}, both of which can | |
1291 | be initialized to @code{0} to represent an empty argz vector. All argz | |
1292 | functions accept either a pointer and a size argument, or pointers to | |
1293 | them, if they will be modified. | |
1294 | ||
1295 | The argz functions use @code{malloc}/@code{realloc} to allocate/grow | |
1296 | argz vectors, and so any argz vector created using these functions may | |
1297 | be freed by using @code{free}; conversely, any argz function that may | |
1298 | grow a string expects that string to have been allocated using | |
1299 | @code{malloc} (those argz functions that only examine their arguments or | |
1300 | modify them in place will work on any sort of memory). | |
1301 | @xref{Unconstrained Allocation}. | |
1302 | ||
1303 | All argz functions that do memory allocation have a return type of | |
1304 | @code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an | |
1305 | allocation error occurs. | |
1306 | ||
1307 | @pindex argz.h | |
1308 | These functions are declared in the standard include file @file{argz.h}. | |
1309 | ||
5649a1d6 UD |
1310 | @comment argz.h |
1311 | @comment GNU | |
b13927da | 1312 | @deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len}) |
5649a1d6 | 1313 | The @code{argz_create} function converts the Unix-style argument vector |
b13927da UD |
1314 | @var{argv} (a vector of pointers to normal C strings, terminated by |
1315 | @code{(char *)0}; @pxref{Program Arguments}) into an argz vector with | |
1316 | the same elements, which is returned in @var{argz} and @var{argz_len}. | |
1317 | @end deftypefun | |
1318 | ||
5649a1d6 UD |
1319 | @comment argz.h |
1320 | @comment GNU | |
b13927da UD |
1321 | @deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len}) |
1322 | The @code{argz_create_sep} function converts the null-terminated string | |
1323 | @var{string} into an argz vector (returned in @var{argz} and | |
1324 | @var{argz_len}) by splitting it into elements at every occurrence of the | |
1325 | character @var{sep}. | |
1326 | @end deftypefun | |
1327 | ||
5649a1d6 UD |
1328 | @comment argz.h |
1329 | @comment GNU | |
b13927da UD |
1330 | @deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len}) |
1331 | Returns the number of elements in the argz vector @var{argz} and | |
1332 | @var{argz_len}. | |
1333 | @end deftypefun | |
1334 | ||
5649a1d6 UD |
1335 | @comment argz.h |
1336 | @comment GNU | |
b13927da UD |
1337 | @deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv}) |
1338 | The @code{argz_extract} function converts the argz vector @var{argz} and | |
5649a1d6 | 1339 | @var{argz_len} into a Unix-style argument vector stored in @var{argv}, |
b13927da UD |
1340 | by putting pointers to every element in @var{argz} into successive |
1341 | positions in @var{argv}, followed by a terminator of @code{0}. | |
1342 | @var{Argv} must be pre-allocated with enough space to hold all the | |
1343 | elements in @var{argz} plus the terminating @code{(char *)0} | |
1344 | (@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)} | |
1345 | bytes should be enough). Note that the string pointers stored into | |
1346 | @var{argv} point into @var{argz}---they are not copies---and so | |
1347 | @var{argz} must be copied if it will be changed while @var{argv} is | |
1348 | still active. This function is useful for passing the elements in | |
1349 | @var{argz} to an exec function (@pxref{Executing a File}). | |
1350 | @end deftypefun | |
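The allocation rule above can be sketched as a small wrapper. The name @code{argz_to_argv} is hypothetical; the wrapper only sizes the pointer array with @code{argz_count} and then lets @code{argz_extract} fill it in.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Return a malloc'd, null-terminated pointer array whose entries point
   into ARGZ, or NULL if the allocation fails.  The strings are not
   copied, so the result is only valid while ARGZ is left unchanged.
   Free the array (not the strings) with free.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  assert (argz != NULL || argz_len == 0);
  size_t n = argz_count (argz, argz_len);
  char **argv = malloc ((n + 1) * sizeof (char *));
  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

The resulting array could then be handed to an exec function, provided the argz vector stays alive and unmodified until that call.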
1351 | ||
5649a1d6 UD |
1352 | @comment argz.h |
1353 | @comment GNU | |
b13927da UD |
1354 | @deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep}) |
1355 | The @code{argz_stringify} function converts @var{argz} into a normal string with | |
1356 | the elements separated by the character @var{sep}, by replacing each | |
1357 | @code{'\0'} inside @var{argz} (except the last one, which terminates the | |
1358 | string) with @var{sep}. This is handy for printing @var{argz} in a | |
1359 | readable manner. | |
1360 | @end deftypefun | |
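A round trip through @code{argz_create_sep} and @code{argz_stringify} gives a simple way to change the separator of a list. The sketch below assumes nothing beyond the functions described so far; the helper name @code{change_separator} is hypothetical.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Return a malloc'd copy of TEXT in which the elements delimited by
   OLD_SEP are joined with NEW_SEP instead.  Returns NULL on allocation
   failure, and also when TEXT contains no elements at all (an empty argz
   vector is represented by a null pointer).  */
char *
change_separator (const char *text, int old_sep, int new_sep)
{
  char *argz = NULL;
  size_t argz_len = 0;

  assert (text != NULL);
  if (argz_create_sep (text, old_sep, &argz, &argz_len) != 0)
    return NULL;

  /* Turn the separating '\0' characters into NEW_SEP in place; the final
     '\0' is kept, so ARGZ is now an ordinary string.  */
  argz_stringify (argz, argz_len, new_sep);
  return argz;
}
```

Because @code{argz_stringify} works in place, no second buffer is needed; the caller simply frees the returned string.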
1361 | ||
5649a1d6 UD |
1362 | @comment argz.h |
1363 | @comment GNU | |
b13927da UD |
1364 | @deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}) |
1365 | The @code{argz_add} function adds the string @var{str} to the end of the | |
1366 | argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and | |
1367 | @code{*@var{argz_len}} accordingly. | |
1368 | @end deftypefun | |
1369 | ||
5649a1d6 UD |
1370 | @comment argz.h |
1371 | @comment GNU | |
b13927da UD |
1372 | @deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim}) |
1373 | The @code{argz_add_sep} function is similar to @code{argz_add}, but | |
1374 | @var{str} is split into separate elements in the result at occurrences of | |
1375 | the character @var{delim}. This is useful, for instance, for | |
5649a1d6 | 1376 | adding the components of a Unix search path to an argz vector, by using |
b13927da UD |
1377 | a value of @code{':'} for @var{delim}. |
1378 | @end deftypefun | |
1379 | ||
5649a1d6 UD |
1380 | @comment argz.h |
1381 | @comment GNU | |
b13927da UD |
1382 | @deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len}) |
1383 | The @code{argz_append} function appends @var{buf_len} bytes starting at | |
1384 | @var{buf} to the argz vector @code{*@var{argz}}, reallocating | |
1385 | @code{*@var{argz}} to accommodate it, and adding @var{buf_len} to | |
1386 | @code{*@var{argz_len}}. | |
1387 | @end deftypefun | |
1388 | ||
5649a1d6 UD |
1389 | @comment argz.h |
1390 | @comment GNU | |
b13927da UD |
1391 | @deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry}) |
1392 | If @var{entry} points to the beginning of one of the elements in the | |
1393 | argz vector @code{*@var{argz}}, the @code{argz_delete} function will | |
1394 | remove this entry and reallocate @code{*@var{argz}}, modifying | |
1395 | @code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as | |
1396 | destructive argz functions usually reallocate their argz argument, | |
1397 | pointers into argz vectors such as @var{entry} will then become invalid. | |
1398 | @end deftypefun | |
1399 | ||
5649a1d6 UD |
1400 | @comment argz.h |
1401 | @comment GNU | |
b13927da UD |
1402 | @deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry}) |
1403 | The @code{argz_insert} function inserts the string @var{entry} into the | |
1404 | argz vector @code{*@var{argz}} at a point just before the existing | |
1405 | element pointed to by @var{before}, reallocating @code{*@var{argz}} and | |
1406 | updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before} | |
1407 | is @code{0}, @var{entry} is added to the end instead (as if by | |
1408 | @code{argz_add}). Since the first element is in fact the same as | |
1409 | @code{*@var{argz}}, passing in @code{*@var{argz}} as the value of | |
1410 | @var{before} will result in @var{entry} being inserted at the beginning. | |
1411 | @end deftypefun | |
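The following sketch combines @code{argz_insert}, @code{argz_next}, and @code{argz_delete}. The helper names @code{argz_push_front} and @code{argz_remove} are hypothetical. The code does not use any return value from @code{argz_delete}, and it stops iterating as soon as an element has been deleted, because the pointer it was holding is no longer valid at that point.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Insert ENTRY in front of the first element.  For an empty vector,
   *ARGZ is 0, so argz_insert behaves like argz_add.  */
error_t
argz_push_front (char **argz, size_t *argz_len, const char *entry)
{
  return argz_insert (argz, argz_len, *argz, entry);
}

/* Delete the first element equal to STR.  Return 1 if an element was
   removed and 0 if no element matched.  */
int
argz_remove (char **argz, size_t *argz_len, const char *str)
{
  char *entry = NULL;

  assert (str != NULL);
  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, str) == 0)
      {
        argz_delete (argz, argz_len, entry);
        /* ENTRY may now point into freed memory; do not use it again.  */
        return 1;
      }
  return 0;
}
```

A caller that needs to remove several elements can call @code{argz_remove} repeatedly until it returns 0, which restarts the iteration with a fresh pointer each time.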
1412 | ||
5649a1d6 UD |
1413 | @comment argz.h |
1414 | @comment GNU | |
b13927da UD |
1415 | @deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry}) |
1416 | The @code{argz_next} function provides a convenient way of iterating | |
1417 | over the elements in the argz vector @var{argz}. It returns a pointer | |
1418 | to the next element in @var{argz} after the element @var{entry}, or | |
1419 | @code{0} if there are no elements following @var{entry}. If @var{entry} | |
1420 | is @code{0}, the first element of @var{argz} is returned. | |
1421 | ||
1422 | This behavior suggests two styles of iteration: | |
1423 | ||
1424 | @smallexample | |
1425 | char *entry = 0; | |
1426 | while ((entry = argz_next (@var{argz}, @var{argz_len}, entry))) | |
1427 | @var{action}; | |
1428 | @end smallexample | |
1429 | ||
1430 | (the double parentheses are necessary to keep some C compilers from warning | |
1431 | about what they consider a questionable @code{while}-test) and: | |
1432 | ||
1433 | @smallexample | |
1434 | char *entry; | |
1435 | for (entry = @var{argz}; | |
1436 | entry; | |
1437 | entry = argz_next (@var{argz}, @var{argz_len}, entry)) | |
1438 | @var{action}; | |
1439 | @end smallexample | |
1440 | ||
1441 | Note that the latter depends on @var{argz} having a value of @code{0} if | |
1442 | it is empty (rather than a pointer to an empty block of memory); this | |
1443 | invariant is maintained for argz vectors created by the functions here. | |
1444 | @end deftypefun | |
1445 | ||
d705269e UD |
1446 | @comment argz.h |
1447 | @comment GNU | |
1448 | @deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}}) | |
1449 | Replace any occurrences of the string @var{str} in @var{argz} with | |
1450 | @var{with}, reallocating @var{argz} as necessary. If | |
1451 | @var{replace_count} is non-zero, @code{*@var{replace_count}} will be | |
1452 | incremented by the number of replacements performed. | |
1453 | @end deftypefun | |
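Since @code{*@var{replace_count}} is incremented rather than set, a caller that wants a fresh count has to zero the counter first. A minimal sketch, with the hypothetical wrapper name @code{argz_replace_counted}:

```c
#define _GNU_SOURCE
#include <argz.h>
#include <assert.h>
#include <stdlib.h>

/* Replace FROM by TO throughout *ARGZ and store the number of
   replacements reported by argz_replace in *COUNT.  */
error_t
argz_replace_counted (char **argz, size_t *argz_len,
                      const char *from, const char *to, unsigned *count)
{
  assert (count != NULL);
  *count = 0;   /* argz_replace only increments the counter.  */
  return argz_replace (argz, argz_len, from, to, count);
}
```

Passing the same counter to several plain @code{argz_replace} calls without resetting it would instead accumulate a running total, which may be what some callers want.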
1454 | ||
b13927da UD |
1455 | @node Envz Functions, , Argz Functions, Argz and Envz Vectors |
1456 | @subsection Envz Functions | |
1457 | ||
1458 | Envz vectors are just argz vectors with additional constraints on the form | |
1459 | of each element; as such, argz functions can also be used on them, where it | |
1460 | makes sense. | |
1461 | ||
1462 | Each element in an envz vector is a name-value pair, separated by a @code{'='} | |
1463 | character; if multiple @code{'='} characters are present in an element, those | |
1464 | after the first are considered part of the value, and treated like all other | |
1465 | non-@code{'\0'} characters. | |
1466 | ||
1467 | If @emph{no} @code{'='} characters are present in an element, that element is | |
1468 | considered the name of a ``null'' entry, as distinct from an entry with an | |
1469 | empty value: @code{envz_get} will return @code{0} if given the name of a null | |
1470 | entry, whereas an entry with an empty value would result in a value of | |
1471 | @code{""}; @code{envz_entry} will still find such entries, however. Null | |
1472 | entries can be removed with the @code{envz_strip} function. | |
1473 | ||
1474 | As with argz functions, envz functions that may allocate memory (and thus | |
1475 | fail) have a return type of @code{error_t}, and return either @code{0} or | |
1476 | @code{ENOMEM}. | |
1477 | ||
1478 | @pindex envz.h | |
1479 | These functions are declared in the standard include file @file{envz.h}. | |
1480 | ||
5649a1d6 UD |
1481 | @comment envz.h |
1482 | @comment GNU | |
b13927da UD |
1483 | @deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
1484 | The @code{envz_entry} function finds the entry in @var{envz} with the name | |
1485 | @var{name}, and returns a pointer to the whole entry---that is, the argz | |
1486 | element which begins with @var{name} followed by a @code{'='} character. If | |
1487 | there is no entry with that name, @code{0} is returned. | |
1488 | @end deftypefun | |
1489 | ||
5649a1d6 UD |
1490 | @comment envz.h |
1491 | @comment GNU | |
b13927da UD |
1492 | @deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name}) |
1493 | The @code{envz_get} function finds the entry in @var{envz} with the name | |
1494 | @var{name} (like @code{envz_entry}), and returns a pointer to the value | |
1495 | portion of that entry (following the @code{'='}). If there is no entry with | |
1496 | that name (or only a null entry), @code{0} is returned. | |
1497 | @end deftypefun | |
1498 | ||
5649a1d6 UD |
1499 | @comment envz.h |
1500 | @comment GNU | |
b13927da UD |
1501 | @deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value}) |
1502 | The @code{envz_add} function adds an entry to @code{*@var{envz}} | |
1503 | (updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name | |
1504 | @var{name}, and value @var{value}. If an entry with the same name | |
1505 | already exists in @var{envz}, it is removed first. If @var{value} is | |
1506 | @code{0}, then the new entry will be the special null type of entry | |
1507 | (mentioned above). | |
1508 | @end deftypefun | |
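The difference between a missing entry, a null entry, and an entry with an empty value can be observed by combining @code{envz_entry} and @code{envz_get}. The enumeration and the function @code{envz_classify} below are hypothetical names used only for this sketch.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

enum envz_kind
{
  ENVZ_ABSENT,      /* No element with this name.  */
  ENVZ_NULL_ENTRY,  /* Element present, but without '='.  */
  ENVZ_HAS_VALUE    /* Element of the form NAME=VALUE; VALUE may be "".  */
};

/* Report how NAME is represented in the envz vector ENVZ.  */
enum envz_kind
envz_classify (const char *envz, size_t envz_len, const char *name)
{
  assert (name != NULL);
  if (envz_entry (envz, envz_len, name) == NULL)
    return ENVZ_ABSENT;
  if (envz_get (envz, envz_len, name) == NULL)
    return ENVZ_NULL_ENTRY;
  return ENVZ_HAS_VALUE;
}
```

An entry added with a @var{value} of @code{""} is classified as @code{ENVZ_HAS_VALUE}, while one added with a @var{value} of @code{0} is classified as @code{ENVZ_NULL_ENTRY}.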
1509 | ||
5649a1d6 UD |
1510 | @comment envz.h |
1511 | @comment GNU | |
b13927da UD |
1512 | @deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override}) |
1513 | The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz}, | |
1514 | as if with @code{envz_add}, updating @code{*@var{envz}} and | |
1515 | @code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2} | |
1516 | will supersede those with the same name in @var{envz}, otherwise not. | |
1517 | ||
1518 | Null entries are treated just like other entries in this respect, so a null | |
1519 | entry in @var{envz} can prevent an entry of the same name in @var{envz2} from | |
1520 | being added to @var{envz}, if @var{override} is false. | |
1521 | @end deftypefun | |
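One use of a false @var{override} is to fill in default settings without disturbing what is already present. The sketch below wraps @code{envz_merge} for that purpose; the name @code{envz_apply_defaults} is hypothetical.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

/* Add every entry of DEFAULTS whose name does not yet occur in *ENVZ.
   Existing entries, including null entries, are left untouched.  */
error_t
envz_apply_defaults (char **envz, size_t *envz_len,
                     const char *defaults, size_t defaults_len)
{
  assert (envz != NULL && envz_len != NULL);
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Note that a null entry in @code{*@var{envz}} blocks the default of the same name, so a caller can use null entries to mark settings that must stay unset.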
1522 | ||
5649a1d6 UD |
1523 | @comment envz.h |
1524 | @comment GNU | |
b13927da UD |
1525 | @deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len}) |
1526 | The @code{envz_strip} function removes any null entries from @var{envz}, | |
1527 | updating @code{*@var{envz}} and @code{*@var{envz_len}}. | |
1528 | @end deftypefun |
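As a small illustration, the following sketch strips the null entries from an envz vector and reports how many elements are left, using @code{argz_count} on the result. The helper name @code{envz_compact} is hypothetical.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <assert.h>
#include <stdlib.h>

/* Remove all null entries from *ENVZ and return the number of
   name-value pairs that remain.  Entries with an empty value are kept.  */
size_t
envz_compact (char **envz, size_t *envz_len)
{
  assert (envz != NULL && envz_len != NULL);
  envz_strip (envz, envz_len);
  return argz_count (*envz, *envz_len);
}
```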