git.ipfire.org Git - thirdparty/glibc.git/blame - manual/string.texi
1@node String and Array Utilities, Extended Characters, Character Handling, Top
2@chapter String and Array Utilities
3
4Operations on strings (or arrays of characters) are an important part of
5many programs. The GNU C library provides an extensive set of string
6utility functions, including functions for copying, concatenating,
7comparing, and searching strings. Many of these functions can also
8operate on arbitrary regions of storage; for example, the @code{memcpy}
function can be used to copy the contents of any kind of array.
10
11It's fairly common for beginning C programmers to ``reinvent the wheel''
12by duplicating this functionality in their own code, but it pays to
13become familiar with the library functions and to make use of them,
14since this offers benefits in maintenance, efficiency, and portability.
15
16For instance, you could easily compare one string to another in two
17lines of C code, but if you use the built-in @code{strcmp} function,
18you're less likely to make a mistake. And, since these library
19functions are typically highly optimized, your program may run faster
20too.
21
22@menu
23* Representation of Strings:: Introduction to basic concepts.
24* String/Array Conventions:: Whether to use a string function or an
25 arbitrary array function.
26* String Length:: Determining the length of a string.
27* Copying and Concatenation:: Functions to copy the contents of strings
28 and arrays.
29* String/Array Comparison:: Functions for byte-wise and character-wise
30 comparison.
31* Collation Functions:: Functions for collating strings.
32* Search Functions:: Searching for a specific element or substring.
33* Finding Tokens in a String:: Splitting a string into tokens by looking
34 for delimiters.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
37@end menu
38
@node Representation of Strings
40@section Representation of Strings
41@cindex string, representation of
42
43This section is a quick summary of string concepts for beginning C
44programmers. It describes how character strings are represented in C
45and some common pitfalls. If you are already familiar with this
46material, you can skip this section.
47
48@cindex string
49@cindex null character
50A @dfn{string} is an array of @code{char} objects. But string-valued
51variables are usually declared to be pointers of type @code{char *}.
52Such variables do not include space for the text of a string; that has
53to be stored somewhere else---in an array variable, a string constant,
54or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
55you to store the address of the chosen memory space into the pointer
56variable. Alternatively you can store a @dfn{null pointer} in the
57pointer variable. The null pointer does not point anywhere, so
58attempting to reference the string it points to gets an error.
59
60By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
61string. For example, in testing to see whether the @code{char *}
62variable @var{p} points to a null character marking the end of a string,
63you can write @code{!*@var{p}} or @code{*@var{p} == '\0'}.
64
65A null character is quite different conceptually from a null pointer,
66although both are represented by the integer @code{0}.
67
68@cindex string literal
69@dfn{String literals} appear in C program source as strings of
characters between double-quote characters (@samp{"}). In @w{ISO C},
71string literals can also be formed by @dfn{string concatenation}:
72@code{"a" "b"} is the same as @code{"ab"}. Modification of string
73literals is not allowed by the GNU C compiler, because literals
74are placed in read-only storage.
75
76Character arrays that are declared @code{const} cannot be modified
77either. It's generally good style to declare non-modifiable string
78pointers to be of type @code{const char *}, since this often allows the
79C compiler to detect accidental modifications as well as providing some
80amount of documentation about what your program intends to do with the
81string.
82
83The amount of memory allocated for the character array may extend past
84the null character that normally marks the end of the string. In this
85document, the term @dfn{allocation size} is always used to refer to the
86total amount of memory allocated for the string, while the term
87@dfn{length} refers to the number of characters up to (but not
88including) the terminating null character.
89@cindex length of string
90@cindex allocation size of string
91@cindex size of string
92@cindex string length
93@cindex string allocation
94
95A notorious source of program bugs is trying to put more characters in a
96string than fit in its allocated size. When writing code that extends
97strings or moves characters into a pre-allocated array, you should be
98very careful to keep track of the length of the text and make explicit
99checks for overflowing the array. Many of the library functions
100@emph{do not} do this for you! Remember also that you need to allocate
101an extra byte to hold the null character that marks the end of the
102string.
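
As a minimal sketch of such an explicit check, the helper below refuses
to copy a string that does not fit, counting the extra byte for the null
character. The name @code{copy_checked} and its return convention are
assumptions made for illustration; it is not a library function.

@smallexample
#include <stddef.h>
#include <string.h>

/* Copy SRC into DST only if it fits, terminator included.
   Return 0 on success, or -1 if DST (DST_SIZE bytes) is too small. */
int
copy_checked (char *dst, size_t dst_size, const char *src)
@{
  size_t len = strlen (src);    /* length, not counting the null */

  if (len + 1 > dst_size)       /* one extra byte for the null */
    return -1;
  memcpy (dst, src, len + 1);
  return 0;
@}
@end smallexample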
103
@node String/Array Conventions
105@section String and Array Conventions
106
107This chapter describes both functions that work on arbitrary arrays or
108blocks of memory, and functions that are specific to null-terminated
109arrays of characters.
110
111Functions that operate on arbitrary blocks of memory have names
112beginning with @samp{mem} (such as @code{memcpy}) and invariably take an
113argument which specifies the size (in bytes) of the block of memory to
114operate on. The array arguments and return values for these functions
115have type @code{void *}, and as a matter of style, the elements of these
116arrays are referred to as ``bytes''. You can pass any kind of pointer
117to these functions, and the @code{sizeof} operator is useful in
118computing the value for the size argument.
119
120In contrast, functions that operate specifically on strings have names
121beginning with @samp{str} (such as @code{strcpy}) and look for a null
122character to terminate the string instead of requiring an explicit size
123argument to be passed. (Some of these functions accept a specified
124maximum length, but they also check for premature termination with a
125null character.) The array arguments and return values for these
126functions have type @code{char *}, and the array elements are referred
127to as ``characters''.
128
129In many cases, there are both @samp{mem} and @samp{str} versions of a
130function. The one that is more appropriate to use depends on the exact
131situation. When your program is manipulating arbitrary arrays or blocks of
132storage, then you should always use the @samp{mem} functions. On the
133other hand, when you are manipulating null-terminated strings it is
134usually more convenient to use the @samp{str} functions, unless you
135already know the length of the string in advance.
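
The two conventions can be contrasted with a short sketch. The function
names @code{copy_ints} and @code{copy_name} are hypothetical and exist
only for this illustration.

@smallexample
#include <stddef.h>
#include <string.h>

/* A mem function needs an explicit size in bytes; sizeof
   converts an element count into that size. */
void
copy_ints (int *to, const int *from, size_t count)
@{
  memcpy (to, from, count * sizeof (*from));
@}

/* A str function finds the end of the string itself.  The caller
   must ensure TO has room for strlen (from) + 1 bytes. */
void
copy_name (char *to, const char *from)
@{
  strcpy (to, from);
@}
@end smallexample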
136
@node String Length
138@section String Length
139
140You can get the length of a string using the @code{strlen} function.
141This function is declared in the header file @file{string.h}.
142@pindex string.h
143
144@comment string.h
@comment ISO
146@deftypefun size_t strlen (const char *@var{s})
147The @code{strlen} function returns the length of the null-terminated
148string @var{s}. (In other words, it returns the offset of the terminating
149null character within the array.)
150
151For example,
152@smallexample
153strlen ("hello, world")
154 @result{} 12
155@end smallexample
156
157When applied to a character array, the @code{strlen} function returns
158the length of the string stored there, not its allocation size. You can
159get the allocation size of the character array that holds a string using
160the @code{sizeof} operator:
161
162@smallexample
char string[32] = "hello, world";
164sizeof (string)
165 @result{} 32
166strlen (string)
167 @result{} 12
168@end smallexample
169@end deftypefun
170
@node Copying and Concatenation
172@section Copying and Concatenation
173
174You can use the functions described in this section to copy the contents
175of strings and arrays, or to append the contents of one string to
176another. These functions are declared in the header file
177@file{string.h}.
178@pindex string.h
179@cindex copying strings and arrays
180@cindex string copy functions
181@cindex array copy functions
182@cindex concatenating strings
183@cindex string concatenation functions
184
185A helpful way to remember the ordering of the arguments to the functions
186in this section is that it corresponds to an assignment expression, with
187the destination array specified to the left of the source array. All
188of these functions return the address of the destination array.
189
190Most of these functions do not work properly if the source and
191destination arrays overlap. For example, if the beginning of the
192destination array overlaps the end of the source array, the original
193contents of that part of the source array may get overwritten before it
194is copied. Even worse, in the case of the string functions, the null
195character marking the end of the string may be lost, and the copy
196function might get stuck in a loop trashing all the memory allocated to
197your program.
198
199All functions that have problems copying between overlapping arrays are
200explicitly identified in this manual. In addition to functions in this
201section, there are a few others like @code{sprintf} (@pxref{Formatted
202Output Functions}) and @code{scanf} (@pxref{Formatted Input
203Functions}).
204
205@comment string.h
@comment ISO
207@deftypefun {void *} memcpy (void *@var{to}, const void *@var{from}, size_t @var{size})
208The @code{memcpy} function copies @var{size} bytes from the object
209beginning at @var{from} into the object beginning at @var{to}. The
210behavior of this function is undefined if the two arrays @var{to} and
211@var{from} overlap; use @code{memmove} instead if overlapping is possible.
212
213The value returned by @code{memcpy} is the value of @var{to}.
214
215Here is an example of how you might use @code{memcpy} to copy the
216contents of an array:
217
218@smallexample
struct foo *oldarray, *newarray;
int arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
223@end smallexample
224@end deftypefun
225
226@comment string.h
@comment ISO
228@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
229@code{memmove} copies the @var{size} bytes at @var{from} into the
230@var{size} bytes at @var{to}, even if those two blocks of space
231overlap. In the case of overlap, @code{memmove} is careful to copy the
232original values of the bytes in the block at @var{from}, including those
233bytes which also belong to the block at @var{to}.
234@end deftypefun
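
A typical case of overlap is shifting data within a single buffer. The
sketch below removes a prefix from a string in place; the name
@code{drop_prefix} is hypothetical, and the caller is assumed to pass
an @var{n} no larger than the length of the string.

@smallexample
#include <stddef.h>
#include <string.h>

/* Delete the first N characters of the string S in place.
   Source and destination overlap, so memmove is used rather
   than memcpy.  N must not exceed strlen (s). */
void
drop_prefix (char *s, size_t n)
@{
  size_t len = strlen (s);

  memmove (s, s + n, len - n + 1);    /* + 1 moves the null too */
@}
@end smallexample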
235
236@comment string.h
237@comment SVID
238@deftypefun {void *} memccpy (void *@var{to}, const void *@var{from}, int @var{c}, size_t @var{size})
239This function copies no more than @var{size} bytes from @var{from} to
240@var{to}, stopping if a byte matching @var{c} is found. The return
241value is a pointer into @var{to} one byte past where @var{c} was copied,
242or a null pointer if no byte matching @var{c} appeared in the first
243@var{size} bytes of @var{from}.
244@end deftypefun
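
As an illustration, @code{memccpy} can copy one line of text, newline
included. This is only a sketch: the helper name @code{copy_line} and
its return convention are assumptions, and it presumes that the
declaration of @code{memccpy} is visible in @file{string.h} for the
compilation mode in use.

@smallexample
#include <stddef.h>
#include <string.h>

/* Copy the first line of the string FROM, through its newline,
   into TO, which has room for TO_SIZE bytes.  Return the number
   of bytes copied, or 0 if no newline was found within the
   bytes examined (TO may then hold a partial copy). */
size_t
copy_line (char *to, size_t to_size, const char *from)
@{
  size_t n = strlen (from);
  char *end;

  if (n > to_size)
    n = to_size;                /* never write past TO */
  end = memccpy (to, from, '\n', n);
  return end ? (size_t) (end - to) : 0;
@}
@end smallexample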
245
246@comment string.h
@comment ISO
248@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
249This function copies the value of @var{c} (converted to an
250@code{unsigned char}) into each of the first @var{size} bytes of the
251object beginning at @var{block}. It returns the value of @var{block}.
252@end deftypefun
253
254@comment string.h
@comment ISO
256@deftypefun {char *} strcpy (char *@var{to}, const char *@var{from})
257This copies characters from the string @var{from} (up to and including
258the terminating null character) into the string @var{to}. Like
259@code{memcpy}, this function has undefined results if the strings
260overlap. The return value is the value of @var{to}.
261@end deftypefun
262
263@comment string.h
@comment ISO
265@deftypefun {char *} strncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
266This function is similar to @code{strcpy} but always copies exactly
267@var{size} characters into @var{to}.
268
269If the length of @var{from} is more than @var{size}, then @code{strncpy}
270copies just the first @var{size} characters. Note that in this case
271there is no null terminator written into @var{to}.
272
273If the length of @var{from} is less than @var{size}, then @code{strncpy}
274copies all of @var{from}, followed by enough null characters to add up
275to @var{size} characters in all. This behavior is rarely useful, but it
is specified by the @w{ISO C} standard.
277
278The behavior of @code{strncpy} is undefined if the strings overlap.
279
280Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
281relating to writing past the end of the allocated space for @var{to}.
282However, it can also make your program much slower in one common case:
283copying a string which is probably small into a potentially large buffer.
284In this case, @var{size} may be large, and when it is, @code{strncpy} will
285waste a considerable amount of time copying null characters.
286@end deftypefun
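
Because @code{strncpy} writes no null terminator when @var{from} is too
long, callers commonly add one themselves. A minimal sketch follows;
the helper name @code{copy_truncate} is hypothetical, and @var{to_size}
is assumed to be greater than zero.

@smallexample
#include <stddef.h>
#include <string.h>

/* Copy FROM into TO (TO_SIZE bytes, TO_SIZE > 0), truncating
   if necessary, and always leave TO null-terminated. */
void
copy_truncate (char *to, size_t to_size, const char *from)
@{
  strncpy (to, from, to_size - 1);
  to[to_size - 1] = '\0';   /* needed when FROM was truncated */
@}
@end smallexample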
287
288@comment string.h
289@comment SVID
290@deftypefun {char *} strdup (const char *@var{s})
291This function copies the null-terminated string @var{s} into a newly
292allocated string. The string is allocated using @code{malloc}; see
293@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
294for the new string, @code{strdup} returns a null pointer. Otherwise it
295returns a pointer to the new string.
296@end deftypefun
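
A common reason to call @code{strdup} is to obtain a private,
modifiable copy of a string. The sketch below shows the null-pointer
check described above; the function name @code{copy_with_initial} is
made up for this example, and the caller is responsible for passing the
result to @code{free}.

@smallexample
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of S whose first character is
   replaced by C, or a null pointer if allocation fails. */
char *
copy_with_initial (const char *s, char c)
@{
  char *copy = strdup (s);

  if (copy == NULL)
    return NULL;
  if (copy[0] != '\0')
    copy[0] = c;
  return copy;
@}
@end smallexample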
297
298@comment string.h
299@comment GNU
300@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
301This function is similar to @code{strdup} but always copies at most
302@var{size} characters into the newly allocated string.
303
304If the length of @var{s} is more than @var{size}, then @code{strndup}
305copies just the first @var{size} characters and adds a closing null
306terminator. Otherwise all characters are copied and the string is
307terminated.
308
This function differs from @code{strncpy} in that it always
terminates the destination string.
311@end deftypefun
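
For instance, @code{strndup} is convenient for extracting a leading
part of a string. This sketch assumes that @code{_GNU_SOURCE} is
defined before any header is included so that the declaration is
visible; the function name @code{copy_stem} is hypothetical.

@smallexample
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the part of NAME before its
   first '.', or of all of NAME if it contains no '.'.  Return a
   null pointer if allocation fails. */
char *
copy_stem (const char *name)
@{
  size_t n = strcspn (name, ".");   /* length of the stem */

  return strndup (name, n);         /* always null-terminated */
@}
@end smallexample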
312
313@comment string.h
314@comment Unknown origin
315@deftypefun {char *} stpcpy (char *@var{to}, const char *@var{from})
316This function is like @code{strcpy}, except that it returns a pointer to
317the end of the string @var{to} (that is, the address of the terminating
318null character) rather than the beginning.
319
320For example, this program uses @code{stpcpy} to concatenate @samp{foo}
321and @samp{bar} to produce @samp{foobar}, which it then prints.
322
323@smallexample
324@include stpcpy.c.texi
325@end smallexample
326
This function is not part of the ISO or POSIX standards, and is not
328customary on Unix systems, but we did not invent it either. Perhaps it
329comes from MS-DOG.
330
331Its behavior is undefined if the strings overlap.
332@end deftypefun
333
334@comment string.h
335@comment GNU
336@deftypefun {char *} stpncpy (char *@var{to}, const char *@var{from}, size_t @var{size})
This function is similar to @code{stpcpy} but always copies exactly
@var{size} characters into @var{to}.

If the length of @var{from} is more than @var{size}, then @code{stpncpy}
copies just the first @var{size} characters and returns a pointer to the
character directly following the last one copied. Note that in
this case there is no null terminator written into @var{to}.

If the length of @var{from} is less than @var{size}, then @code{stpncpy}
copies all of @var{from}, followed by enough null characters to add up
to @var{size} characters in all. This behavior is rarely useful, but it
is provided for contexts that rely on the same behavior in
@code{strncpy}. In this case @code{stpncpy} returns a pointer to the
@emph{first} written null character.

This function is not part of ISO or POSIX but was found useful while
developing the GNU C Library itself.

Its behavior is undefined if the strings overlap.
356@end deftypefun
357
358@comment string.h
359@comment GNU
360@deftypefun {char *} strdupa (const char *@var{s})
This function is similar to @code{strdup} but allocates the new string
using @code{alloca} instead of @code{malloc}
(@pxref{Variable Size Automatic}). This means of course the returned
string has the same limitations as any block of memory allocated using
@code{alloca}.

For obvious reasons @code{strdupa} is implemented only as a macro;
you cannot get the address of this function. Despite this limitation
it is a useful function. The following code shows a situation where
using @code{malloc} would be a lot more expensive.
371
372@smallexample
373@include strdupa.c.texi
374@end smallexample
375
Please note that calling @code{strtok} using @var{path} directly is
invalid.
378
379This function is only available if GNU CC is used.
380@end deftypefun
381
382@comment string.h
383@comment GNU
384@deftypefun {char *} strndupa (const char *@var{s}, size_t @var{size})
This function is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca}
(@pxref{Variable Size Automatic}). The same advantages and limitations
of @code{strdupa} are valid for @code{strndupa}, too.
389
390This function is implemented only as a macro which means one cannot
391get the address of it.
392
393@code{strndupa} is only available if GNU CC is used.
394@end deftypefun
395
@comment string.h
@comment ISO
398@deftypefun {char *} strcat (char *@var{to}, const char *@var{from})
399The @code{strcat} function is similar to @code{strcpy}, except that the
400characters from @var{from} are concatenated or appended to the end of
401@var{to}, instead of overwriting it. That is, the first character from
402@var{from} overwrites the null character marking the end of @var{to}.
403
404An equivalent definition for @code{strcat} would be:
405
406@smallexample
407char *
408strcat (char *to, const char *from)
409@{
410 strcpy (to + strlen (to), from);
411 return to;
412@}
413@end smallexample
414
415This function has undefined results if the strings overlap.
416@end deftypefun
417
418@comment string.h
@comment ISO
420@deftypefun {char *} strncat (char *@var{to}, const char *@var{from}, size_t @var{size})
421This function is like @code{strcat} except that not more than @var{size}
422characters from @var{from} are appended to the end of @var{to}. A
423single null character is also always appended to @var{to}, so the total
424allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
425longer than its initial length.
426
427The @code{strncat} function could be implemented like this:
428
429@smallexample
430@group
char *
strncat (char *to, const char *from, size_t size)
@{
  size_t len = strlen (to);

  strncpy (to + len, from, size);
  to[len + size] = '\0';
  return to;
@}
437@end group
438@end smallexample
439
440The behavior of @code{strncat} is undefined if the strings overlap.
441@end deftypefun
442
443Here is an example showing the use of @code{strncpy} and @code{strncat}.
444Notice how, in the call to @code{strncat}, the @var{size} parameter
445is computed to avoid overflowing the character array @code{buffer}.
446
447@smallexample
448@include strncat.c.texi
449@end smallexample
450
451@noindent
452The output produced by this program looks like:
453
454@smallexample
455hello
456hello, wo
457@end smallexample
458
459@comment string.h
460@comment BSD
@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
462This is a partially obsolete alternative for @code{memmove}, derived from
463BSD. Note that it is not quite equivalent to @code{memmove}, because the
464arguments are not in the same order.
465@end deftypefun
466
467@comment string.h
468@comment BSD
@deftypefun void bzero (void *@var{block}, size_t @var{size})
470This is a partially obsolete alternative for @code{memset}, derived from
471BSD. Note that it is not as general as @code{memset}, because the only
472value it can store is zero.
473@end deftypefun
474
@node String/Array Comparison
476@section String/Array Comparison
477@cindex comparing strings and arrays
478@cindex string comparison functions
479@cindex array comparison functions
480@cindex predicates on strings
481@cindex predicates on arrays
482
483You can use the functions in this section to perform comparisons on the
484contents of strings and arrays. As well as checking for equality, these
485functions can also be used as the ordering functions for sorting
486operations. @xref{Searching and Sorting}, for an example of this.
487
488Unlike most comparison operations in C, the string comparison functions
489return a nonzero value if the strings are @emph{not} equivalent rather
490than if they are. The sign of the value indicates the relative ordering
491of the first characters in the strings that are not equivalent: a
492negative value indicates that the first string is ``less'' than the
second, while a positive value indicates that the first string is
494``greater''.
495
496The most common use of these functions is to check only for equality.
497This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
498
499All of these functions are declared in the header file @file{string.h}.
500@pindex string.h
501
502@comment string.h
@comment ISO
504@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
505The function @code{memcmp} compares the @var{size} bytes of memory
506beginning at @var{a1} against the @var{size} bytes of memory beginning
507at @var{a2}. The value returned has the same sign as the difference
508between the first differing pair of bytes (interpreted as @code{unsigned
509char} objects, then promoted to @code{int}).
510
511If the contents of the two blocks are equal, @code{memcmp} returns
512@code{0}.
513@end deftypefun
514
515On arbitrary arrays, the @code{memcmp} function is mostly useful for
516testing equality. It usually isn't meaningful to do byte-wise ordering
517comparisons on arrays of things other than bytes. For example, a
518byte-wise comparison on the bytes that make up floating-point numbers
519isn't likely to tell you anything about the relationship between the
520values of the floating-point numbers.
521
522You should also be careful about using @code{memcmp} to compare objects
523that can contain ``holes'', such as the padding inserted into structure
524objects to enforce alignment requirements, extra space at the end of
525unions, and extra characters at the ends of strings whose length is less
526than their allocated size. The contents of these ``holes'' are
527indeterminate and may cause strange behavior when performing byte-wise
528comparisons. For more predictable results, perform an explicit
529component-wise comparison.
530
531For example, given a structure type definition like:
532
533@smallexample
534struct foo
535 @{
536 unsigned char tag;
537 union
538 @{
539 double f;
540 long i;
541 char *p;
542 @} value;
543 @};
544@end smallexample
545
546@noindent
547you are better off writing a specialized comparison function to compare
548@code{struct foo} objects instead of comparing them with @code{memcmp}.
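
Such a specialized function might look like the following sketch. The
text above does not say what the @code{tag} member means, so the tag
values @code{FOO_DOUBLE}, @code{FOO_LONG} and @code{FOO_STRING}, and the
function name @code{foo_equal}, are assumptions made purely for
illustration.

@smallexample
#include <string.h>

struct foo
  @{
    unsigned char tag;
    union
      @{
        double f;
        long i;
        char *p;
      @} value;
  @};

/* Hypothetical tag values saying which union member is in use. */
enum @{ FOO_DOUBLE, FOO_LONG, FOO_STRING @};

/* Return nonzero if A and B are equal, looking only at the
   members that are meaningful for their tag.  Padding bytes
   and unused union space are never examined. */
int
foo_equal (const struct foo *a, const struct foo *b)
@{
  if (a->tag != b->tag)
    return 0;
  switch (a->tag)
    @{
    case FOO_DOUBLE:
      return a->value.f == b->value.f;
    case FOO_LONG:
      return a->value.i == b->value.i;
    case FOO_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    @}
@}
@end smallexample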
549
550@comment string.h
@comment ISO
552@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
553The @code{strcmp} function compares the string @var{s1} against
554@var{s2}, returning a value that has the same sign as the difference
555between the first differing pair of characters (interpreted as
556@code{unsigned char} objects, then promoted to @code{int}).
557
558If the two strings are equal, @code{strcmp} returns @code{0}.
559
560A consequence of the ordering used by @code{strcmp} is that if @var{s1}
561is an initial substring of @var{s2}, then @var{s1} is considered to be
562``less than'' @var{s2}.
563@end deftypefun
564
565@comment string.h
566@comment BSD
567@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
568This function is like @code{strcmp}, except that differences in case
569are ignored.
570
571@code{strcasecmp} is derived from BSD.
572@end deftypefun
573
574@comment string.h
575@comment BSD
576@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
577This function is like @code{strncmp}, except that differences in case
578are ignored.
579
580@code{strncasecmp} is a GNU extension.
581@end deftypefun
582
583@comment string.h
@comment ISO
585@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
This function is similar to @code{strcmp}, except that no more than
587@var{size} characters are compared. In other words, if the two strings are
588the same in their first @var{size} characters, the return value is zero.
589@end deftypefun
590
591Here are some examples showing the use of @code{strcmp} and @code{strncmp}.
592These examples assume the use of the ASCII character set. (If some
593other character set---say, EBCDIC---is used instead, then the glyphs
594are associated with different numeric codes, and the return values
595and ordering may differ.)
596
597@smallexample
598strcmp ("hello", "hello")
599 @result{} 0 /* @r{These two strings are the same.} */
600strcmp ("hello", "Hello")
601 @result{} 32 /* @r{Comparisons are case-sensitive.} */
602strcmp ("hello", "world")
603 @result{} -15 /* @r{The character @code{'h'} comes before @code{'w'}.} */
604strcmp ("hello", "hello, world")
605 @result{} -44 /* @r{Comparing a null character against a comma.} */
strncmp ("hello", "hello, world", 5)
607 @result{} 0 /* @r{The initial 5 characters are the same.} */
608strncmp ("hello, world", "hello, stupid world!!!", 5)
609 @result{} 0 /* @r{The initial 5 characters are the same.} */
610@end smallexample
611
612@comment string.h
613@comment GNU
614@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
The @code{strverscmp} function compares the string @var{s1} against
@var{s2}, considering them as holding indices/version numbers. The return
value follows the same conventions as found in the @code{strcmp}
function. In fact, if @var{s1} and @var{s2} contain no digits,
@code{strverscmp} behaves like @code{strcmp}.

Basically, we compare strings normally (character by character), until
we find a digit in each string; then we enter a special comparison
mode, where each sequence of digits is taken as a whole. If we reach the
end of these two parts without noticing a difference, we return to the
standard comparison mode. There are two types of numeric parts:
``integral'' and ``fractional'' (the latter begin with a @samp{0}). The
types of the numeric parts affect the way we sort them:

@itemize @bullet
@item
integral/integral: we compare values as you would expect.

@item
fractional/integral: the fractional part is less than the integral one.
Again, no surprise.

@item
fractional/fractional: things become a bit more complex.
If the common prefix contains only leading zeroes, the longest part is less
than the other one; else the comparison behaves normally.
@end itemize
642
643@smallexample
strverscmp ("no digit", "no digit")
    @result{} 0    /* @r{same behavior as strcmp.} */
strverscmp ("item#99", "item#100")
    @result{} <0   /* @r{same prefix, but 99 < 100.} */
strverscmp ("alpha1", "alpha001")
    @result{} >0   /* @r{fractional part inferior to integral one.} */
strverscmp ("part1_f012", "part1_f01")
    @result{} >0   /* @r{two fractional parts.} */
strverscmp ("foo.009", "foo.0")
    @result{} <0   /* @r{idem, but with leading zeroes only.} */
654@end smallexample
655
This function is especially useful when dealing with filename sorting,
657because filenames frequently hold indices/version numbers.
658
659@code{strverscmp} is a GNU extension.
660@end deftypefun
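
As a sketch of the filename-sorting use, @code{strverscmp} can serve as
the ordering function for @code{qsort}. The names
@code{compare_versions} and @code{sort_names} are hypothetical, and the
example assumes that @code{_GNU_SOURCE} is defined before any header is
included so that the GNU extension is declared.

@smallexample
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: the elements being sorted are
   pointers to strings, so V1 and V2 point to those pointers. */
static int
compare_versions (const void *v1, const void *v2)
@{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
@}

/* Sort the COUNT strings in NAMES so that embedded numbers are
   ordered by value rather than character by character. */
void
sort_names (const char **names, size_t count)
@{
  qsort (names, count, sizeof (*names), compare_versions);
@}
@end smallexample

With plain @code{strcmp}, @samp{item#100} would sort before
@samp{item#9}; with @code{strverscmp} it sorts after @samp{item#99}.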
661
662@comment string.h
663@comment BSD
664@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
665This is an obsolete alias for @code{memcmp}, derived from BSD.
666@end deftypefun
667
@node Collation Functions
669@section Collation Functions
670
671@cindex collating strings
672@cindex string collation functions
673
674In some locales, the conventions for lexicographic ordering differ from
675the strict numeric ordering of character codes. For example, in Spanish
676most glyphs with diacritical marks such as accents are not considered
677distinct letters for the purposes of collation. On the other hand, the
678two-character sequence @samp{ll} is treated as a single letter that is
679collated immediately after @samp{l}.
680
681You can use the functions @code{strcoll} and @code{strxfrm} (declared in
682the header file @file{string.h}) to compare strings using a collation
683ordering appropriate for the current locale. The locale used by these
684functions in particular can be specified by setting the locale for the
685@code{LC_COLLATE} category; see @ref{Locales}.
686@pindex string.h
687
688In the standard C locale, the collation sequence for @code{strcoll} is
689the same as that for @code{strcmp}.
690
691Effectively, the way these functions work is by applying a mapping to
692transform the characters in a string to a byte sequence that represents
693the string's position in the collating sequence of the current locale.
694Comparing two such byte sequences in a simple fashion is equivalent to
695comparing the strings with the locale's collating sequence.
696
697The function @code{strcoll} performs this translation implicitly, in
698order to do one comparison. By contrast, @code{strxfrm} performs the
699mapping explicitly. If you are making multiple comparisons using the
700same string or set of strings, it is likely to be more efficient to use
701@code{strxfrm} to transform all the strings just once, and subsequently
702compare the transformed strings with @code{strcmp}.
703
704@comment string.h
@comment ISO
706@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
707The @code{strcoll} function is similar to @code{strcmp} but uses the
708collating sequence of the current locale for collation (the
709@code{LC_COLLATE} locale).
710@end deftypefun
711
712Here is an example of sorting an array of strings, using @code{strcoll}
713to compare them. The actual sort algorithm is not written here; it
714comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
715code shown here is to say how to compare the strings while sorting them.
716(Later on in this section, we will show a way to do this more
717efficiently using @code{strxfrm}.)
718
719@smallexample
/* @r{This is the comparison function used with @code{qsort}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
738@end smallexample
739
740@cindex converting string to collation order
741@comment string.h
f65fd747 742@comment ISO
743@deftypefun size_t strxfrm (char *@var{to}, const char *@var{from}, size_t @var{size})
The function @code{strxfrm} transforms the string @var{from} using the
collation transformation determined by the locale currently selected for
collation, and stores the transformed string in the array @var{to}.  Up
to @var{size} characters (including a terminating null character) are
stored.
749
750The behavior is undefined if the strings @var{to} and @var{from}
751overlap; see @ref{Copying and Concatenation}.
752
The return value is the length of the entire transformed string.  This
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
entirely fit in the array @var{to}.  In this case, only as much of the
string as actually fits was stored.  To get the whole transformed
string, call @code{strxfrm} again with a bigger output array.
759
760The transformed string may be longer than the original string, and it
761may also be shorter.
762
763If @var{size} is zero, no characters are stored in @var{to}. In this
764case, @code{strxfrm} simply returns the number of characters that would
765be the length of the transformed string. This is useful for determining
766what size string to allocate. It does not matter what @var{to} is if
767@var{size} is zero; @var{to} may even be a null pointer.
768@end deftypefun
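
The zero-size call described above can be used to size a buffer exactly
before transforming.  The following is only a minimal sketch of that
idiom; the helper name @code{make_collation_key} is hypothetical and is
not part of the library, and plain @code{malloc} is assumed instead of an
error-checking wrapper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated collation key for S,
   or NULL if allocation fails.  The caller frees the result.  */
char *
make_collation_key (const char *s)
{
  /* With a size of zero nothing is stored, so TO may be a null
     pointer.  The return value excludes the terminating null.  */
  size_t needed = strxfrm (NULL, s, 0) + 1;
  char *key = malloc (needed);

  if (key != NULL)
    strxfrm (key, s, needed);
  return key;
}
```

Two keys produced this way can be compared with @code{strcmp}, and the
result should have the same sign as @code{strcoll} applied to the
original strings.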
769
770Here is an example of how you can use @code{strxfrm} when
771you plan to do many comparisons. It does the same thing as the previous
772example, but much faster, because it has to transform each string only
773once, no matter how many times it is compared with other strings. Even
774the time needed to allocate and free storage is much less than the time
775we save, when there are many strings.
776
@smallexample
struct sorter @{ char *input; char *transformed; @};

/* @r{This is the comparison function used with @code{qsort}}
   @r{to sort an array of @code{struct sorter}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings_fast (char **array, int nstrings)
@{
  struct sorter temp_array[nstrings];
  int i;

  /* @r{Set up @code{temp_array}.  Each element contains}
     @r{one input string and its transformed string.} */
  for (i = 0; i < nstrings; i++)
    @{
      size_t length = strlen (array[i]) * 2;
      char *transformed;
      size_t transformed_length;

      temp_array[i].input = array[i];

      /* @r{First try a buffer perhaps big enough.} */
      transformed = (char *) xmalloc (length);

      /* @r{Transform @code{array[i]}.} */
      transformed_length = strxfrm (transformed, array[i], length);

      /* @r{If the buffer was not large enough, resize it}
         @r{and try again.} */
      if (transformed_length >= length)
        @{
          /* @r{Allocate the needed space.  +1 for terminating}
             @r{@code{NUL} character.} */
          transformed = (char *) xrealloc (transformed,
                                           transformed_length + 1);

          /* @r{The return value is not interesting because we know}
             @r{how long the transformed string is.} */
          (void) strxfrm (transformed, array[i],
                          transformed_length + 1);
        @}

      temp_array[i].transformed = transformed;
    @}

  /* @r{Sort @code{temp_array} by comparing transformed strings.} */
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);

  /* @r{Put the elements back in the permanent array}
     @r{in their sorted order.} */
  for (i = 0; i < nstrings; i++)
    array[i] = temp_array[i].input;

  /* @r{Free the strings we allocated.} */
  for (i = 0; i < nstrings; i++)
    free (temp_array[i].transformed);
@}
@end smallexample
845
@strong{Compatibility Note:} The string collation functions are a new
feature of @w{ISO C 89}.  Older C dialects have no equivalent feature.

@node Search Functions
@section Search Functions
851
852This section describes library functions which perform various kinds
853of searching operations on strings and arrays. These functions are
854declared in the header file @file{string.h}.
855@pindex string.h
856@cindex search functions (for strings)
857@cindex string search functions
858
859@comment string.h
f65fd747 860@comment ISO
861@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
862This function finds the first occurrence of the byte @var{c} (converted
863to an @code{unsigned char}) in the initial @var{size} bytes of the
864object beginning at @var{block}. The return value is a pointer to the
865located byte, or a null pointer if no match was found.
866@end deftypefun
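
Because @code{memchr} is given an explicit size, it keeps searching past
null bytes, which @code{strchr} cannot do.  The sketch below illustrates
this; the wrapper name @code{byte_offset} is an assumption made for the
example and is not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first byte equal to C
   in the SIZE bytes at BLOCK, or SIZE if there is no such byte.  */
size_t
byte_offset (const void *block, int c, size_t size)
{
  const unsigned char *start = block;
  const unsigned char *hit = memchr (block, c, size);

  return hit != NULL ? (size_t) (hit - start) : size;
}
```

Returning @var{size} for ``not found'' is a design choice of this sketch;
callers that prefer a pointer can use the result of @code{memchr}
directly.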
867
868@comment string.h
f65fd747 869@comment ISO
870@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
871The @code{strchr} function finds the first occurrence of the character
872@var{c} (converted to a @code{char}) in the null-terminated string
873beginning at @var{string}. The return value is a pointer to the located
874character, or a null pointer if no match was found.
875
876For example,
877@smallexample
878strchr ("hello, world", 'l')
879 @result{} "llo, world"
880strchr ("hello, world", '?')
881 @result{} NULL
a5113b14 882@end smallexample
28f540f4
RM
883
The terminating null character is considered to be part of the string,
so you can use this function to get a pointer to the end of a string by
specifying a null character as the value of the @var{c} argument.
887@end deftypefun
888
889@comment string.h
890@comment BSD
891@deftypefun {char *} index (const char *@var{string}, int @var{c})
@code{index} is another name for @code{strchr}; they are exactly the same.
New code should always use @code{strchr}, since this name is defined in
@w{ISO C}, while @code{index} is a BSD invention that was never available
on @w{System V} derived systems.
896@end deftypefun
897
898@comment string.h
f65fd747 899@comment ISO
900@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
901The function @code{strrchr} is like @code{strchr}, except that it searches
902backwards from the end of the string @var{string} (instead of forwards
903from the front).
904
905For example,
906@smallexample
907strrchr ("hello, world", 'l')
908 @result{} "ld"
909@end smallexample
910@end deftypefun
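
A common use of @code{strrchr} is to find the last component of a file
name.  The following is a minimal sketch, assuming @samp{/} is the only
directory separator; the function name @code{last_component} is
hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the part of PATH after the
   last '/', or PATH itself if it contains no '/'.  */
const char *
last_component (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash != NULL ? slash + 1 : path;
}
```

The returned pointer refers into @var{path}; nothing is copied, so it
stays valid only as long as the original string does.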
911
912@comment string.h
913@comment BSD
914@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
New code should always use @code{strrchr}, since this name is defined in
@w{ISO C}, while @code{rindex} is a BSD invention that was never available
on @w{System V} derived systems.
919@end deftypefun
920
921@comment string.h
f65fd747 922@comment ISO
923@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
924This is like @code{strchr}, except that it searches @var{haystack} for a
925substring @var{needle} rather than just a single character. It
926returns a pointer into the string @var{haystack} that is the first
927character of the substring, or a null pointer if no match was found. If
928@var{needle} is an empty string, the function returns @var{haystack}.
929
930For example,
931@smallexample
932strstr ("hello, world", "l")
933 @result{} "llo, world"
934strstr ("hello, world", "wo")
935 @result{} "world"
936@end smallexample
937@end deftypefun
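
Since @code{strstr} returns a pointer into @var{haystack}, the search can
be resumed from just past each match.  The sketch below counts
non-overlapping matches this way; @code{count_occurrences} is a
hypothetical name, and treating an empty @var{needle} as ``no matches''
is an assumption of the example, not library behavior.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE
   in HAYSTACK.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t count = 0;
  size_t step = strlen (needle);

  /* strstr matches an empty needle at every position; return 0
     here so the loop below cannot run forever.  */
  if (step == 0)
    return 0;

  while ((haystack = strstr (haystack, needle)) != NULL)
    {
      count++;
      haystack += step;   /* Resume after the match just found.  */
    }
  return count;
}
```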
938
939
940@comment string.h
941@comment GNU
63551311 942@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
28f540f4
RM
943This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
944arrays rather than null-terminated strings. @var{needle-len} is the
945length of @var{needle} and @var{haystack-len} is the length of
946@var{haystack}.@refill
947
948This function is a GNU extension.
949@end deftypefun
950
951@comment string.h
f65fd747 952@comment ISO
28f540f4
RM
953@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
954The @code{strspn} (``string span'') function returns the length of the
955initial substring of @var{string} that consists entirely of characters that
956are members of the set specified by the string @var{skipset}. The order
957of the characters in @var{skipset} is not important.
958
959For example,
960@smallexample
961strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
962 @result{} 5
963@end smallexample
964@end deftypefun
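
One way to use @code{strspn} is to check that a whole string is drawn
from a given set: if the span reaches the terminating null character,
every character was a member.  This is only a sketch; the name
@code{is_all_digits} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return nonzero if STRING is non-empty and
   consists only of decimal digits.  */
int
is_all_digits (const char *string)
{
  return string[0] != '\0'
         && string[strspn (string, "0123456789")] == '\0';
}
```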
965
966@comment string.h
f65fd747 967@comment ISO
28f540f4
RM
968@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
969The @code{strcspn} (``string complement span'') function returns the length
970of the initial substring of @var{string} that consists entirely of characters
971that are @emph{not} members of the set specified by the string @var{stopset}.
972(In other words, it returns the offset of the first character in @var{string}
973that is a member of the set @var{stopset}.)
974
975For example,
976@smallexample
977strcspn ("hello, world", " \t\n,.;!?")
978 @result{} 5
979@end smallexample
980@end deftypefun
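
Because @code{strcspn} returns the length of the string when no member
of @var{stopset} is present, its result is always a valid index.  That
makes it convenient for cutting a line at its terminator, as in this
sketch (the name @code{strip_newline} is hypothetical):

```c
#include <string.h>

/* Hypothetical helper: truncate LINE at the first carriage return or
   newline.  If there is none, the existing terminating null is
   simply overwritten with itself.  */
void
strip_newline (char *line)
{
  line[strcspn (line, "\r\n")] = '\0';
}
```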
981
982@comment string.h
f65fd747 983@comment ISO
28f540f4
RM
984@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
985The @code{strpbrk} (``string pointer break'') function is related to
986@code{strcspn}, except that it returns a pointer to the first character
987in @var{string} that is a member of the set @var{stopset} instead of the
988length of the initial substring. It returns a null pointer if no such
989character from @var{stopset} is found.
990
991@c @group Invalid outside the example.
992For example,
993
994@smallexample
995strpbrk ("hello, world", " \t\n,.;!?")
996 @result{} ", world"
997@end smallexample
998@c @end group
999@end deftypefun
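
Since @code{strpbrk} returns a pointer rather than a length, it fits
loops that modify each matching character in turn.  The following
sketch replaces every character from a set; the function name
@code{replace_any} is an assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite every character of STRING that is a
   member of STOPSET with REPLACEMENT.  Return the number of
   characters replaced.  */
size_t
replace_any (char *string, const char *stopset, char replacement)
{
  size_t count = 0;
  char *p = string;

  while ((p = strpbrk (p, stopset)) != NULL)
    {
      *p++ = replacement;   /* Replace, then continue after it.  */
      count++;
    }
  return count;
}
```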
1000
b4012b75 1001@node Finding Tokens in a String
28f540f4
RM
1002@section Finding Tokens in a String
1003
28f540f4
RM
1004@cindex tokenizing strings
1005@cindex breaking a string into tokens
1006@cindex parsing tokens from a string
1007It's fairly common for programs to have a need to do some simple kinds
1008of lexical analysis and parsing, such as splitting a command string up
1009into tokens. You can do this with the @code{strtok} function, declared
1010in the header file @file{string.h}.
1011@pindex string.h
1012
1013@comment string.h
f65fd747 1014@comment ISO
28f540f4
RM
1015@deftypefun {char *} strtok (char *@var{newstring}, const char *@var{delimiters})
1016A string can be split into tokens by making a series of calls to the
1017function @code{strtok}.
1018
1019The string to be split up is passed as the @var{newstring} argument on
1020the first call only. The @code{strtok} function uses this to set up
1021some internal state information. Subsequent calls to get additional
1022tokens from the same string are indicated by passing a null pointer as
1023the @var{newstring} argument. Calling @code{strtok} with another
1024non-null @var{newstring} argument reinitializes the state information.
1025It is guaranteed that no other library function ever calls @code{strtok}
1026behind your back (which would mess up this internal state information).
1027
1028The @var{delimiters} argument is a string that specifies a set of delimiters
1029that may surround the token being extracted. All the initial characters
1030that are members of this set are discarded. The first character that is
1031@emph{not} a member of this set of delimiters marks the beginning of the
1032next token. The end of the token is found by looking for the next
1033character that is a member of the delimiter set. This character in the
1034original string @var{newstring} is overwritten by a null character, and the
1035pointer to the beginning of the token in @var{newstring} is returned.
1036
On the next call to @code{strtok}, the searching begins at the next
character beyond the one that marked the end of the previous token.
Note that the set of delimiters @var{delimiters} does not have to be the
same on every call in a series of calls to @code{strtok}.

If the end of the string @var{newstring} is reached, or if the remainder of
the string consists only of delimiter characters, @code{strtok} returns
a null pointer.
1045@end deftypefun
1046
@strong{Warning:} Since @code{strtok} alters the string it is parsing,
you should always copy the string to a temporary buffer before parsing
it with @code{strtok}.  If you allow @code{strtok} to modify a string that
came from another part of your program, you are asking for trouble; that
string may be part of a data structure that could be used for other
purposes during the parsing, when alteration by @code{strtok} makes the
data structure temporarily inaccurate.
1054
1055The string that you are operating on might even be a constant. Then
1056when @code{strtok} tries to modify it, your program will get a fatal
1057signal for writing in read-only memory. @xref{Program Error Signals}.
1058
1059This is a special case of a general principle: if a part of a program
1060does not have as its purpose the modification of a certain data
1061structure, then it is error-prone to modify the data structure
1062temporarily.
1063
1064The function @code{strtok} is not reentrant. @xref{Nonreentrancy}, for
1065a discussion of where and why reentrancy is important.
1066
1067Here is a simple example showing the use of @code{strtok}.
1068
1069@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *token, *cp;

@dots{}

cp = strdupa (string);                /* Make writable copy.  */
token = strtok (cp, delimiters);      /* token => "words" */
token = strtok (NULL, delimiters);    /* token => "separated" */
token = strtok (NULL, delimiters);    /* token => "by" */
token = strtok (NULL, delimiters);    /* token => "spaces" */
token = strtok (NULL, delimiters);    /* token => "and" */
token = strtok (NULL, delimiters);    /* token => "punctuation" */
token = strtok (NULL, delimiters);    /* token => NULL */
@end smallexample
1091
1092The GNU C library contains two more functions for tokenizing a string
1093which overcome the limitation of non-reentrancy.
1094
1095@comment string.h
1096@comment POSIX
1097@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
Just like @code{strtok}, this function splits the string into several
tokens which can be accessed by successive calls to @code{strtok_r}.
The difference is that the information about the next token is not kept
in internal state.  Instead the caller has to provide another argument
@var{save_ptr}, which is a pointer to a string pointer.  Calling
@code{strtok_r} with a null pointer for @var{newstring} and leaving
@var{save_ptr} unchanged between the calls does the job without
limiting reentrancy.

This function is defined in POSIX.1 and can be found on many systems
which support multi-threading.
1109@end deftypefun
1110
1111@comment string.h
1112@comment BSD
1113@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
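This function works much like @code{strtok_r}, except that the string to
scan and the saved position are combined into the single argument
@var{string_ptr}.  The initialization of the moving pointer has to be
done by the user.  Successive calls of @code{strsep} move the pointer
along the tokens separated by @var{delimiter}, returning the address of
the next token and updating @var{string_ptr} to point to the beginning
of the next token.

Unlike @code{strtok_r}, @code{strsep} does not skip runs of delimiters:
if the input contains two or more delimiter characters in a row, it
returns an empty string for each pair of adjacent delimiters.  A program
should therefore normally test for an empty token before processing it.

This function was introduced in 4.3BSD and therefore is widely available.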
1122@end deftypefun
1123
Here is how the above example looks when @code{strsep} is used.  Note
that adjacent delimiter characters produce empty tokens.

@comment Yes, this example has been tested.
@smallexample
#include <string.h>
#include <stddef.h>

@dots{}

const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;

@dots{}

running = strdupa (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */
@end smallexample
1149
1150@node Encode Binary Data
1151@section Encode Binary Data
1152
To store or transfer binary data in environments which only support text
one has to encode the binary data by mapping the input bytes to
characters in the range allowed for storing or transferring.  SVID
systems (and nowadays XPG compliant systems) have such a function in the
C library.
1158
1159@comment stdlib.h
1160@comment XPG
1161@deftypefun {char *} l64a (long int @var{n})
1162This function encodes an input value with 32 bits using characters from
1163the basic character set. Groups of 6 bits are encoded using the
1164following table:
1165
1166@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
1167@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
1168@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
1169 @tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
1170@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
1171 @tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
1172@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
1173 @tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
1174@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
1175 @tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
1176@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
1177 @tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
1178@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
1179 @tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
1180@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
1181 @tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
1182@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
1183 @tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
1184@end multitable
1185
The function returns a pointer to a static buffer which contains the
string representing the encoding of @var{n}.  To encode a series of
bytes the user should append the new string to the destination buffer.
@emph{Warning:} Since a static buffer is used, this function should not
be used in multi-threaded programs.  There is no thread-safe alternative
to this function in the C library.
1192@end deftypefun
1193
The @code{l64a} function is not sufficient on its own.  To encode
arbitrary sequences of bytes one needs some more code, which could look
like this:
1197
1198@smallexample
char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  const unsigned char *in = (const unsigned char *) buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  if (out == NULL)
    return NULL;

  /* @r{Encode the length.  `l64a' can return a string shorter}
     @r{than 6 characters, so pad it by hand with `.', the}
     @r{encoding of 0.} */
  p = stpcpy (cp, l64a (len));
  memset (p, '.', 6 - (p - cp));
  cp += 6;

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      /* @r{Using `htonl' is necessary so that the data can be}
         @r{decoded even on machines with different byte order.} */
      p = stpcpy (cp, l64a (htonl (n)));
      memset (p, '.', 6 - (p - cp));
      cp += 6;
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      /* @r{This is the last group, so no padding is needed.} */
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
1238@end smallexample
1239
1240It is strange that the library does not provide the complete
1241functionality needed but so be it. There are some other encoding
1242methods which are much more widely used (UU encoding, Base64 encoding).
1243Generally, it is better to use one of these encodings.
1244
b4012b75
UD
1245To decode data produced with @code{l64a} the following function should be
1246used.
1247
5649a1d6
UD
1248@comment stdlib.h
1249@comment XPG
b4012b75
UD
1250@deftypefun {long int} a64l (const char *@var{string})
The parameter @var{string} should contain a string which was produced by
a call to @code{l64a}.  The function processes at most 6 characters of
this string and decodes them according to the table above.  Characters
that are not in the conversion table should not appear in the part being
decoded: POSIX leaves the result undefined in that case, and the GNU
implementation stops decoding at the first such character instead of
skipping it.  If the encoded data has been broken into lines, the caller
must therefore skip the end-of-line characters itself.

The decoded number is returned as a @code{long int} value.
Consecutive calls to this function are possible, but the caller must make
sure the buffer pointer is updated after each call to @code{a64l}, since
this function does not modify the buffer pointer.  Every call consumes at
most 6 characters.
1263@end deftypefun
b13927da
UD
1264
1265@node Argz and Envz Vectors
1266@section Argz and Envz Vectors
1267
5649a1d6 1268@cindex argz vectors (string vectors)
b13927da
UD
1269@cindex string vectors, null-character separated
1270@cindex argument vectors, null-character separated
1271@dfn{argz vectors} are vectors of strings in a contiguous block of
1272memory, each element separated from its neighbors by null-characters
1273(@code{'\0'}).
1274
5649a1d6 1275@cindex envz vectors (environment vectors)
b13927da
UD
1276@cindex environment vectors, null-character separated
1277@dfn{Envz vectors} are an extension of argz vectors where each element is a
5649a1d6 1278name-value pair, separated by a @code{'='} character (as in a Unix
b13927da
UD
1279environment).
1280
1281@menu
1282* Argz Functions:: Operations on argz vectors.
1283* Envz Functions:: Additional operations on environment vectors.
1284@end menu
1285
1286@node Argz Functions, Envz Functions, , Argz and Envz Vectors
1287@subsection Argz Functions
1288
1289Each argz vector is represented by a pointer to the first element, of
1290type @code{char *}, and a size, of type @code{size_t}, both of which can
1291be initialized to @code{0} to represent an empty argz vector. All argz
1292functions accept either a pointer and a size argument, or pointers to
1293them, if they will be modified.
1294
The argz functions use @code{malloc}/@code{realloc} to allocate/grow
argz vectors, and so any argz vector created using these functions may
be freed by using @code{free}; conversely, any argz function that may
grow a string expects that string to have been allocated using
@code{malloc} (those argz functions that only examine their arguments or
modify them in place will work on any sort of memory).
@xref{Unconstrained Allocation}.
1302
1303All argz functions that do memory allocation have a return type of
1304@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
1305allocation error occurs.
1306
1307@pindex argz.h
1308These functions are declared in the standard include file @file{argz.h}.
1309
5649a1d6
UD
1310@comment argz.h
1311@comment GNU
b13927da 1312@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
5649a1d6 1313The @code{argz_create} function converts the Unix-style argument vector
b13927da
UD
1314@var{argv} (a vector of pointers to normal C strings, terminated by
1315@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
1316the same elements, which is returned in @var{argz} and @var{argz_len}.
1317@end deftypefun
1318
5649a1d6
UD
1319@comment argz.h
1320@comment GNU
b13927da
UD
1321@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
The @code{argz_create_sep} function converts the null-terminated string
@var{string} into an argz vector (returned in @var{argz} and
@var{argz_len}) by splitting it into elements at every occurrence of the
character @var{sep}.
1326@end deftypefun
1327
5649a1d6
UD
1328@comment argz.h
1329@comment GNU
b13927da
UD
@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
1331Returns the number of elements in the argz vector @var{argz} and
1332@var{argz_len}.
1333@end deftypefun
1334
5649a1d6
UD
1335@comment argz.h
1336@comment GNU
b13927da
UD
1337@deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
1338The @code{argz_extract} function converts the argz vector @var{argz} and
5649a1d6 1339@var{argz_len} into a Unix-style argument vector stored in @var{argv},
b13927da
UD
1340by putting pointers to every element in @var{argz} into successive
1341positions in @var{argv}, followed by a terminator of @code{0}.
The array @var{argv} must be pre-allocated with enough space to hold all the
1343elements in @var{argz} plus the terminating @code{(char *)0}
1344(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
1345bytes should be enough). Note that the string pointers stored into
1346@var{argv} point into @var{argz}---they are not copies---and so
1347@var{argz} must be copied if it will be changed while @var{argv} is
1348still active. This function is useful for passing the elements in
1349@var{argz} to an exec function (@pxref{Executing a File}).
1350@end deftypefun
1351
5649a1d6
UD
1352@comment argz.h
1353@comment GNU
b13927da
UD
1354@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
The @code{argz_stringify} function converts @var{argz} into a normal
string with the elements separated by the character @var{sep}, by
replacing each @code{'\0'} inside @var{argz} (except the last one, which
terminates the string) with @var{sep}.  This is handy for printing
@var{argz} in a readable manner.
1360@end deftypefun
1361
5649a1d6
UD
1362@comment argz.h
1363@comment GNU
b13927da
UD
1364@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
1365The @code{argz_add} function adds the string @var{str} to the end of the
1366argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
1367@code{*@var{argz_len}} accordingly.
1368@end deftypefun
1369
5649a1d6
UD
1370@comment argz.h
1371@comment GNU
b13927da
UD
1372@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
The @code{argz_add_sep} function is similar to @code{argz_add}, but
@var{str} is split into separate elements in the result at occurrences of
the character @var{delim}.  This is useful, for instance, for
adding the components of a Unix search path to an argz vector, by using
a value of @code{':'} for @var{delim}.
1378@end deftypefun
1379
5649a1d6
UD
1380@comment argz.h
1381@comment GNU
b13927da
UD
1382@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
1383The @code{argz_append} function appends @var{buf_len} bytes starting at
1384@var{buf} to the argz vector @code{*@var{argz}}, reallocating
1385@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
1386@code{*@var{argz_len}}.
1387@end deftypefun
1388
5649a1d6
UD
1389@comment argz.h
1390@comment GNU
b13927da
UD
1391@deftypefun {error_t} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
1392If @var{entry} points to the beginning of one of the elements in the
1393argz vector @code{*@var{argz}}, the @code{argz_delete} function will
1394remove this entry and reallocate @code{*@var{argz}}, modifying
1395@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
1396destructive argz functions usually reallocate their argz argument,
1397pointers into argz vectors such as @var{entry} will then become invalid.
1398@end deftypefun
1399
5649a1d6
UD
1400@comment argz.h
1401@comment GNU
b13927da
UD
1402@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
1403The @code{argz_insert} function inserts the string @var{entry} into the
1404argz vector @code{*@var{argz}} at a point just before the existing
1405element pointed to by @var{before}, reallocating @code{*@var{argz}} and
1406updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
1407is @code{0}, @var{entry} is added to the end instead (as if by
1408@code{argz_add}). Since the first element is in fact the same as
1409@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
1410@var{before} will result in @var{entry} being inserted at the beginning.
1411@end deftypefun
1412
5649a1d6
UD
1413@comment argz.h
1414@comment GNU
b13927da
UD
1415@deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
1416The @code{argz_next} function provides a convenient way of iterating
1417over the elements in the argz vector @var{argz}. It returns a pointer
1418to the next element in @var{argz} after the element @var{entry}, or
1419@code{0} if there are no elements following @var{entry}. If @var{entry}
1420is @code{0}, the first element of @var{argz} is returned.
1421
1422This behavior suggests two styles of iteration:
1423
1424@smallexample
1425 char *entry = 0;
1426 while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
1427 @var{action};
1428@end smallexample
1429
1430(the double parentheses are necessary to make some C compilers shut up
1431about what they consider a questionable @code{while}-test) and:
1432
1433@smallexample
1434 char *entry;
1435 for (entry = @var{argz};
1436 entry;
1437 entry = argz_next (@var{argz}, @var{argz_len}, entry))
1438 @var{action};
1439@end smallexample
1440
1441Note that the latter depends on @var{argz} having a value of @code{0} if
1442it is empty (rather than a pointer to an empty block of memory); this
1443invariant is maintained for argz vectors created by the functions here.
1444@end deftypefun
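
The first iteration style can be sketched as a complete function.  This
example assumes the GNU C library's @file{argz.h} is available; the
function name @code{count_fields} and the use of @code{(size_t) -1} to
report an allocation failure are assumptions of the sketch.

```c
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: count the elements of the SEP-separated list
   in STRING by building an argz vector and walking it with
   argz_next.  Return (size_t) -1 if memory runs out.  */
size_t
count_fields (const char *string, int sep)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t n = 0;
  char *entry = NULL;

  if (argz_create_sep (string, sep, &argz, &argz_len) != 0)
    return (size_t) -1;

  /* Starting from a null ENTRY yields the first element.  */
  while ((entry = argz_next (argz, argz_len, entry)) != NULL)
    n++;

  free (argz);
  return n;
}
```

In real code @code{argz_count} gives the same number directly; the loop
is shown only to illustrate iteration.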
1445
d705269e
UD
1446@comment argz.h
1447@comment GNU
1448@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
Replace any occurrences of the string @var{str} in @var{argz} with
@var{with}, reallocating @var{argz} as necessary.  If
@var{replace_count} is non-zero, @code{*@var{replace_count}} will be
incremented by the number of replacements performed.
1453@end deftypefun
1454
b13927da
UD
1455@node Envz Functions, , Argz Functions, Argz and Envz Vectors
1456@subsection Envz Functions
1457
1458Envz vectors are just argz vectors with additional constraints on the form
1459of each element; as such, argz functions can also be used on them, where it
1460makes sense.
1461
1462Each element in an envz vector is a name-value pair, separated by a @code{'='}
1463character; if multiple @code{'='} characters are present in an element, those
1464after the first are considered part of the value, and treated like all other
1465non-@code{'\0'} characters.
1466
1467If @emph{no} @code{'='} characters are present in an element, that element is
1468considered the name of a ``null'' entry, as distinct from an entry with an
1469empty value: @code{envz_get} will return @code{0} if given the name of a null
1470entry, whereas an entry with an empty value would result in a value of
1471@code{""}; @code{envz_entry} will still find such entries, however. Null
1472entries can be removed with the @code{envz_strip} function.
1473
1474As with argz functions, envz functions that may allocate memory (and thus
1475fail) have a return type of @code{error_t}, and return either @code{0} or
1476@code{ENOMEM}.
1477
1478@pindex envz.h
1479These functions are declared in the standard include file @file{envz.h}.
1480
5649a1d6
UD
1481@comment envz.h
1482@comment GNU
b13927da
UD
1483@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
1484The @code{envz_entry} function finds the entry in @var{envz} with the name
1485@var{name}, and returns a pointer to the whole entry---that is, the argz
1486element which begins with @var{name} followed by a @code{'='} character. If
1487there is no entry with that name, @code{0} is returned.
1488@end deftypefun
1489
5649a1d6
UD
1490@comment envz.h
1491@comment GNU
b13927da
UD
1492@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
1493The @code{envz_get} function finds the entry in @var{envz} with the name
1494@var{name} (like @code{envz_entry}), and returns a pointer to the value
1495portion of that entry (following the @code{'='}). If there is no entry with
1496that name (or only a null entry), @code{0} is returned.
1497@end deftypefun
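
The difference between a missing entry, a null entry, and an entry with a (possibly empty) value can be observed by combining @code{envz_entry} and @code{envz_get}. The sketch below illustrates this; the enumeration and the function @code{classify_entry} are hypothetical names introduced only for this example.

```c
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification used only in this sketch.  */
enum entry_kind { ENTRY_MISSING, ENTRY_NULL, ENTRY_VALUED };

/* Hypothetical helper (not a library function): report what kind of
   entry, if any, ENVZ holds for NAME.  */
enum entry_kind
classify_entry (const char *envz, size_t envz_len, const char *name)
{
  assert (name != 0);

  /* envz_entry finds null entries as well as ordinary ones.  */
  if (envz_entry (envz, envz_len, name) == 0)
    return ENTRY_MISSING;

  /* envz_get returns 0 for a null entry, but "" for an empty value.  */
  if (envz_get (envz, envz_len, name) == 0)
    return ENTRY_NULL;

  return ENTRY_VALUED;
}
```

An entry added with the value @code{""} is classified as @code{ENTRY_VALUED} here, while one added with a @var{value} of @code{0} is classified as @code{ENTRY_NULL}.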
1498
5649a1d6
UD
1499@comment envz.h
1500@comment GNU
b13927da
UD
1501@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
1502The @code{envz_add} function adds an entry to @code{*@var{envz}}
1503(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
1504@var{name}, and value @var{value}. If an entry with the same name
1505already exists in @var{envz}, it is removed first. If @var{value} is
1506@code{0}, then the new entry will be the special null type of entry
1507(mentioned above).
1508@end deftypefun
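
Because @code{envz_add} removes an existing entry of the same name first, setting a name twice leaves a single entry. A minimal sketch of this behavior follows; @code{set_and_count} is a hypothetical helper, and it relies on @code{argz_count} from @file{argz.h} to count the elements.

```c
#include <argz.h>
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): set NAME to VALUE in
   *ENVZ and return the number of entries afterwards, or (size_t) -1
   if envz_add fails.  */
size_t
set_and_count (char **envz, size_t *envz_len,
               const char *name, const char *value)
{
  assert (envz != 0 && envz_len != 0);

  if (envz_add (envz, envz_len, name, value) != 0)
    return (size_t) -1;

  /* An envz vector is also an argz vector, so argz_count applies.  */
  return argz_count (*envz, *envz_len);
}
```

Starting from an empty vector (@code{*@var{envz}} equal to @code{0} and a length of zero), two successive calls with the same @var{name} should both report one entry.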
1509
5649a1d6
UD
1510@comment envz.h
1511@comment GNU
b13927da
UD
1512@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
1513The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
1514as if with @code{envz_add}, updating @code{*@var{envz}} and
1515@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
1516will supersede those with the same name in @var{envz}, otherwise not.
1517
1518Null entries are treated just like other entries in this respect, so a null
1519entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
1520being added to @var{envz}, if @var{override} is false.
1521@end deftypefun
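
One possible use of a false @var{override} is to fill in default settings without disturbing values that are already present. The sketch below assumes this use case; @code{apply_defaults} is a hypothetical wrapper, not a library function.

```c
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): add every entry of
   DEFAULTS to *ENVZ unless *ENVZ already has an entry of that name.
   Returns 0 on success or ENOMEM, as envz_merge does.  */
error_t
apply_defaults (char **envz, size_t *envz_len,
                const char *defaults, size_t defaults_len)
{
  assert (envz != 0 && envz_len != 0);

  /* OVERRIDE is 0, so existing entries (including null ones) win.  */
  return envz_merge (envz, envz_len, defaults, defaults_len, 0);
}
```

Calling @code{envz_merge} directly with a true @var{override} gives the opposite policy, in which entries from the second vector replace those of the same name in the first.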
1522
5649a1d6
UD
1523@comment envz.h
1524@comment GNU
b13927da
UD
1525@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
1526The @code{envz_strip} function removes any null entries from @var{envz},
1527updating @code{*@var{envz}} and @code{*@var{envz_len}}.
1528@end deftypefun
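
To see the effect of @code{envz_strip}, one can compare the element count before and after the call. The following sketch does so with a hypothetical helper, @code{strip_nulls}, which relies on @code{argz_count} from @file{argz.h}.

```c
#include <argz.h>
#include <envz.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): remove null entries
   from *ENVZ and return how many were removed.  */
size_t
strip_nulls (char **envz, size_t *envz_len)
{
  size_t before;

  assert (envz != 0 && envz_len != 0);

  before = argz_count (*envz, *envz_len);
  envz_strip (envz, envz_len);

  /* Entries with an empty value contain '=' and are therefore kept.  */
  return before - argz_count (*envz, *envz_len);
}
```

Only entries without any @code{'='} character are removed; an entry such as @code{"C="} has an empty value rather than being null, so it survives the call.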